VhgGetSubject {Virusparies}    R Documentation

VhgGetSubject: Process and Count Viral Subjects within Groups

Description

Counts how often each viral subject in the ViralRefSeq_subject column occurs within each group defined by the groupby argument.

Usage

VhgGetSubject(
  file,
  groupby = "best_query",
  remove_identifiers = TRUE,
  include_run_ids = FALSE,
  extract_brackets = FALSE,
  group_unwanted_phyla = NULL
)

Arguments

file

A data frame containing VirusHunter or VirusGatherer hittable results.

groupby

(optional): A character string naming the column that contains the groups (default: "best_query"). Note: Gatherer hittables do not have a "best_query" column, so provide an appropriate grouping column when working with them.

remove_identifiers

(optional): If TRUE (default), removes the identifiers in the ViralRefSeq_subject cells.

include_run_ids

(optional): If TRUE (default is FALSE), adds a fourth column named run_ids to the output. This column contains a comma-separated list of unique identifiers from either the SRA_run or run_id column, aggregated for each combination of group and subject.
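
The aggregation described for run_ids can be pictured with a small base R sketch. This is only an illustration under assumptions, not the package's implementation: the toy data frame and its values are made up, and only the column names best_query, ViralRefSeq_subject and SRA_run come from this page.

```r
# Toy hittable; the values are invented for illustration only.
hits <- data.frame(
  best_query = c("Flavi_RdRp", "Flavi_RdRp", "Flavi_RdRp", "Hepe_RdRp"),
  ViralRefSeq_subject = c("Virus A", "Virus A", "Virus B", "Virus C"),
  SRA_run = c("SRR001", "SRR002", "SRR001", "SRR003"),
  stringsAsFactors = FALSE
)

# Collapse the unique run identifiers for each group/subject combination
# into a single comma-separated string.
run_ids <- aggregate(
  SRA_run ~ best_query + ViralRefSeq_subject,
  data = hits,
  FUN = function(x) paste(unique(x), collapse = ", ")
)
names(run_ids)[3] <- "run_ids"

print(run_ids)
```

Each group/subject pair ends up on one row, with every run in which it was observed listed once.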

extract_brackets

(optional): If TRUE, extracts the content within square brackets [] (default: FALSE).
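
As a rough picture of what extracting bracketed content means, the following base R sketch pulls out the text between square brackets with a regular expression. The subject strings are invented, and the handling of entries without brackets (left unchanged here) is an assumption; the package may treat them differently.

```r
# Made-up subject strings; real ViralRefSeq_subject cells may look different.
subjects <- c(
  "polyprotein [Hepatitis C virus]",
  "RNA-dependent RNA polymerase [Tobacco mosaic virus]",
  "hypothetical protein"
)

# Keep only the text inside the last pair of square brackets.
# Strings without brackets do not match and are returned unchanged.
in_brackets <- sub("^.*\\[([^]]+)\\].*$", "\\1", subjects)

print(in_brackets)
```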

group_unwanted_phyla

(optional): A character string specifying which group of viral phyla to retain in the analysis (default: NULL). Valid values are:

"rna"

Retain only the phyla specified for RNA viruses.

"smalldna"

Retain only the phyla specified for small DNA viruses.

"largedna"

Retain only the phyla specified for large DNA viruses.

"others"

Retain only the phyla that match small DNA, large DNA and RNA viruses.

All other phyla not in the specified group will be grouped into a single category: "Non-RNA-virus" for "rna", "Non-Small-DNA-Virus" for "smalldna", "Non-Large-DNA-Virus" for "largedna", or "Other Viruses" for "others".
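
The collapsing behaviour can be sketched in a few lines of base R. The phylum lists below are hypothetical placeholders chosen for illustration; the set of phyla that the package assigns to each group is not given on this page and may differ.

```r
# Hypothetical list of RNA-virus phyla (an assumption, not the package's list).
rna_phyla <- c("Pisuviricota", "Kitrinoviricota", "Negarnaviricota")

# Made-up phylum labels for a handful of hits.
phyla <- c("Pisuviricota", "Nucleocytoviricota",
           "Kitrinoviricota", "Cressdnaviricota")

# Mimic group_unwanted_phyla = "rna": keep the phyla in the chosen group and
# collapse everything else into one category.
grouped <- ifelse(phyla %in% rna_phyla, phyla, "Non-RNA-virus")

print(table(grouped))
```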

Details

The function VhgGetSubject counts the number of viral subjects in the ViralRefSeq_subject column for each group specified by the groupby argument. It returns a tibble with three columns: the first column contains the viral group specified by the groupby argument, the second column lists the viral subjects found in that group, and the third column shows how many times each viral subject appears in that group.
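
The three-column result can be approximated with base R on a toy data frame. This is a minimal sketch under assumptions, not the function's actual code: the data are made up, and the name of the count column (n) is hypothetical.

```r
# Toy hittable with one grouping column and one subject column.
hits <- data.frame(
  best_query = c("Flavi_RdRp", "Flavi_RdRp", "Flavi_RdRp", "Hepe_RdRp"),
  ViralRefSeq_subject = c("Virus A", "Virus A", "Virus B", "Virus C"),
  stringsAsFactors = FALSE
)

# Count how often each subject occurs within each group:
# add a helper column of ones and sum it per group/subject pair.
counts <- aggregate(
  n ~ best_query + ViralRefSeq_subject,
  data = transform(hits, n = 1L),
  FUN = sum
)

print(counts)
```

The first column holds the group, the second the subject, and the third the number of times that subject appears in the group.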

Value

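A tibble with the group, the viral subject, and the count of that subject within the group. A run_ids column is added when include_run_ids = TRUE.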

Author(s)

Sergej Ruff

See Also

VirusHunterGatherer is available here: https://github.com/lauberlab/VirusHunterGatherer.

Examples


# import data
path <- system.file("extdata", "virushunter.tsv", package = "Virusparies")
file <- ImportVirusTable(path)

# process column and filter for significant groups
file <- VhgPreprocessTaxa(file, taxa_rank = "Family")
file_filtered <- VhgSubsetHittable(file, ViralRefSeq_E_criteria = 1e-5)

subject_df <- VhgGetSubject(file_filtered, groupby = "ViralRefSeq_taxonomy",
                            remove_identifiers = TRUE)

print(subject_df)


[Package Virusparies version 1.1.0 Index]