VhgGetSubject {Virusparies} | R Documentation |
VhgGetSubject: Process and Count Viral Subjects within Groups
VhgGetSubject(
file,
groupby = "best_query",
remove_identifiers = TRUE,
include_run_ids = FALSE,
extract_brackets = FALSE,
group_unwanted_phyla = NULL
)
file |
A data frame containing VirusHunter or VirusGatherer hittable results. |
groupby |
(optional): A character string specifying the column that contains the groups (default: "best_query"). Note: Gatherer hittables do not have a "best_query" column, so provide an appropriate grouping column for them. |
remove_identifiers |
(optional): if |
include_run_ids |
(optional): If |
extract_brackets |
(optional): extract content within square brackets []. |
group_unwanted_phyla |
(optional): A character string specifying which group of viral phyla to retain in the analysis. Valid values are:
All other phyla not in the specified group will be grouped into a single category:
"Non-RNA-virus" for |
The function VhgGetSubject counts the number of viral subjects in the ViralRefSeq_subject column for each group specified by the groupby argument.
It returns a tibble with three columns: the first column contains the viral group specified by the groupby argument, the second column lists the viral subjects found in that group, and the third column shows how many times each viral subject appears in that group.
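To make the shape of the result concrete, the following is a minimal base R sketch of the per-group counting described above. It is not the package's implementation: the toy data frame, its group labels and its subject names are made up for illustration, and only the column names best_query and ViralRefSeq_subject come from this page.

```r
# Toy hittable: values are hypothetical, only the column names follow the docs.
hits <- data.frame(
  best_query = c("Group_1", "Group_1", "Group_1", "Group_2"),
  ViralRefSeq_subject = c("Virus A", "Virus A", "Virus B", "Virus C"),
  stringsAsFactors = FALSE
)

# Add a helper column of ones and sum it per (group, subject) pair.
# The result has three columns: group, subject, and the count n.
counts <- aggregate(
  n ~ best_query + ViralRefSeq_subject,
  data = transform(hits, n = 1L),
  FUN = sum
)

# Sort by group, then by descending count, for easier reading.
counts <- counts[order(counts$best_query, -counts$n), ]
print(counts)
```

Each row of counts pairs one group with one subject and the number of times that subject occurs in the group, which mirrors the three-column layout described above.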
A processed tibble object.
Sergej Ruff
VirusHunterGatherer is available here: https://github.com/lauberlab/VirusHunterGatherer.
# import data
path <- system.file("extdata", "virushunter.tsv", package = "Virusparies")
file <- ImportVirusTable(path)

# process the taxonomy column and filter for significant groups
file <- VhgPreprocessTaxa(file, taxa_rank = "Family")
file_filtered <- VhgSubsetHittable(file, ViralRefSeq_E_criteria = 1e-5)

# count viral subjects within each taxonomic group
subject_df <- VhgGetSubject(file_filtered,
                            groupby = "ViralRefSeq_taxonomy",
                            remove_identifiers = TRUE)
print(subject_df)