VhgPreprocessTaxa {Virusparies} | R Documentation |
VhgPreprocessTaxa: preprocess ViralRefSeq_taxonomy elements
VhgPreprocessTaxa(file, taxa_rank, num_cores = 1)
file |
A data frame containing VirusHunter or VirusGatherer hittable results. |
taxa_rank |
(optional): Specify the taxonomic rank to group your data by. Supported ranks are:
|
num_cores |
(optional): Number of cores used (default: 1) |
Process the ViralRefSeq_taxonomy column.
Besides best_query, the user can utilize the ViralRefSeq_taxonomy column as x_column or groupby in plots.
This column needs preprocessing because it is too long and has too many unique elements for effective grouping.
The element containing the taxa suffix specified by the taxa_rank argument is used. NA values are replaced by "unclassified".
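The suffix-based selection described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper extract_rank, the lookup rank_suffix, the "|" separator, and the example taxonomy strings are all assumptions made for this sketch. The suffixes themselves follow the ICTV naming convention (for example, family names end in "viridae"). Mapping entries without a matching element to "unclassified" is also an assumption; the text above only states this for NA values.

```r
# Hypothetical helper for illustration; not the Virusparies implementation.
# ICTV naming suffixes for a few ranks.
rank_suffix <- c(Order = "virales", Family = "viridae", Subfamily = "virinae")

extract_rank <- function(taxonomy, taxa_rank = "Family", sep = "|") {
  suffix <- rank_suffix[[taxa_rank]]
  vapply(taxonomy, function(x) {
    # NA values are replaced by "unclassified"
    if (is.na(x)) return("unclassified")
    # Split the long taxonomy string into its elements (separator is assumed)
    parts <- trimws(strsplit(x, sep, fixed = TRUE)[[1]])
    # Keep the element that ends with the suffix of the requested rank
    hit <- parts[grepl(paste0(suffix, "$"), parts)]
    # Assumption: entries without a matching element are also "unclassified"
    if (length(hit) == 0) "unclassified" else hit[[1]]
  }, character(1), USE.NAMES = FALSE)
}

# Made-up example strings; the real column format may differ
taxonomy <- c(
  "taxid:1|Riboviria|Amarillovirales|Flaviviridae",
  NA,
  "taxid:2|Riboviria"
)
family <- extract_rank(taxonomy, "Family")
order_rank <- extract_rank(taxonomy, "Order")
print(family)
print(order_rank)
```

Collapsing each long string to a single rank name is what makes the column usable as a grouping variable: many distinct taxonomy strings map to a handful of family or order names.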
This function is used internally by every function that can use the ViralRefSeq_taxonomy column as input.
The user can also apply this function independently to process the taxonomy column and filter for the selected taxa rank in their data.
For datasets with significantly more than 100,000 observations, it is recommended to run this function once beforehand, so that the preprocessing step is skipped in the plot functions.
The num_cores parameter divides the dataset into multiple parts, corresponding to the number of cores specified.
The parts are then processed in parallel, which reduces the overall processing time.
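The split-and-process pattern can be sketched with R's built-in parallel package. The source does not state which parallel backend the package uses, so this is only an illustration under that assumption: process_chunk and process_in_parallel are hypothetical names, and toupper stands in for the real per-chunk work.

```r
library(parallel)

# Placeholder for the per-chunk work (hypothetical; stands in for taxonomy processing)
process_chunk <- function(x) toupper(x)

# Hypothetical wrapper: split the input into num_cores parts and process them in parallel
process_in_parallel <- function(values, num_cores = 1) {
  # With a single core there is nothing to split
  if (num_cores <= 1) return(process_chunk(values))
  # Contiguous index blocks, one per core
  idx <- splitIndices(length(values), num_cores)
  chunks <- lapply(idx, function(i) values[i])
  # Socket cluster; shut it down when the function exits
  cl <- makeCluster(num_cores)
  on.exit(stopCluster(cl))
  # parLapply returns results in chunk order, so the original order is preserved
  unlist(parLapply(cl, chunks, process_chunk), use.names = FALSE)
}

result <- process_in_parallel(c("a", "b", "c", "d", "e"), num_cores = 2)
print(result)
```

Starting a cluster has a fixed cost, so splitting tends to pay off only for large inputs; for small data frames the default of one core is usually the better choice.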
file: the input data frame with preprocessed ViralRefSeq_taxonomy elements.
Sergej Ruff
VirusHunterGatherer is available here: https://github.com/lauberlab/VirusHunterGatherer.
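# Import the example VirusHunter hittable shipped with the package
path <- system.file("extdata", "virushunter.tsv", package = "Virusparies")
file <- ImportVirusTable(path)

# Reduce the ViralRefSeq_taxonomy column to the family rank
file_filtered <- VhgPreprocessTaxa(file, "Family")

# Compare the first five entries before and after processing
cat("ViralRefSeq_taxonomy before processing:\n")
print(head(file$ViralRefSeq_taxonomy, 5))
cat("ViralRefSeq_taxonomy after processing:\n")
print(head(file_filtered$ViralRefSeq_taxonomy, 5))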