VhgPreprocessTaxa {Virusparies}    R Documentation

VhgPreprocessTaxa: preprocess ViralRefSeq_taxonomy elements

Description

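Preprocesses the ViralRefSeq_taxonomy column of a VirusHunter or VirusGatherer hittable, reducing each entry to the element that matches the chosen taxonomic rank.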

Usage

VhgPreprocessTaxa(file, taxa_rank, num_cores = 1)

Arguments

file

A data frame containing VirusHunter or VirusGatherer hittable results.

taxa_rank

(optional): Specify the taxonomic rank to group your data by. Supported ranks are:

  • "Subphylum"

  • "Class"

  • "Subclass"

  • "Order"

  • "Suborder"

  • "Family" (default)

  • "Subfamily"

  • "Genus" (including Subgenus)

num_cores

(optional): Number of cores to use (default: 1).

Details

Process the ViralRefSeq_taxonomy column.

In addition to best_query, the ViralRefSeq_taxonomy column can be used as x_column or groupby in plots. Its raw entries are too long and have too many unique values for effective grouping, so they must be preprocessed first. For each entry, the function keeps only the element carrying the taxon suffix that corresponds to the rank given in taxa_rank. NA values are replaced by "unclassified".
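The suffix-based selection described above can be illustrated with a minimal sketch. It is not the package's implementation: the helper name extract_rank, the "|" separator, the "viridae" suffix for the Family rank, and the handling of entries without a match are all assumptions made for illustration.

```r
# Hypothetical helper, NOT the Virusparies implementation.
# Assumptions: entries are "|"-separated, family names end in "viridae",
# and entries without a matching element are also labelled "unclassified".
extract_rank <- function(taxonomy, suffix = "viridae") {
  vapply(taxonomy, function(entry) {
    # NA entries become "unclassified", as described in the Details.
    if (is.na(entry)) return("unclassified")
    parts <- strsplit(entry, "|", fixed = TRUE)[[1]]
    # Keep the first element whose name ends with the rank suffix.
    hit <- parts[grepl(paste0(suffix, "$"), parts)]
    if (length(hit) == 0) "unclassified" else hit[1]
  }, character(1), USE.NAMES = FALSE)
}

# Made-up toy entries; real hittable entries may look different.
toy <- c("taxid:1|Examplevirus|Exampleviridae", NA, "taxid:2|Othervirus")
result <- extract_rank(toy)
print(result)
```

In practice, call VhgPreprocessTaxa itself; the sketch only shows why the processed column has far fewer unique values than the raw one.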

This function is used internally by every function that can use the ViralRefSeq_taxonomy column as input. The user can also apply this function independently to process the taxonomy column and filter for the selected taxa rank in their data.

For datasets with significantly more than 100,000 observations, it is recommended to run this function once beforehand, so that the plot functions can skip the preprocessing step.

The num_cores parameter divides the dataset into as many parts as the number of cores requested. The parts are then processed in parallel, which reduces the overall processing time.
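The general idea of splitting a dataset into one part per core can be sketched as follows. This is only an illustration of the technique under stated assumptions, not how VhgPreprocessTaxa is implemented: the helper name process_in_chunks is hypothetical, and parallel::mclapply is chosen here for brevity (it relies on forking, so the sketch falls back to a single core on Windows).

```r
library(parallel)

# Hypothetical helper, NOT the Virusparies implementation.
process_in_chunks <- function(x, fun, num_cores = 1) {
  # mclapply cannot fork on Windows, so fall back to one core there.
  if (.Platform$OS.type == "windows") num_cores <- 1
  if (num_cores <= 1 || length(x) < 2) return(fun(x))
  # Assign each element to one of num_cores contiguous chunks.
  chunk_id <- cut(seq_along(x), breaks = num_cores, labels = FALSE)
  chunks <- split(x, chunk_id)
  # Process the chunks in parallel and restore the original order.
  unlist(mclapply(chunks, fun, mc.cores = num_cores), use.names = FALSE)
}

res <- process_in_chunks(c("a", "b", "c", "d"), toupper, num_cores = 2)
print(res)
```

Because the chunks are contiguous and recombined in order, the result has the same row order as the input, whatever the number of cores.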

Value

The input data frame (file) with preprocessed ViralRefSeq_taxonomy elements.

Author(s)

Sergej Ruff

See Also

VirusHunterGatherer is available here: https://github.com/lauberlab/VirusHunterGatherer.

Examples

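# Load the example VirusHunter hittable shipped with the package
path <- system.file("extdata", "virushunter.tsv", package = "Virusparies")
file <- ImportVirusTable(path)

# Reduce each taxonomy entry to its family-level element
file_filtered <- VhgPreprocessTaxa(file, "Family")

# Compare the first five entries before and after preprocessing
cat("ViralRefSeq_taxonomy before processing:\n")
print(head(file$ViralRefSeq_taxonomy, 5))

cat("ViralRefSeq_taxonomy after processing:\n")
print(head(file_filtered$ViralRefSeq_taxonomy, 5))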




[Package Virusparies version 1.1.0 Index]