SummarizeViralStats {Virusparies} | R Documentation |
SummarizeViralStats: Generate summary stats outside of plot functions
Description
Summarizes a hittable per group for a chosen metric (contig length, E-value, or identity). SummarizeViralStats returns a summary table that counts, for each group, the observations below and at or above a specified metric cutoff. It also computes summary statistics appropriate to the selected metric, and can optionally filter rows based on a cutoff value.
Usage
SummarizeViralStats(
  file,
  groupby = "best_query",
  metric,
  metric_cutoff,
  filter_cutoff = NULL,
  show_total = FALSE,
  extra_stats = NULL,
  sort_by = NULL,
  top_n = NULL,
  group_unwanted_phyla = NULL
)
Arguments
file |
VirusHunterGatherer hittable. |
groupby |
(optional): A character string specifying the column that contains the groups (default: "best_query"). Note: Gatherer hittables do not have a "best_query" column, so provide an appropriate grouping column for them. |
metric |
A character string specifying the name of the metric column to be used for calculations.
This column must be present in
|
metric_cutoff |
A numeric value used to split the metric values into two categories: below the cutoff, and above or equal to the cutoff. |
filter_cutoff |
A numeric value for optional filtering of the data based on E-value. Rows where the specified filtering column has a value
less than this cutoff are retained. If |
show_total |
A logical value indicating whether to include a row with the total sums for each column in the summary table.
Default is |
extra_stats |
A character vector specifying additional summary statistics to include in the output. Options include:
If |
sort_by |
(optional): A character string specifying the column name by which to sort the results. Supported values include:
If |
top_n |
(optional): A numeric value indicating the number of top rows to return based on the selected metric.
If |
group_unwanted_phyla |
(optional): A character string specifying which group of viral phyla to retain in the analysis. Valid values are:
All other phyla not in the specified group will be grouped into a single category:
"Non-RNA-virus" for |
Value
A data frame summarizing the viral stats. The output includes:
The count of observations below and above or equal to the metric_cutoff.
Optional additional summary statistics as specified by extra_stats.
An optional total row if show_total is TRUE.
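The shape of such a table can be sketched in base R. This is only an illustration of the counting logic described above, not the package's implementation: the toy values, the output column names (group, below_cutoff, above_or_equal), and the choice to leave the median of the total row as NA are all assumptions.

```r
# Toy hittable; column names mirror the example, values are invented.
hits <- data.frame(
  best_query        = c("A", "A", "A", "B", "B", "C"),
  ViralRefSeq_ident = c(95, 80, 92, 99, 60, 91),
  ViralRefSeq_E     = c(1e-10, 1e-8, 1e-3, 1e-20, 1e-6, 1e-7),
  stringsAsFactors  = FALSE
)

metric_cutoff <- 90
filter_cutoff <- 1e-5

# Optional filtering: retain rows whose E-value is below the filter cutoff.
kept <- hits[hits$ViralRefSeq_E < filter_cutoff, ]

# Classify each row as below, or above-or-equal to, the metric cutoff.
kept$above <- kept$ViralRefSeq_ident >= metric_cutoff

# Count observations per group in each category and add a median.
groups <- sort(unique(kept$best_query))
summary_tbl <- data.frame(
  group          = groups,
  below_cutoff   = as.integer(tapply(!kept$above, kept$best_query, sum)[groups]),
  above_or_equal = as.integer(tapply(kept$above, kept$best_query, sum)[groups]),
  median         = as.numeric(tapply(kept$ViralRefSeq_ident, kept$best_query, median)[groups]),
  stringsAsFactors = FALSE
)

# Optional total row: column sums for the counts (median left as NA here).
total <- data.frame(
  group          = "Total",
  below_cutoff   = sum(summary_tbl$below_cutoff),
  above_or_equal = sum(summary_tbl$above_or_equal),
  median         = NA_real_,
  stringsAsFactors = FALSE
)
summary_tbl <- rbind(summary_tbl, total)
print(summary_tbl)
```

The third row of group "A" is dropped by the E-value filter before counting, which is why that group contributes only two observations to the table.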
Author(s)
Sergej Ruff
See Also
VirusHunterGatherer is available here: https://github.com/lauberlab/VirusHunterGatherer.
Examples
path <- system.file("extdata", "virushunter.tsv", package = "Virusparies")
file <- ImportVirusTable(path)
stats <- SummarizeViralStats(file = file,
                             groupby = "best_query",
                             metric = "ViralRefSeq_ident",
                             metric_cutoff = 90,
                             show_total = TRUE,
                             filter_cutoff = 1e-5,
                             extra_stats = c("median", "Q1", "Q3"))
print(stats)
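
For readers unfamiliar with the statistics requested through extra_stats in the example, the following base R sketch shows one way the median and quartiles, and a top-n selection, could be computed by hand. It is an assumption-laden illustration: the numbers are invented, the quartiles use R's default quantile method (the method used by the package is not stated here), and sorting in decreasing order before taking the top rows is a guess at the intended behaviour.

```r
# Invented identity values for a single group.
ident <- c(60, 80, 91, 95, 99)

# Median and quartiles using R's default quantile algorithm (type 7).
extra <- c(
  median = median(ident),
  Q1     = unname(quantile(ident, 0.25)),
  Q3     = unname(quantile(ident, 0.75))
)
print(extra)

# Hypothetical per-group summary, sorted by a column and cut to the top 2 rows.
tbl <- data.frame(
  group  = c("A", "B", "C"),
  median = c(87.5, 79.5, 91),
  stringsAsFactors = FALSE
)
top <- head(tbl[order(tbl$median, decreasing = TRUE), ], 2)
print(top)
```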