SummarizeViralStats {Virusparies}    R Documentation

SummarizeViralStats: Generate summary stats outside of plot functions

Description

Summarizes a hittable per group for a chosen metric (contig length, E-value or identity). SummarizeViralStats generates a summary table that counts, for each group, the observations below and at or above a specified metric cutoff. Additional summary statistics for the selected metric can be requested, and rows can optionally be filtered by a cutoff value.

Usage

SummarizeViralStats(
  file,
  groupby = "best_query",
  metric,
  metric_cutoff,
  filter_cutoff = NULL,
  show_total = FALSE,
  extra_stats = NULL,
  sort_by = NULL,
  top_n = NULL,
  group_unwanted_phyla = NULL
)

Arguments

file

VirusHunterGatherer hittable.

groupby

(optional): A character string specifying the column containing the groups (default: "best_query"). Note: Gatherer hittables do not have a "best_query" column, so provide an appropriate grouping column for them.

metric

A character string specifying the name of the metric column to be used for calculations. This column must be present in file. Supported metric columns include:

  • "contig_len"

  • "ViralRefSeq_E"

  • "ViralRefSeq_ident"

metric_cutoff

A numeric value used to classify the metric into two categories: below cutoff and above or equal to cutoff.
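The two-category split can be pictured with a small base R sketch. The data frame hits and its values are made up for illustration, and this is not the package's internal implementation; it only shows how a cutoff of 90 would partition identity values within each group.

```r
# Toy data (hypothetical values, not taken from a real hittable)
hits <- data.frame(
  best_query = c("A", "A", "A", "B", "B"),
  ViralRefSeq_ident = c(95, 90, 72.5, 88, 99)
)
metric_cutoff <- 90

# Values equal to the cutoff fall into the "above or equal" category
below <- hits$ViralRefSeq_ident < metric_cutoff

# Count both categories per group (tapply orders groups alphabetically)
counts <- data.frame(
  best_query = sort(unique(as.character(hits$best_query))),
  less_than = as.vector(tapply(below, hits$best_query, sum)),
  equal_or_more_than = as.vector(tapply(!below, hits$best_query, sum))
)
print(counts)
```

Note that the value 90 in group "A" is counted as "equal or more than", matching the description of the cutoff above.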

filter_cutoff

A numeric value for optional filtering of the data based on E-value. Rows whose E-value is less than this cutoff are retained. If NULL (the default), no filtering is applied.
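As a rough illustration of the filtering rule, the following base R sketch keeps only rows whose E-value is strictly below the cutoff. The toy data frame is hypothetical, and the sketch assumes the E-value is stored in the "ViralRefSeq_E" column; the package may implement the step differently.

```r
# Toy data (hypothetical E-values)
hits <- data.frame(
  best_query = c("A", "A", "B", "B"),
  ViralRefSeq_E = c(1e-30, 1e-3, 1e-6, 0.5)
)
filter_cutoff <- 1e-5

# NULL means "no filtering"; otherwise keep E-values strictly below the cutoff
filtered <- if (is.null(filter_cutoff)) {
  hits
} else {
  hits[hits$ViralRefSeq_E < filter_cutoff, , drop = FALSE]
}
print(filtered)
```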

show_total

A logical value indicating whether to include a row with the total sums for each column in the summary table. Default is FALSE.
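A total row of this kind can be sketched in base R as column sums appended to the summary table. The table below is a toy example, and the row label "Total" is an assumption made for illustration; the label used by the package may differ.

```r
# Toy summary table (hypothetical counts)
summary_tbl <- data.frame(
  best_query = c("A", "B"),
  less_than_90 = c(1, 1),
  equal_or_more_than_90 = c(2, 1),
  total = c(3, 2)
)

# Sum every numeric column and append the result as an extra row
num_cols <- sapply(summary_tbl, is.numeric)
total_row <- data.frame(
  best_query = "Total",  # hypothetical label
  as.list(colSums(summary_tbl[num_cols]))
)
with_total <- rbind(summary_tbl, total_row)
print(with_total)
```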

extra_stats

A character vector specifying additional summary statistics to include in the output. Options include:

  • "mean"

  • "median"

  • "Q1"

  • "Q3"

  • "sd"

  • "min"

  • "max"

If NULL (the default), only the basic counts are included.
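The statistics listed above correspond to standard base R functions. The sketch below computes them per group on a toy data frame; it assumes R's default quantile method (type 7), which may or may not match what the package uses internally.

```r
# Toy data (hypothetical contig lengths)
hits <- data.frame(
  best_query = c("A", "A", "A", "A", "B", "B"),
  contig_len = c(100, 200, 300, 400, 500, 700)
)

# One row of statistics per group; row names are the group labels
extra <- do.call(rbind, lapply(split(hits$contig_len, hits$best_query), function(x) {
  data.frame(
    mean = mean(x),
    median = median(x),
    Q1 = unname(quantile(x, 0.25)),
    Q3 = unname(quantile(x, 0.75)),
    sd = sd(x),
    min = min(x),
    max = max(x)
  )
}))
print(extra)
```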

sort_by

(optional): A character string specifying the column name by which to sort the results. Supported values include:

  • "less_than_X": The count of observations below the specified metric_cutoff (X is replaced by the cutoff value).

  • "equal_or_more_than_X": The count of observations greater than or equal to the specified metric_cutoff (X is replaced by the cutoff value).

  • "total": The total count of observations in each group.

  • "mean": The mean value of the specified metric in each group (if "mean" is included in extra_stats).

  • "median": The median value of the specified metric in each group (if "median" is included in extra_stats).

  • "Q1": The first quartile (25th percentile) of the specified metric in each group (if "Q1" is included in extra_stats).

  • "Q3": The third quartile (75th percentile) of the specified metric in each group (if "Q3" is included in extra_stats).

  • "sd": The standard deviation of the specified metric in each group (if "sd" is included in extra_stats).

  • "min": The minimum value of the specified metric in each group (if "min" is included in extra_stats).

  • "max": The maximum value of the specified metric in each group (if "max" is included in extra_stats).

If NULL (the default), no sorting is applied.

top_n

(optional): A numeric value indicating the number of top rows to return based on the selected metric. If NULL (the default), all rows are returned.
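The interaction of sort_by and top_n can be pictured with a toy summary table in base R. The column names follow the "less_than_X" pattern with X = 90, and the sketch assumes that larger values are ranked first; neither the exact column naming nor the sort direction is confirmed here, so treat this as an illustration only.

```r
# Toy summary table (hypothetical counts)
summary_tbl <- data.frame(
  best_query = c("A", "B", "C", "D"),
  less_than_90 = c(1, 5, 0, 2),
  equal_or_more_than_90 = c(2, 1, 7, 2),
  total = c(3, 6, 7, 4)
)
sort_by <- "total"
top_n <- 2

# Assumption: sort in decreasing order, then keep the first top_n rows
sorted <- summary_tbl[order(summary_tbl[[sort_by]], decreasing = TRUE), ]
top <- head(sorted, top_n)
print(top)
```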

group_unwanted_phyla

(optional): A character string specifying which group of viral phyla to retain in the analysis. Valid values are:

"rna"

Retain only the phyla specified for RNA viruses.

"smalldna"

Retain only the phyla specified for small DNA viruses.

"largedna"

Retain only the phyla specified for large DNA viruses.

"others"

Retain only the phyla that match small DNA, large DNA and RNA viruses.

All other phyla not in the specified group will be grouped into a single category: "Non-RNA-virus" for "rna", "Non-Small-DNA-Virus" for "smalldna", "Non-Large-DNA-Virus" for "largedna", or "Other Viruses" for "others".
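The regrouping behaviour for "rna" can be sketched as follows. The vector rna_phyla is an illustrative subset chosen for this example and is an assumption; the package maintains its own list, which may differ.

```r
# Illustrative subset of RNA virus phyla (assumption, not the package's list)
rna_phyla <- c("Pisuviricota", "Kitrinoviricota", "Negarnaviricota")

# Toy phylum assignments for four hits
phylum <- c("Pisuviricota", "Nucleocytoviricota",
            "Negarnaviricota", "Cressdnaviricota")

# Keep phyla in the chosen group, collapse everything else into one label
grouped <- ifelse(phylum %in% rna_phyla, phylum, "Non-RNA-virus")
print(table(grouped))
```

The same pattern would apply to "smalldna", "largedna" and "others", with a different phylum list and a different collapsed label.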

Value

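A data frame summarizing the viral stats. The output includes the counts of observations below and at or above metric_cutoff (less_than_X and equal_or_more_than_X), the total count per group, and any statistics requested via extra_stats.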

Author(s)

Sergej Ruff

See Also

VirusHunterGatherer is available here: https://github.com/lauberlab/VirusHunterGatherer.

Examples

path <- system.file("extdata", "virushunter.tsv", package = "Virusparies")
file <- ImportVirusTable(path)

stats <- SummarizeViralStats(
  file = file,
  groupby = "best_query",
  metric = "ViralRefSeq_ident",
  metric_cutoff = 90,
  show_total = TRUE,
  filter_cutoff = 1e-5,
  extra_stats = c("median", "Q1", "Q3")
)

print(stats)




[Package Virusparies version 1.1.0 Index]