VhgRunsBarplot {Virusparies}    R Documentation

VhgRunsBarplot: Generate a bar plot showing the number of data sets (unique runs) with hits for each viral group.

Description

VhgRunsBarplot: Generate a bar plot showing, for each viral group, the number of data sets (unique runs) in which that group was detected.

Usage

VhgRunsBarplot(
  file,
  groupby = "best_query",
  taxa_rank = "Family",
  cut = 1e-05,
  reorder_criteria = "max",
  theme_choice = "linedraw",
  flip_coords = TRUE,
  title = "Distribution of viral groups detected across query sequences",
  title_size = 16,
  title_face = "bold",
  title_colour = "#2a475e",
  subtitle = "default",
  subtitle_size = 12,
  subtitle_face = "bold",
  subtitle_colour = "#1b2838",
  xlabel = "Viral group",
  ylabel = "Number of datasets with hits for group",
  axis_title_size = 12,
  xtext_size = 10,
  x_angle = NULL,
  ytext_size = 10,
  y_angle = NULL,
  remove_group_labels = FALSE,
  legend_title = "Phylum",
  legend_position = "bottom",
  legend_title_size = 12,
  legend_title_face = "bold",
  legend_text_size = 10,
  plot_text = 3,
  plot_text_size = 3.5,
  plot_text_position_dodge = 0.9,
  plot_text_hjust = -0.1,
  plot_text_vjust = 0.5,
  plot_text_colour = "black",
  facet_ncol = NULL,
  group_unwanted_phyla = NULL
)

Arguments

file

A data frame containing VirusHunter or VirusGatherer hittable results.

groupby

(optional): A character string specifying the column that contains the groups (default: "best_query"). Note: VirusGatherer hittables do not have a "best_query" column, so provide an appropriate grouping column when using them.

taxa_rank

(optional): When groupby is set to "ViralRefSeq_taxonomy", specify the taxonomic rank to group your data by. Supported ranks are:

  • "Subphylum"

  • "Class"

  • "Subclass"

  • "Order"

  • "Suborder"

  • "Family" (default)

  • "Subfamily"

  • "Genus" (including Subgenus)
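The supported ranks are ICTV ranks, and each has a fixed name suffix (for example, families end in "-viridae", while genera and subgenera both end in "-virus", which is consistent with "Genus" covering subgenera). The R sketch below shows one way a single rank could be pulled out of a lineage string by suffix. It is not the package's implementation: the helper extract_rank(), the rank_suffixes table, and the assumption that "ViralRefSeq_taxonomy" entries are "|"-separated lineages are all hypothetical.

```r
# Hypothetical sketch: select one taxonomic rank from a lineage string using
# ICTV name suffixes. The "|" separator is an assumed format, not confirmed.
rank_suffixes <- c(
  Subphylum = "viricotina$",
  Class     = "viricetes$",
  Subclass  = "viricetidae$",
  Order     = "virales$",
  Suborder  = "virineae$",
  Family    = "viridae$",
  Subfamily = "virinae$",
  Genus     = "virus$"   # subgenera share this suffix; the first match wins
)

extract_rank <- function(taxonomy, rank = "Family", sep = "|") {
  pattern <- rank_suffixes[[rank]]
  vapply(strsplit(taxonomy, sep, fixed = TRUE), function(parts) {
    hit <- grep(pattern, trimws(parts), value = TRUE)
    if (length(hit) == 0) NA_character_ else hit[1]
  }, character(1))
}

tax <- paste("Riboviria", "Orthornavirae", "Pisuviricota", "Pisoniviricetes",
             "Nidovirales", "Cornidovirineae", "Coronaviridae",
             "Orthocoronavirinae", "Betacoronavirus", sep = "|")
extract_rank(tax, "Family")
```

Lineages that lack the requested rank return NA in this sketch; how the package labels such rows is not specified here.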

cut

(optional): A numeric value representing the cutoff for the RefSeq E-value (default: 1e-5). Rows in file whose value in the "ViralRefSeq_E" column is larger than the cutoff are removed.
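A minimal sketch of this filter, assuming a hittable with a "ViralRefSeq_E" column: the hypothetical helper filter_by_cut() keeps rows whose E-value is at most cut. The run_id column and the toy values are illustrative only, and dropping rows with a missing E-value is a choice made for the sketch rather than documented behaviour.

```r
# Hypothetical sketch of the cut filter: keep rows with ViralRefSeq_E <= cut.
# which() also drops rows whose E-value is NA.
filter_by_cut <- function(hits, cut = 1e-5) {
  hits[which(hits$ViralRefSeq_E <= cut), , drop = FALSE]
}

hits <- data.frame(
  run_id        = c("run1", "run2", "run3"),   # hypothetical column name
  best_query    = c("queryA", "queryA", "queryB"),
  ViralRefSeq_E = c(1e-30, 1e-3, 1e-5),
  stringsAsFactors = FALSE
)
kept <- filter_by_cut(hits, cut = 1e-5)
kept$run_id
```

A row whose E-value equals the cutoff (run3 here) is kept, because only values larger than cut are removed.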

reorder_criteria

(optional): A character string specifying the criteria for reordering the x-axis ("max" (default), "min", "phylum", "phylum_max", "phylum_min"). NULL sorts the groups alphabetically.
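The count-based options can be illustrated with a short R sketch covering only "max", "min" and NULL; the phylum-based criteria are not modelled. The helper order_groups() is hypothetical, and breaking ties alphabetically is an assumption made for the sketch.

```r
# Hypothetical sketch of reorder_criteria = "max", "min" or NULL.
# counts: named integer vector with one entry per viral group.
order_groups <- function(counts, reorder_criteria = "max") {
  if (is.null(reorder_criteria)) {
    return(names(counts)[order(names(counts))])      # alphabetical
  }
  switch(reorder_criteria,
    max = names(counts)[order(-counts, names(counts))],  # largest first
    min = names(counts)[order(counts, names(counts))],   # smallest first
    stop("only 'max', 'min' or NULL are handled in this sketch")
  )
}

counts <- c(Picornaviridae = 2L, Flaviviridae = 5L, Astroviridae = 2L)
order_groups(counts, "max")
```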

theme_choice

(optional): A character indicating the ggplot2 theme to apply. Options include "minimal", "classic", "light", "dark", "void", "grey" (or "gray"), "bw", "linedraw" (default), and "test". Append "_dotted" to any theme to add custom dotted grid lines (e.g., "classic_dotted").

flip_coords

(optional): Logical indicating whether to flip the coordinates of the plot (default: TRUE).

title

(optional): The title of the plot (default: "Distribution of viral groups detected across query sequences").

title_size

(optional): The size of the title text (default: 16).

title_face

(optional): The face (bold, italic, etc.) of the title text (default: "bold").

title_colour

(optional): The color of the title text (default: "#2a475e").

subtitle

(optional): A character specifying the subtitle of the plot. The default, "default", calculates the total number of data sets with hits and displays it as "total number of data sets with hits: " followed by that number. An empty string ("") removes the subtitle.
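A sketch of how the default subtitle text could be built, assuming each data set corresponds to one unique run ID in the (already filtered) hittable. The helper default_subtitle() and the run_id column name are hypothetical.

```r
# Hypothetical sketch: count unique runs with hits and format the subtitle.
default_subtitle <- function(hits, run_col = "run_id") {
  n_sets <- length(unique(hits[[run_col]]))
  paste0("total number of data sets with hits: ", n_sets)
}

hits <- data.frame(run_id = c("run1", "run1", "run2", "run3"),
                   stringsAsFactors = FALSE)
default_subtitle(hits)
```

Counting unique IDs rather than rows matters because one run can contribute several hits.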

subtitle_size

(optional): The size of the subtitle text (default: 12).

subtitle_face

(optional): The face (bold, italic, etc.) of the subtitle text (default: "bold").

subtitle_colour

(optional): The color of the subtitle text (default: "#1b2838").

xlabel

(optional): The label for the x-axis (default: "Viral group").

ylabel

(optional): The label for the y-axis (default: "Number of datasets with hits for group").

axis_title_size

(optional): The size of the axis titles (default: 12).

xtext_size

(optional): The size of the x-axis text (default: 10).

x_angle

(optional): An integer specifying the angle (in degrees) for the x-axis text labels. Default is NULL, meaning no change.

ytext_size

(optional): The size of the y-axis text (default: 10).

y_angle

(optional): An integer specifying the angle (in degrees) for the y-axis text labels. Default is NULL, meaning no change.

remove_group_labels

(optional): If TRUE, the group labels will be removed; if FALSE or omitted, the labels will be displayed.

legend_title

(optional): A character specifying the title for the legend (default: "Phylum").

legend_position

(optional): The position of the legend (default: "bottom").

legend_title_size

(optional): Numeric specifying the size of the legend title text (default: 12).

legend_title_face

(optional): A character specifying the font face for the legend title text (default: "bold").

legend_text_size

(optional): Numeric specifying the size of the legend text (default: 10).

plot_text

(optional): An index (0-3) to select the variable for text labels.

  • 0 = None.

  • 1 = Number of viral groups detected across query sequences.

  • 2 = Only the percentage.

  • 3 = Both (Default).
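The four label modes can be sketched in R as follows. This is an illustration only: the helper make_labels() is hypothetical, the percentage is assumed to be each group's share of the summed group counts (the package may use a different denominator), and the "count (percent%)" layout for mode 3 is an assumed format.

```r
# Hypothetical sketch of plot_text modes 0-3.
make_labels <- function(counts, plot_text = 3) {
  pct <- round(100 * counts / sum(counts), 1)   # assumed denominator
  switch(as.character(plot_text),
    "0" = rep("", length(counts)),               # no labels
    "1" = as.character(counts),                  # counts only
    "2" = paste0(pct, "%"),                      # percentage only
    "3" = paste0(counts, " (", pct, "%)"),       # both
    stop("plot_text must be 0, 1, 2 or 3")
  )
}

counts <- c(Flaviviridae = 3L, Picornaviridae = 1L)
make_labels(counts, 3)
```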

plot_text_size

(optional): The size of the text labels added to the plot (default: 3.5).

plot_text_position_dodge

(optional): The degree of dodging for positioning text labels (default: 0.9).

plot_text_hjust

(optional): The horizontal justification of text labels (default: -0.1).

plot_text_vjust

(optional): The vertical justification of text labels (default: 0.5). Consider adjusting plot_text_vjust when setting flip_coords = FALSE.

plot_text_colour

(optional): The color of the text labels added to the plot (default: "black").

facet_ncol

(optional): The number of columns for faceting (default: NULL). It is recommended to specify this when the number of viral groups is high, to ensure they fit well in one plot.

group_unwanted_phyla

(optional): A character string specifying which group of viral phyla to retain in the analysis. Valid values are:

"rna"

Retain only the phyla specified for RNA viruses.

"smalldna"

Retain only the phyla specified for small DNA viruses.

"largedna"

Retain only the phyla specified for large DNA viruses.

"others"

Retain only the phyla that belong to the small DNA, large DNA, or RNA virus groups.

All phyla outside the specified group are combined into a single category: "Non-RNA-virus" for "rna", "Non-Small-DNA-Virus" for "smalldna", "Non-Large-DNA-Virus" for "largedna", or "Other Viruses" for "others".
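The collapsing step can be sketched in R as below. The helper collapse_phyla() is hypothetical, and the rna_phyla vector (ICTV phyla of the kingdom Orthornavirae) is only an illustrative example, not the list the package uses for "rna".

```r
# Hypothetical sketch: keep phyla from one group, collapse the rest into one label.
collapse_phyla <- function(phylum, keep, other_label) {
  ifelse(phylum %in% keep, phylum, other_label)
}

# Illustrative RNA virus phyla; the package's own list may differ.
rna_phyla <- c("Duplornaviricota", "Kitrinoviricota", "Lenarviricota",
               "Negarnaviricota", "Pisuviricota")
phylum <- c("Pisuviricota", "Cressdnaviricota", "Kitrinoviricota",
            "Nucleocytoviricota")
collapse_phyla(phylum, rna_phyla, "Non-RNA-virus")
```

Under the same assumption, "others" would pass the union of the three group lists as keep and "Other Viruses" as the label.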

Details

VhgRunsBarplot generates a bar plot showing the number of data sets (unique runs) with hits for each viral group. It accepts VirusHunter and VirusGatherer hittables as input.

Only hits with a RefSeq E-value at or below the threshold specified by the cut argument (default: 1e-5) are included in the plot.
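The quantity plotted per bar can be sketched in base R: filter by cut, then count distinct runs per group. This is a hypothetical reconstruction for illustration, not the package's code; the helper count_runs_per_group() and the run_id column name are assumptions, while "best_query" and "ViralRefSeq_E" come from the arguments above.

```r
# Hypothetical sketch: number of distinct runs (data sets) with hits per group.
count_runs_per_group <- function(hits, groupby = "best_query",
                                 run_col = "run_id", cut = 1e-5) {
  kept <- hits[which(hits$ViralRefSeq_E <= cut), , drop = FALSE]  # cut filter
  runs_by_group <- split(kept[[run_col]], kept[[groupby]])        # runs per group
  counts <- vapply(runs_by_group, function(r) length(unique(r)), integer(1))
  sort(counts, decreasing = TRUE)
}

hits <- data.frame(
  run_id        = c("run1", "run1", "run2", "run3", "run3"),
  best_query    = c("queryA", "queryA", "queryA", "queryB", "queryA"),
  ViralRefSeq_E = c(1e-20, 1e-8, 1e-12, 1e-6, 1e-2),
  stringsAsFactors = FALSE
)
res <- count_runs_per_group(hits)
res
```

Here queryA has three passing hits but only two distinct runs, and the last row is removed by the cutoff, so run3 counts only toward queryB.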

Value

A list containing the bar plot and tabular data with information from the plot.

Author(s)

Sergej Ruff

See Also

VirusHunterGatherer is available here: https://github.com/lauberlab/VirusHunterGatherer.

Examples

# import data
path <- system.file("extdata", "virushunter.tsv", package = "Virusparies")
file <- ImportVirusTable(path)

# plot 1: bar plot of the number of data sets with hits per viral group
plot <- VhgRunsBarplot(file, cut = 1e-5)
plot



[Package Virusparies version 1.1.0 Index]