VhgSumHitsBarplot {Virusparies} R Documentation

VhgSumHitsBarplot: Generate a bar plot showing the sum of reads/contigs for each virus family

Description

VhgSumHitsBarplot preprocesses virus data for plotting and generates a bar plot showing the sum of reads/contigs for each virus family from the input data set.

Usage

VhgSumHitsBarplot(
  file,
  groupby = "best_query",
  taxa_rank = "Family",
  y_column = "num_hits",
  cut = 1e-05,
  reorder_criteria = "max",
  theme_choice = "linedraw",
  flip_coords = TRUE,
  title = "Distribution of hits for each virus group",
  title_size = 16,
  title_face = "bold",
  title_colour = "#2a475e",
  subtitle = "default",
  subtitle_size = 12,
  subtitle_face = "bold",
  subtitle_colour = "#1b2838",
  xlabel = "Viral group",
  ylabel = "Total number of hits",
  axis_title_size = 12,
  xtext_size = 10,
  x_angle = NULL,
  ytext_size = 10,
  y_angle = NULL,
  remove_group_labels = FALSE,
  legend_title = "Phylum",
  legend_position = "bottom",
  legend_title_size = 12,
  legend_title_face = "bold",
  legend_text_size = 10,
  plot_text = 3,
  plot_text_size = 3.5,
  plot_text_position_dodge = 0.9,
  plot_text_hjust = -0.1,
  plot_text_vjust = 0.5,
  plot_text_colour = "black",
  facet_ncol = NULL,
  group_unwanted_phyla = NULL
)

Arguments

file

A data frame containing VirusHunters hittable results.

groupby

(optional): A character specifying the column containing the groups (default: "best_query").

taxa_rank

(optional): When groupby is set to "ViralRefSeq_taxonomy", specify the taxonomic rank to group your data by. Supported ranks are:

  • "Subphylum"

  • "Class"

  • "Subclass"

  • "Order"

  • "Suborder"

  • "Family" (default)

  • "Subfamily"

  • "Genus" (including Subgenus)

y_column

A character specifying the column containing the values to be summed. Currently "ViralRefSeq_contigs" (micro-contigs), "contigs", and "num_hits" (reads) are supported columns (default: "num_hits").

cut

(optional): A numeric value representing the cutoff for the RefSeq E-value (default: 1e-5). Rows in file whose value in the "ViralRefSeq_E" column is larger than the cutoff are removed.
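The effect of the cutoff can be pictured with a minimal base R sketch. The data frame below is invented for illustration, and the subsetting line is an assumption about the behaviour described above, not the package's internal code.

```r
# Hypothetical hittable excerpt; only the column names follow this help page.
hits <- data.frame(
  best_query    = c("Flavi_RdRp", "Hepe_RdRp", "Nido_RdRp"),
  num_hits      = c(120, 8, 45),
  ViralRefSeq_E = c(1e-30, 2e-03, 1e-05)
)

cut <- 1e-05

# Keep rows whose E-value does not exceed the cutoff;
# rows with larger values are dropped.
kept <- hits[hits$ViralRefSeq_E <= cut, ]
kept
```

A smaller cut is stricter: fewer rows survive and the plotted sums shrink accordingly.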

reorder_criteria

(optional): Character string specifying the criterion for reordering the x-axis: "max" (default), "min", "phylum", "phylum_max", or "phylum_min". NULL sorts alphabetically.

theme_choice

(optional): A character indicating the ggplot2 theme to apply. Options include "minimal", "classic", "light", "dark", "void", "grey" (or "gray"), "bw", "linedraw" (default), and "test". Append "_dotted" to any theme to add custom dotted grid lines (e.g., "classic_dotted").

flip_coords

(optional): Logical indicating whether to flip the coordinates of the plot (default: TRUE).

title

(optional): The title of the plot (default: "Distribution of hits for each virus group").

title_size

(optional): The size of the title text (default: 16).

title_face

(optional): The face (bold, italic, etc.) of the title text (default: "bold").

title_colour

(optional): The color of the title text (default: "#2a475e").

subtitle

(optional): A character specifying the subtitle of the plot. Default is "total number of hits/micro-contigs: " followed by the calculated number. An empty string ("") removes the subtitle.

subtitle_size

(optional): Numeric specifying the size of the subtitle text (default: 12).

subtitle_face

(optional): A character specifying the font face for the subtitle text (default: "bold").

subtitle_colour

(optional): A character specifying the color for the subtitle text (default: "#1b2838").

xlabel

(optional): The label for the x-axis (default: "Viral group").

ylabel

(optional): The label for the y-axis (default: "Total number of hits").

axis_title_size

(optional): The size of the axis titles (default: 12).

xtext_size

(optional): The size of the x-axis text (default: 10).

x_angle

(optional): An integer specifying the angle (in degrees) for the x-axis text labels. Default is NULL, meaning no change.

ytext_size

(optional): The size of the y-axis text (default: 10).

y_angle

(optional): An integer specifying the angle (in degrees) for the y-axis text labels. Default is NULL, meaning no change.

remove_group_labels

(optional): If TRUE, the group labels are removed; if FALSE (default), the labels are displayed.

legend_title

(optional): A character specifying the title for the legend (default: "Phylum").

legend_position

(optional): A character specifying the position of the legend (default: "bottom").

legend_title_size

(optional): Numeric specifying the size of the legend title text (default: 12).

legend_title_face

(optional): A character specifying the font face for the legend title text (default: "bold").

legend_text_size

(optional): Numeric specifying the size of the legend text (default: 10).

plot_text

(optional): An index (0-3) to select the variable for text labels.

  • 0 = None.

  • 1 = Number of hits for each viral group.

  • 2 = Only the percentage.

  • 3 = Both (Default).
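The four label modes can be sketched in base R as follows. The group names, the sums, and the exact label layout (for example "150 (75%)") are illustrative assumptions; the labels produced by the package may be formatted differently.

```r
# Hypothetical per-group sums; names and values are invented for illustration.
sums <- c(Flaviviridae = 150, Hepeviridae = 30, Coronaviridae = 20)
pct  <- round(100 * sums / sum(sums), 1)

# Assumed label layout for each plot_text index.
make_labels <- function(plot_text) {
  switch(as.character(plot_text),
    "0" = rep("", length(sums)),          # no labels
    "1" = as.character(sums),             # counts only
    "2" = paste0(pct, "%"),               # percentages only
    "3" = paste0(sums, " (", pct, "%)"),  # counts and percentages
    stop("plot_text must be 0, 1, 2 or 3")
  )
}

make_labels(3)
```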

plot_text_size

(optional): The size of the text labels added to the plot (default: 3.5).

plot_text_position_dodge

(optional): The degree of dodging for positioning text labels (default: 0.9).

plot_text_hjust

(optional): The horizontal justification of text labels (default: -0.1).

plot_text_vjust

(optional): The vertical justification of text labels (default: 0.5). It is recommended to change vjust when setting flip_coords = FALSE.

plot_text_colour

(optional): The color of the text labels added to the plot (default: "black").

facet_ncol

(optional): The number of columns for faceting (default: NULL). It is recommended to specify this when the number of viral groups is high, to ensure they fit well in one plot.

group_unwanted_phyla

(optional): A character string specifying which group of viral phyla to retain in the analysis. Valid values are:

"rna"

Retain only the phyla specified for RNA viruses.

"smalldna"

Retain only the phyla specified for small DNA viruses.

"largedna"

Retain only the phyla specified for large DNA viruses.

"others"

Retain only the phyla that match small DNA, large DNA, and RNA viruses.

All other phyla not in the specified group will be grouped into a single category: "Non-RNA-virus" for "rna", "Non-Small-DNA-Virus" for "smalldna", "Non-Large-DNA-Virus" for "largedna", or "Other Viruses" for "others".
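The grouping described above can be sketched in base R for the "rna" case. The phylum vector and the short rna_phyla set are illustrative assumptions; the package maintains its own phylum lists, which may differ.

```r
# Hypothetical phylum annotations for five viral groups.
phyla <- c("Pisuviricota", "Kitrinoviricota", "Nucleocytoviricota",
           "Cressdnaviricota", "Negarnaviricota")

# Assumed (incomplete) set of RNA virus phyla, for illustration only.
rna_phyla <- c("Pisuviricota", "Kitrinoviricota", "Negarnaviricota")

# Keep phyla in the chosen group; collapse everything else into one category.
grouped <- ifelse(phyla %in% rna_phyla, phyla, "Non-RNA-virus")
table(grouped)
```

Collapsing the unwanted phyla keeps the legend short while still showing how many hits fall outside the group of interest.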

Details

VhgSumHitsBarplot preprocesses virus data for plotting by calculating the sum of hits for each virus family from the input data set (accepts only VirusHunter hittables). It then generates a bar plot showing the sum of hits for each virus family. Additionally, it returns the processed data for further analysis.
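The summation step can be approximated with base R. This is a minimal sketch on invented data, assuming the default groupby ("best_query") and y_column ("num_hits"); it is not the package's actual implementation.

```r
# Hypothetical hittable excerpt with one row per sample and query.
hits <- data.frame(
  best_query = c("Flavi_RdRp", "Flavi_RdRp", "Hepe_RdRp", "Nido_RdRp"),
  num_hits   = c(100, 50, 30, 20)
)

# Sum the hits within each group.
sums <- aggregate(num_hits ~ best_query, data = hits, FUN = sum)

# Share of each group in the overall total, in percent.
sums$percent <- round(100 * sums$num_hits / sum(sums$num_hits), 1)
sums
```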

Value

A list containing the generated bar plot and processed data.

Author(s)

Sergej Ruff

See Also

VirusHunterGatherer is available here: https://github.com/lauberlab/VirusHunterGatherer.

Examples

path <- system.file("extdata", "virushunter.tsv", package = "Virusparies")
file <- ImportVirusTable(path)

# plot 1: bar plot for reads
plot <- VhgSumHitsBarplot(file, cut = 1e-5)
plot

# plot 2: bar plot for micro-contigs
plot_reads <- VhgSumHitsBarplot(file, cut = 1e-5,
                                y_column = "ViralRefSeq_contigs")
plot_reads

# import gatherer files
path2 <- system.file("extdata", "virusgatherer.tsv", package = "Virusparies")
vg_file <- ImportVirusTable(path2)


# plot 3: contigs in Gatherer
contig_plot <- VhgSumHitsBarplot(vg_file, groupby = "ViralRefSeq_taxonomy",
                                 y_column = "contig")
contig_plot


[Package Virusparies version 1.0.0 Index]