VgConLenViolin {Virusparies}    R Documentation

VgConLenViolin: Generate a Violinplot of contig length for each group (Gatherer only)

Description

VgConLenViolin creates a violin plot to visualize the distribution of contig lengths for each group in VirusGatherer hittables.

Usage

VgConLenViolin(
  vg_file = vg_file,
  taxa_rank = "Family",
  cut = 1e-05,
  log10_scale = TRUE,
  reorder_criteria = "median",
  adjust_bw = 1,
  jitter_point = FALSE,
  theme_choice = "linedraw",
  flip_coords = TRUE,
  title = "Violinplot of contig length for each group",
  title_size = 16,
  title_face = "bold",
  title_colour = "#2a475e",
  subtitle = NULL,
  subtitle_size = 12,
  subtitle_face = "bold",
  subtitle_colour = "#1b2838",
  xlabel = "Viral group",
  ylabel = "Contig length (nt)",
  axis_title_size = 12,
  xtext_size = 10,
  x_angle = NULL,
  ytext_size = 10,
  y_angle = NULL,
  remove_group_labels = FALSE,
  legend_title = "Phylum",
  legend_position = "bottom",
  legend_title_size = 12,
  legend_title_face = "bold",
  legend_text_size = 10,
  min_observations = 1,
  facet_ncol = NULL,
  add_boxplot = FALSE,
  group_unwanted_phyla = NULL
)

Arguments

vg_file

A data frame containing VirusGatherer hittable results.

taxa_rank

(optional): Specify the taxonomic rank to group your data by. Supported ranks are:

  • "Subphylum"

  • "Class"

  • "Subclass"

  • "Order"

  • "Suborder"

  • "Family" (default)

  • "Subfamily"

  • "Genus" (including Subgenus)

cut

(optional): A numeric value representing the cutoff for the refseq E-value (default: 1e-5).

log10_scale

(optional): Transform the y-axis to a log10 scale (default: TRUE).

reorder_criteria

Character string specifying the criterion for ordering the groups on the x-axis: 'max', 'min', 'median' (default), 'mean', or 'phylum'. NULL sorts alphabetically. Prefix a statistic with 'phylum_' (e.g., 'phylum_median') to sort by phylum first and then by that statistic within each phylum.

adjust_bw

(optional): control the bandwidth of the kernel density estimator used to create the violin plot. A higher value results in a smoother plot by increasing the bandwidth, while a lower value can make the plot more detailed but potentially noisier (default: 1).

jitter_point

(optional): Logical. TRUE shows all observations as points; FALSE shows points only for groups with fewer than 2 observations (default: FALSE).

theme_choice

(optional): A character indicating the ggplot2 theme to apply. Options include "minimal", "classic", "light", "dark", "void", "grey" (or "gray"), "bw", "linedraw" (default), and "test". Append "_dotted" to any theme to add custom dotted grid lines (e.g., "classic_dotted").

flip_coords

(optional): Logical indicating whether to flip the coordinates of the plot (default: TRUE).

title

(optional): The title of the plot (default: "Violinplot of contig length for each group").

title_size

(optional): The size of the title text (default: 16).

title_face

(optional): The face (bold, italic, etc.) of the title text (default: "bold").

title_colour

(optional): The color of the title text (default: "#2a475e").

subtitle

(optional): A character specifying the subtitle of the plot (default: NULL).

subtitle_size

(optional): Numeric specifying the size of the subtitle text (default: 12).

subtitle_face

(optional): A character specifying the font face for the subtitle text (default: "bold").

subtitle_colour

(optional): A character specifying the color for the subtitle text (default: "#1b2838").

xlabel

(optional): The label for the x-axis (default: "Viral group").

ylabel

(optional): The label for the y-axis (default: "Contig length (nt)").

axis_title_size

(optional): The size of the axis titles (default: 12).

xtext_size

(optional): The size of the x-axis text (default: 10).

x_angle

(optional): An integer specifying the angle (in degrees) for the x-axis text labels. Default is NULL, meaning no change.

ytext_size

(optional): The size of the y-axis text (default: 10).

y_angle

(optional): An integer specifying the angle (in degrees) for the y-axis text labels. Default is NULL, meaning no change.

remove_group_labels

(optional): If TRUE, the group labels will be removed; if FALSE or omitted, the labels will be displayed.

legend_title

(optional): A character specifying the title for the legend (default: "Phylum").

legend_position

(optional): The position of the legend (default: "bottom").

legend_title_size

(optional): Numeric specifying the size of the legend title text (default: 12).

legend_title_face

(optional): A character specifying the font face for the legend title text (default: "bold").

legend_text_size

(optional): Numeric specifying the size of the legend text (default: 10).

min_observations

(optional): Minimum number of observations required per group to be included in the plot (default: 1).

facet_ncol

(optional): The number of columns for faceting (default: NULL). It is recommended to specify this when the number of viral groups is high, to ensure they fit well in one plot.

add_boxplot

(optional): Add a boxplot to the violin plot (default: FALSE).

group_unwanted_phyla

(optional): A character string specifying which group of viral phyla to retain in the analysis. Valid values are:

"rna"

Retain only the phyla specified for RNA viruses.

"smalldna"

Retain only the phyla specified for small DNA viruses.

"largedna"

Retain only the phyla specified for large DNA viruses.

"others"

Retain only the phyla that match small DNA, large DNA, and RNA viruses.

All other phyla not in the specified group will be grouped into a single category: "Non-RNA-virus" for "rna", "Non-Small-DNA-Virus" for "smalldna", "Non-Large-DNA-Virus" for "largedna", or "Other Viruses" for "others".
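The regrouping performed by group_unwanted_phyla can be pictured with a small base R sketch. The phylum names in rna_phyla and the toy vector phylum are assumptions made for illustration only; this page does not list which phyla Virusparies assigns to each group, and the package's internal implementation may differ.

```r
# Hypothetical set of RNA virus phyla (an assumption, not taken from the package).
rna_phyla <- c("Pisuviricota", "Kitrinoviricota", "Negarnaviricota")

# Toy phylum annotations standing in for the phylum of each hit.
phylum <- c("Pisuviricota", "Nucleocytoviricota",
            "Negarnaviricota", "Cressdnaviricota")

# Keep phyla in the chosen group and collapse everything else into one label,
# mirroring what group_unwanted_phyla = "rna" is described as doing.
regrouped <- ifelse(phylum %in% rna_phyla, phylum, "Non-RNA-virus")

print(table(regrouped))
```

With this labelling, the legend would show the retained phyla plus a single catch-all category instead of one entry per unwanted phylum.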

Details

VgConLenViolin creates a violin plot to visualize the distribution of contig lengths for each group in the "ViralRefSeq_taxonomy" column of the VirusGatherer hittable. The x-axis represents the groups as defined by the "ViralRefSeq_taxonomy" column, and the y-axis represents the contig lengths.
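The order of the groups along the x-axis is controlled by reorder_criteria. The following base R sketch shows, on made-up data, how ordering by "median" differs from ordering by "phylum_median". The data frame toy, its column names, and the ascending sort direction are assumptions for illustration; the function's internal ordering logic is not documented here.

```r
# Toy data: contig lengths per family, with the phylum of each family.
toy <- data.frame(
  family     = c("A", "A", "B", "B", "C", "C"),
  phylum     = c("P1", "P1", "P2", "P2", "P1", "P1"),
  contig_len = c(500, 700, 300, 400, 900, 1100)
)

# One row per family with its phylum and median contig length.
stats <- aggregate(contig_len ~ family + phylum, data = toy, FUN = median)

# "median": order families by their median contig length.
by_median <- as.character(stats$family[order(stats$contig_len)])

# "phylum_median": order by phylum first, then by median within each phylum.
by_phylum_median <- as.character(
  stats$family[order(stats$phylum, stats$contig_len)]
)

print(by_median)
print(by_phylum_median)
```

In the second ordering, families from the same phylum stay next to each other, which keeps the legend colours grouped along the axis.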

By default, the y-axis is transformed to a log10 scale to better visualize differences in contig lengths across groups. This transformation can be disabled by setting the log10_scale argument to FALSE.
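A short base R illustration, using made-up lengths, of why a log10 axis helps when contig lengths span several orders of magnitude: on the raw scale the short contigs are compressed near zero, while on the log10 scale equal ratios become equal distances.

```r
# Made-up contig lengths (nt) spanning several orders of magnitude.
contig_len <- c(200, 2000, 20000, 200000)

# On the raw scale the gap between neighbours grows tenfold at each step.
raw_gaps <- diff(contig_len)

# On the log10 scale the same values are evenly spaced.
log_gaps <- diff(log10(contig_len))

print(raw_gaps)
print(round(log_gaps, 6))
```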

min_observations filters the data to include only groups with at least the specified number of observations before plotting. This allows users to exclude groups with insufficient data. By default, every group is plotted, because the minimum is set to one observation per group.
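The effect of the min_observations filter can be sketched in base R as follows. The data frame toy and its column names are hypothetical, and the variable min_observations only borrows the argument's name; this is an illustration of the described behaviour, not the package's own code.

```r
# Toy table: one row per contig, with the group it was assigned to.
toy <- data.frame(
  group      = c("A", "A", "A", "B", "C", "C"),
  contig_len = c(500, 700, 650, 300, 900, 1100)
)

# Threshold, named after the argument but used here as a plain variable.
min_observations <- 2

# Count contigs per group and keep only groups that meet the threshold.
counts   <- table(toy$group)
keep     <- names(counts)[counts >= min_observations]
filtered <- toy[toy$group %in% keep, ]

print(unique(filtered$group))
```

Here group "B" has a single contig and is dropped, so only "A" and "C" would be plotted.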

Value

A list containing the following components:

Author(s)

Sergej Ruff

See Also

VirusHunterGatherer is available here: https://github.com/lauberlab/VirusHunterGatherer.

Examples


# import gatherer files
path2 <- system.file("extdata", "virusgatherer.tsv", package = "Virusparies")
vg_file <- ImportVirusTable(path2)

# create a violinplot.
violinplot <- VgConLenViolin(vg_file = vg_file, cut = 1e-5, log10_scale = TRUE)

violinplot



[Package Virusparies version 1.1.0 Index]