VhgIdentityScatterPlot {Virusparies}    R Documentation

VhgIdentityScatterPlot: Scatter plot for refseq identity vs -log10 of refseq E-value

Description

VhgIdentityScatterPlot generates a scatter plot of viral refseq identity vs. -log10 of viral refseq E-value. It colors the points by phylum and adds a horizontal line marking the cutoff value.

Usage

VhgIdentityScatterPlot(
  file,
  groupby = "best_query",
  taxa_rank = "Family",
  cutoff = 1e-05,
  conlen_bubble_plot = FALSE,
  contiglen_breaks = 5,
  theme_choice = "linedraw",
  cut_colour = "#990000",
  title = "Scatterplot of viral reference E-values and sequence identity",
  title_size = 16,
  title_face = "bold",
  title_colour = "#2a475e",
  subtitle = NULL,
  subtitle_size = 12,
  subtitle_face = "bold",
  subtitle_colour = "#1b2838",
  xlabel = "Viral reference sequence identity (%)",
  ylabel = "-log10 of viral reference E-values",
  axis_title_size = 12,
  xtext_size = 10,
  x_angle = NULL,
  ytext_size = 10,
  y_angle = NULL,
  legend_title = "Group",
  legend_position = "bottom",
  legend_title_size = 12,
  legend_title_face = "bold",
  legend_text_size = 10,
  highlight_groups = NULL,
  group_unwanted_phyla = NULL
)

Arguments

file

VirusHunterGatherer hittable.

groupby

(optional): A character string specifying the column containing the groups (default: "best_query"). Note: VirusGatherer hittables do not have a "best_query" column, so provide an appropriate grouping column for them.

taxa_rank

(optional): When groupby is set to "ViralRefSeq_taxonomy", specify the taxonomic rank to group your data by. Supported ranks are:

  • "Subphylum"

  • "Class"

  • "Subclass"

  • "Order"

  • "Suborder"

  • "Family" (default)

  • "Subfamily"

  • "Genus" (including Subgenus)

cutoff

(optional): A numeric value representing the cutoff for the refseq E-value (default: 1e-5).

conlen_bubble_plot

(optional): Logical value indicating whether the contig_len column should be used to size the bubbles in the plot. Applicable only to VirusGatherer hittable input (default: FALSE).

contiglen_breaks

(optional): Number of breaks for the bubble size legend when conlen_bubble_plot = TRUE (default: 5).

theme_choice

(optional): A character indicating the ggplot2 theme to apply. Options include "minimal", "classic", "light", "dark", "void", "grey" (or "gray"), "bw", "linedraw" (default), and "test". Append "_dotted" to any theme to add custom dotted grid lines (e.g., "classic_dotted").

cut_colour

(optional): The color for the horizontal cutoff line (default: "#990000").

title

(optional): The title of the plot (default: "Scatterplot of viral reference E-values and sequence identity").

title_size

(optional): The size of the title text (default: 16).

title_face

(optional): The face (bold, italic, etc.) of the title text (default: "bold").

title_colour

(optional): The color of the title text (default: "#2a475e").

subtitle

(optional): The subtitle of the plot (default: NULL).

subtitle_size

(optional): The size of the subtitle text (default: 12).

subtitle_face

(optional): The face (bold, italic, etc.) of the subtitle text (default: "bold").

subtitle_colour

(optional): The color of the subtitle text (default: "#1b2838").

xlabel

(optional): The label for the x-axis (default: "Viral reference sequence identity (%)").

ylabel

(optional): The label for the y-axis (default: "-log10 of viral reference E-values").

axis_title_size

(optional): The size of the axis titles (default: 12).

xtext_size

(optional): The size of the x-axis text (default: 10).

x_angle

(optional): An integer specifying the angle (in degrees) for the x-axis text labels. Default is NULL, meaning no change.

ytext_size

(optional): The size of the y-axis text (default: 10).

y_angle

(optional): An integer specifying the angle (in degrees) for the y-axis text labels. Default is NULL, meaning no change.

legend_title

(optional): The title of the legend (default: "Group").

legend_position

(optional): The position of the legend (default: "bottom").

legend_title_size

(optional): The size of the legend title text (default: 12).

legend_title_face

(optional): The face (bold, italic, etc.) of the legend title text (default: "bold").

legend_text_size

(optional): The size of the legend text (default: 10).

highlight_groups

(optional): A character vector specifying the names of viral groups to be highlighted in the plot (default: NULL).

group_unwanted_phyla

(optional): A character string specifying which group of viral phyla to retain in the analysis. Valid values are:

"rna"

Retain only the phyla specified for RNA viruses.

"smalldna"

Retain only the phyla specified for small DNA viruses.

"largedna"

Retain only the phyla specified for large DNA viruses.

"others"

Retain only the phyla that match small DNA, large DNA and RNA viruses.

All other phyla not in the specified group will be grouped into a single category: "Non-RNA-virus" for "rna", "Non-Small-DNA-Virus" for "smalldna", "Non-Large-DNA-Virus" for "largedna", or "Other Viruses" for "others".
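
The grouping described for group_unwanted_phyla can be pictured with a small base R sketch. This is only an illustration of the idea: the helper collapse_phyla and the phylum names in keep_phyla are assumptions made for the example, not the lists or code used inside Virusparies.

```r
# Illustrative sketch only: the phylum set and the helper below are
# assumptions for this example, not the package's internal lists or code.
keep_phyla <- c("Pisuviricota", "Kitrinoviricota", "Negarnaviricota")  # hypothetical "rna" set

collapse_phyla <- function(phylum, keep, other_label) {
  # Keep phyla in the requested set; relabel everything else with one category.
  ifelse(phylum %in% keep, phylum, other_label)
}

phylum <- c("Pisuviricota", "Cressdnaviricota", "Negarnaviricota", "Nucleocytoviricota")
collapsed <- collapse_phyla(phylum, keep_phyla, "Non-RNA-virus")
print(collapsed)
```

In this sketch the two phyla outside the retained set end up under the single label "Non-RNA-virus", which mirrors the behaviour described for group_unwanted_phyla = "rna".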

Details

VhgIdentityScatterPlot generates a scatter plot of refseq sequence identity vs. -log10 of refseq E-value. It accepts both VirusHunter and VirusGatherer hittables as input. Points are colored by phylum, and a horizontal line marks the cutoff value.

Tibble data frames containing summary statistics (median, Q1, Q3, mean, sd, min, and max) for 'ViralRefSeq_E' and 'ViralRefSeq_ident' values are generated. Optionally, summary statistics for 'contig_len' values are also included if applicable. These summary statistics, along with the plot object, are returned within a list object.
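
As a rough picture of the statistics listed above, the following base R sketch computes them for a toy vector. The helper summarise_values and the toy values are hypothetical, and a plain data.frame stands in for the tibble that the function returns; the actual column names in the returned objects may differ.

```r
# Illustrative sketch: toy values stand in for a hittable column such as
# 'ViralRefSeq_ident'; the helper name and output column names are hypothetical.
summarise_values <- function(x) {
  # Default quantile type (type 7) is assumed here.
  q <- stats::quantile(x, probs = c(0.25, 0.75), names = FALSE)
  data.frame(
    median = stats::median(x),
    Q1 = q[1],
    Q3 = q[2],
    mean = mean(x),
    sd = stats::sd(x),
    min = min(x),
    max = max(x)
  )
}

ident <- c(20, 30, 40, 50, 60)
stats_ident <- summarise_values(ident)
print(stats_ident)
```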

highlight_groups enables the user to specify one or more viral groups from the column indicated in the groupby argument. These groups will be highlighted in the plot.

Warning: In some cases, E-values might be exactly 0. When these values are transformed using -log10, R returns Inf. To avoid this, all E-values equal to 0 are replaced with the smallest E-value that is greater than 0. If that smallest E-value is above the user-defined cutoff, zeros are replaced with cutoff * 10^-10 instead.
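
The zero-replacement rule in the warning can be sketched in base R as follows. The helper replace_zero_evalues is hypothetical and written only from the description above; it is not the package's internal code, and its handling of an input where every value is 0 (falling back to cutoff * 10^-10) is an assumption.

```r
# Sketch of the rule described in the warning; hypothetical helper, assumes no NA values.
replace_zero_evalues <- function(evalue, cutoff = 1e-5) {
  positive <- evalue[evalue > 0]
  # Smallest strictly positive E-value; Inf if all values are 0 (assumed edge case).
  smallest <- if (length(positive) > 0) min(positive) else Inf
  # Use cutoff * 10^-10 when the smallest positive value lies above the cutoff.
  replacement <- if (smallest > cutoff) cutoff * 10^-10 else smallest
  evalue[evalue == 0] <- replacement
  evalue
}

evalue <- c(0, 1e-30, 1e-8, 0.5)
fixed <- replace_zero_evalues(evalue)

# After replacement, the -log10 transform no longer produces Inf.
neg_log <- -log10(fixed)
print(neg_log)
```

With the default cutoff of 1e-05, the horizontal cutoff line would sit at -log10(1e-05) = 5 on the y-axis.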

Value

A list containing the plot object (accessed as $plot in the examples) and the summary statistics tibbles described in Details.

Author(s)

Sergej Ruff

See Also

VirusHunterGatherer is available here: https://github.com/lauberlab/VirusHunterGatherer.

Examples

path <- system.file("extdata", "virushunter.tsv", package = "Virusparies")
file <- ImportVirusTable(path)


# Basic plot
plot <- VhgIdentityScatterPlot(file, cutoff = 1e-5)

plot(plot$plot)

# Custom plot with additional arguments
custom_plot <- VhgIdentityScatterPlot(file,
                                     cutoff = 1e-5,
                                     theme_choice = "dark",
                                     cut_colour = "blue",
                                     title = "Custom Scatter Plot",
                                     title_size = 18,
                                     title_face = "italic",
                                     title_colour = "darkred",
                                     subtitle = "Custom Subtitle",
                                     subtitle_size = 14,
                                     subtitle_face = "italic",
                                     subtitle_colour = "purple",
                                     xlabel = "Custom X Label",
                                     ylabel = "Custom Y Label",
                                     axis_title_size = 14,
                                     xtext_size = 12,
                                     ytext_size = 12,
                                     legend_title = "Custom Legend",
                                     legend_position = "top",
                                     legend_title_size = 14,
                                     legend_title_face = "italic",
                                     legend_text_size = 12)

plot(custom_plot$plot)

# import gatherer files
path2 <- system.file("extdata", "virusgatherer.tsv", package = "Virusparies")
vg_file <- ImportVirusTable(path2)

# vgplot: virusgatherer plot with ViralRefSeq_taxonomy as custom grouping
vgplot <- VhgIdentityScatterPlot(vg_file, groupby = "ViralRefSeq_taxonomy")
vgplot$plot

# plot as bubble plot with contig length as size
vgplot_con <- VhgIdentityScatterPlot(vg_file,
                                     groupby = "ViralRefSeq_taxonomy",
                                     conlen_bubble_plot = TRUE,
                                     contiglen_breaks = 4,
                                     legend_position = "right")

vgplot_con


[Package Virusparies version 1.1.0 Index]