VhgIdentityScatterPlot {Virusparies} | R Documentation |
VhgIdentityScatterPlot: Scatter plot for refseq identity vs -log10 of refseq E-value
Description
VhgIdentityScatterPlot generates a scatter plot of viral refseq identity vs. -log10 of viral refseq E-value. It colors the points based on phylum and adds a horizontal line representing the cutoff value.
Usage
VhgIdentityScatterPlot(
  file,
  groupby = "best_query",
  taxa_rank = "Family",
  cutoff = 1e-05,
  conlen_bubble_plot = FALSE,
  contiglen_breaks = 5,
  theme_choice = "linedraw",
  cut_colour = "#990000",
  title = "Scatterplot of viral reference E-values and sequence identity",
  title_size = 16,
  title_face = "bold",
  title_colour = "#2a475e",
  subtitle = NULL,
  subtitle_size = 12,
  subtitle_face = "bold",
  subtitle_colour = "#1b2838",
  xlabel = "Viral reference sequence identity (%)",
  ylabel = "-log10 of viral reference E-values",
  axis_title_size = 12,
  xtext_size = 10,
  x_angle = NULL,
  ytext_size = 10,
  y_angle = NULL,
  legend_title = "Group",
  legend_position = "bottom",
  legend_title_size = 12,
  legend_title_face = "bold",
  legend_text_size = 10,
  highlight_groups = NULL,
  group_unwanted_phyla = NULL
)
Arguments
file |
VirusHunterGatherer hittable. |
groupby |
(optional): A character specifying the column containing the groups (default: "best_query"). Note: Gatherer hittables do not have a "best_query" column. Please provide an appropriate column for grouping. |
taxa_rank |
(optional): When
|
cutoff |
(optional): A numeric value representing the cutoff for the refseq E-value (default: 1e-5). |
conlen_bubble_plot |
(optional): Logical value indicating whether the |
contiglen_breaks |
(optional): Number of breaks (default: 5) for the bubble plot (for |
theme_choice |
(optional): A character indicating the ggplot2 theme to apply. Options include "minimal", "classic", "light", "dark", "void", "grey" (or "gray"), "bw", "linedraw" (default), and "test". Append "_dotted" to any theme to add custom dotted grid lines (e.g., "classic_dotted"). |
cut_colour |
(optional): The color for the horizontal cutoff line (default: "#990000"). |
title |
(optional): The title of the plot (default: "Scatterplot of viral reference E-values and sequence identity"). |
title_size |
(optional): The size of the title text (default: 16). |
title_face |
(optional): The face (bold, italic, etc.) of the title text (default: "bold"). |
title_colour |
(optional): The color of the title text (default: "#2a475e"). |
subtitle |
(optional): The subtitle of the plot (default: NULL). |
subtitle_size |
(optional): The size of the subtitle text (default: 12). |
subtitle_face |
(optional): The face (bold, italic, etc.) of the subtitle text (default: "bold"). |
subtitle_colour |
(optional): The color of the subtitle text (default: "#1b2838"). |
xlabel |
(optional): The label for the x-axis (default: "Viral reference sequence identity (%)"). |
ylabel |
(optional): The label for the y-axis (default: "-log10 of viral reference E-values"). |
axis_title_size |
(optional): The size of the axis titles (default: 12). |
xtext_size |
(optional): The size of the x-axis text (default: 10). |
x_angle |
(optional): An integer specifying the angle (in degrees) for the x-axis text labels. Default is NULL, meaning no change. |
ytext_size |
(optional): The size of the y-axis text (default: 10). |
y_angle |
(optional): An integer specifying the angle (in degrees) for the y-axis text labels. Default is NULL, meaning no change. |
legend_title |
(optional): The title of the legend (default: "Group"). |
legend_position |
(optional): The position of the legend (default: "bottom"). |
legend_title_size |
(optional): The size of the legend title text (default: 12). |
legend_title_face |
(optional): The face (bold, italic, etc.) of the legend title text (default: "bold"). |
legend_text_size |
(optional): The size of the legend text (default: 10). |
highlight_groups |
(optional): A character vector specifying the names of viral groups to be highlighted in the plot (default: NULL). |
group_unwanted_phyla |
(optional): A character string specifying which group of viral phyla to retain in the analysis. Valid values are:
All other phyla not in the specified group will be grouped into a single category:
"Non-RNA-virus" for |
Details
VhgIdentityScatterPlot generates a scatter plot for refseq sequence identity vs -log10 of refseq E-value. It accepts both VirusHunter and VirusGatherer hittables as input. The plot includes:
A horizontal line marking the cutoff specified by the 'cutoff' argument (default: 1e-5), which shows whether the observed values lie above or below it.
The option conlen_bubble_plot = TRUE generates a bubble plot where the size of points corresponds to "contig_len" (exclusive to VirusGatherer).
Tibble data frames containing summary statistics (median, Q1, Q3, mean, sd, min, and max) for 'ViralRefSeq_E' and 'ViralRefSeq_ident' values are generated. Optionally, summary statistics for 'contig_len' values are also included if applicable. These summary statistics, along with the plot object, are returned within a list object.
highlight_groups enables the user to specify one or more viral groups from the column indicated in the groupby argument. These groups will be highlighted in the plot.
Warning: In some cases, E-values might be exactly 0. Transforming these values with -log10 yields Inf in R. To avoid this, all E-values equal to 0 are replaced with the smallest E-value that is greater than 0. If that smallest E-value is above the user-defined cutoff, the zeros are replaced with cutoff * 10^-10 instead.
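The zero-replacement rule in the warning above can be sketched in base R as follows. This is only an illustration of the described rule, not the package's own code: the helper name replace_zero_evalues is hypothetical, and the fallback used when no E-value is greater than 0 is an assumption, since that case is not documented here.

```r
# Hypothetical helper (not part of Virusparies): replace E-values of exactly 0
# before applying -log10, following the rule described in the warning.
replace_zero_evalues <- function(evalues, cutoff = 1e-5) {
  positive <- evalues[!is.na(evalues) & evalues > 0]
  # Smallest E-value greater than 0. Assumption: if there is none, treat it as
  # Inf so that the cutoff-based fallback below is used.
  min_positive <- if (length(positive) > 0) min(positive) else Inf
  # If the smallest positive E-value lies above the cutoff, use cutoff * 10^-10.
  replacement <- if (min_positive > cutoff) cutoff * 10^-10 else min_positive
  evalues[which(evalues == 0)] <- replacement
  evalues
}

# Made-up E-values for illustration only.
evalues <- c(0, 1e-30, 2e-8, 0.003)
neg_log10 <- -log10(replace_zero_evalues(evalues))

# Position of the horizontal cutoff line on the -log10 scale.
cutoff_line <- -log10(1e-5)
```

With this transformation every point has a finite y value, and points above cutoff_line correspond to E-values below the cutoff.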
Value
A list containing the following components:
plot: A plot object representing the faceted scatterplot.
evalue_stats: A tibble data frame with summary statistics for "ViralRefSeq_E" values.
identity_stats: A tibble data frame with summary statistics for "ViralRefSeq_ident" values.
contig_stats (optional): A tibble data frame with summary statistics for "contig_len" values, included only if VirusGatherer is used with conlen_bubble_plot = TRUE.
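To make the contents of the statistics components more concrete, the sketch below computes the same seven summary values (median, Q1, Q3, mean, sd, min, max) for a numeric vector in base R. It is an approximation under assumptions: the helper name summarise_values and the toy identity values are hypothetical, the function returns a plain data frame rather than a tibble, and the package may compute quartiles or group the data differently.

```r
# Hypothetical helper (not part of Virusparies): one-row summary of a numeric
# vector with the statistics named in the Details section.
summarise_values <- function(x) {
  x <- x[!is.na(x)]
  # Quartiles using R's default quantile algorithm (type 7).
  q <- stats::quantile(x, probs = c(0.25, 0.75), names = FALSE)
  data.frame(
    median = stats::median(x),
    Q1 = q[1],
    Q3 = q[2],
    mean = mean(x),
    sd = stats::sd(x),
    min = min(x),
    max = max(x)
  )
}

# Made-up identity percentages for illustration only.
ident <- c(40, 50, 60, 80, 90)
ident_summary <- summarise_values(ident)
```

Applied to a real hittable, the same helper could be called on the "ViralRefSeq_E", "ViralRefSeq_ident", or "contig_len" columns.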
Author(s)
Sergej Ruff
See Also
VirusHunterGatherer is available here: https://github.com/lauberlab/VirusHunterGatherer.
Examples
path <- system.file("extdata", "virushunter.tsv", package = "Virusparies")
file <- ImportVirusTable(path)
# Basic plot
plot <- VhgIdentityScatterPlot(file,cutoff = 1e-5)
plot(plot$plot)
# Custom plot with additional arguments
custom_plot <- VhgIdentityScatterPlot(file,
                                      cutoff = 1e-5,
                                      theme_choice = "dark",
                                      cut_colour = "blue",
                                      title = "Custom Scatter Plot",
                                      title_size = 18,
                                      title_face = "italic",
                                      title_colour = "darkred",
                                      subtitle = "Custom Subtitle",
                                      subtitle_size = 14,
                                      subtitle_face = "italic",
                                      subtitle_colour = "purple",
                                      xlabel = "Custom X Label",
                                      ylabel = "Custom Y Label",
                                      axis_title_size = 14,
                                      xtext_size = 12,
                                      ytext_size = 12,
                                      legend_title = "Custom Legend",
                                      legend_position = "top",
                                      legend_title_size = 14,
                                      legend_title_face = "italic",
                                      legend_text_size = 12)
plot(custom_plot$plot)
# import gatherer files
path2 <- system.file("extdata", "virusgatherer.tsv", package = "Virusparies")
vg_file <- ImportVirusTable(path2)
# vgplot: virusgatherer plot with ViralRefSeq_taxonomy as custom grouping
vgplot <- VhgIdentityScatterPlot(vg_file,groupby = "ViralRefSeq_taxonomy")
vgplot$plot
# plot as bubble plot with contig length as size
vgplot_con <- VhgIdentityScatterPlot(vg_file, groupby = "ViralRefSeq_taxonomy",
                                     conlen_bubble_plot = TRUE, contiglen_breaks = 4,
                                     legend_position = "right")
vgplot_con