VhgIdenFacetedScatterPlot {Virusparies} | R Documentation |
VhgIdenFacetedScatterPlot generates a scatter plot of viral refseq identity versus -log10 of refseq E-value for each virus group in the best_query or ViralRefSeq_taxonomy column. The points are colored based on whether the E-value meets a specified cutoff and are faceted by the viral groups in the best_query or ViralRefSeq_taxonomy column.
VhgIdenFacetedScatterPlot(
file,
groupby = "best_query",
taxa_rank = "Family",
cutoff = 1e-05,
conlen_bubble_plot = FALSE,
contiglen_breaks = 5,
theme_choice = "linedraw",
title = "Faceted scatterplot of viral reference E-values and sequence identity",
title_size = 16,
title_face = "bold",
title_colour = "#2a475e",
subtitle = NULL,
subtitle_size = 12,
subtitle_face = "bold",
subtitle_colour = "#1b2838",
xlabel = "Viral reference sequence identity (%)",
ylabel = "-log10 of viral reference E-values",
axis_title_size = 12,
xtext_size = 10,
x_angle = NULL,
ytext_size = 10,
y_angle = NULL,
legend_position = "bottom",
legend_title_size = 12,
legend_title_face = "bold",
legend_text_size = 10,
true_colour = "blue",
false_colour = "red",
wrap_ncol = 2,
filter_group_criteria = NULL
)
file |
VirusHunterGatherer hittable. |
groupby |
(optional): A character specifying the column containing the groups (default: "best_query"). Note: Gatherer hittables do not have a "best_query" column. Please provide an appropriate column for grouping. |
taxa_rank |
(optional): When
|
cutoff |
(optional): A numeric value representing the cutoff for the refseq E-value. Points with |
conlen_bubble_plot |
(optional): Logical value indicating whether the |
contiglen_breaks |
(optional): Number of breaks (default: 5) for the bubble plot (for |
theme_choice |
(optional): A character indicating the ggplot2 theme to apply. Options include "minimal", "classic", "light", "dark", "void", "grey" (or "gray"), "bw", "linedraw" (default), and "test". Append "_dotted" to any theme to add custom dotted grid lines (e.g., "classic_dotted"). |
title |
(optional): The title of the plot (default: "Faceted scatterplot of viral reference E-values and sequence identity"). |
title_size |
(optional): The size of the title text (default: 16). |
title_face |
(optional): The face (bold, italic, etc.) of the title text (default: "bold"). |
title_colour |
(optional): The color of the title text (default: "#2a475e"). |
subtitle |
(optional): The subtitle of the plot (default: NULL). |
subtitle_size |
(optional): The size of the subtitle text (default: 12). |
subtitle_face |
(optional): The face (bold, italic, etc.) of the subtitle text (default: "bold"). |
subtitle_colour |
(optional): The color of the subtitle text (default: "#1b2838"). |
xlabel |
(optional): The label for the x-axis (default: "Viral reference sequence identity (%)"). |
ylabel |
(optional): The label for the y-axis (default: "-log10 of viral reference E-values"). |
axis_title_size |
(optional): The size of the axis titles (default: 12). |
xtext_size |
(optional): The size of the x-axis text (default: 10). |
x_angle |
(optional): An integer specifying the angle (in degrees) for the x-axis text labels. Default is NULL, meaning no change. |
ytext_size |
(optional): The size of the y-axis text (default: 10). |
y_angle |
(optional): An integer specifying the angle (in degrees) for the y-axis text labels. Default is NULL, meaning no change. |
legend_position |
(optional): The position of the legend (default: "bottom"). |
legend_title_size |
(optional): The size of the legend title text (default: 12). |
legend_title_face |
(optional): The face (bold, italic, etc.) of the legend title text (default: "bold"). |
legend_text_size |
(optional): The size of the legend text (default: 10). |
true_colour |
(optional): The color for points that meet the cutoff condition (default: "blue"). |
false_colour |
(optional): The color for points that do not meet the cutoff condition (default: "red"). |
wrap_ncol |
(optional): The number of columns for faceting (default: 2). |
filter_group_criteria |
(optional): Character vector, numeric vector, or single character/numeric value.
|
'VhgIdenFacetedScatterPlot' takes a VirusHunter or VirusGatherer hittable and a cutoff value as inputs. The plot includes:
Points colored based on whether they meet the cutoff condition.
Faceting by the best_query column by default. The user can provide their own column for grouping.
The option conlen_bubble_plot = TRUE generates a bubble plot where the size of points corresponds to "contig_len" (exclusive to VirusGatherer).
filter_group_criteria: Allows filtering of viral groups by specifying either a single character string or a vector of character strings that match unique entries in groupby. Alternatively, a single numeric value, a range, or a vector of numeric values can be used to filter groups.
For example, if groupby is "best_query" with the following unique groups:
Anello_ORF1core
Gemini_Rep
Genomo_Rep
Hepadna-Nackedna_TP
Setting filter_group_criteria to c("Anello_ORF1core", "Genomo_Rep") will filter the data to only include observations where the "best_query" column has 'Anello_ORF1core' or 'Genomo_Rep'. Alternatively, setting filter_group_criteria to 2:3 will return only the second and third alphabetically ordered viral groups from "best_query". The order also matches the order of the viral groups in the faceted scatter plot.
This is particularly useful when there are too many viral groups to be plotted in a single plot, allowing for separation into different groups. It also enables the user to focus on specific groups of interest for more detailed analysis.
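The numeric form of filter_group_criteria can be illustrated with a minimal base R sketch. It assumes, as described above, that numeric criteria index the alphabetically sorted unique groups. The hits data frame and the groups, selected and filtered variables are hypothetical, and this is not the package's internal code.

```r
# Hypothetical hittable with four viral groups in the best_query column
hits <- data.frame(
  best_query = c("Gemini_Rep", "Anello_ORF1core", "Hepadna-Nackedna_TP",
                 "Genomo_Rep", "Gemini_Rep"),
  stringsAsFactors = FALSE
)

# Unique groups in alphabetical order (radix sorting is locale-independent)
groups <- sort(unique(hits$best_query), method = "radix")

# Numeric criteria pick groups by their position in that ordering
selected <- groups[2:3]

# Keep only the rows that belong to the selected groups
filtered <- hits[hits$best_query %in% selected, , drop = FALSE]
```

With the four example groups listed above, positions 2:3 correspond to Gemini_Rep and Genomo_Rep, so splitting a large hittable into index ranges such as 1:2 and 3:4 yields one plot per batch of groups.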
Tibble data frames containing summary statistics (median, Q1, Q3, mean, sd, min, and max) for 'ViralRefSeq_E' and 'ViralRefSeq_ident' values are generated. Optionally, summary statistics for 'contig_len' values are also included if applicable. These summary statistics, along with the plot object, are returned within a list object.
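To make the listed statistics concrete, the following base R sketch computes the same seven values per group for a small, hypothetical set of identity values. The package returns tibbles; this sketch produces a plain matrix instead, and the names hits, summarise_values and identity_summary are assumptions introduced only for illustration.

```r
# Hypothetical identity values for two viral groups
hits <- data.frame(
  best_query = c("Gemini_Rep", "Gemini_Rep", "Gemini_Rep",
                 "Genomo_Rep", "Genomo_Rep"),
  ViralRefSeq_ident = c(40, 50, 60, 30, 70)
)

# Median, quartiles, mean, standard deviation, minimum and maximum
summarise_values <- function(x) {
  c(median = median(x),
    Q1     = quantile(x, 0.25, names = FALSE),
    Q3     = quantile(x, 0.75, names = FALSE),
    mean   = mean(x),
    sd     = sd(x),
    min    = min(x),
    max    = max(x))
}

# One row of statistics per group
identity_summary <- do.call(
  rbind,
  lapply(split(hits$ViralRefSeq_ident, hits$best_query), summarise_values)
)
```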
Warning: In some cases, E-values might be exactly 0. When these values are transformed using -log10, R returns Inf as the output. To avoid this issue, we replace all E-values that are 0 with the smallest E-value that is greater than 0. If the smallest E-value is above the user-defined cutoff, we use a value of cutoff * 10^-10 to replace the zeros instead.
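The replacement rule in this warning can be sketched in base R as follows. This is an illustration of the described behaviour under the stated rule, not the package's actual implementation; the evalues vector and all variable names are hypothetical.

```r
# Hypothetical E-values, two of which are exactly 0
evalues <- c(0, 1e-30, 2e-8, 0, 3e-4)
cutoff <- 1e-5

# Without any replacement, a zero E-value maps to Inf
untransformed <- -log10(0)

# Smallest E-value that is strictly greater than 0
min_nonzero <- min(evalues[evalues > 0])

# Fall back to cutoff * 10^-10 when even the smallest value exceeds the cutoff
replacement <- if (min_nonzero > cutoff) cutoff * 10^-10 else min_nonzero

# Substitute the zeros, then transform
evalues_fixed <- ifelse(evalues == 0, replacement, evalues)
neg_log_evalues <- -log10(evalues_fixed)
```

Here the smallest non-zero E-value (1e-30) lies below the cutoff, so it replaces both zeros and every transformed value is finite.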
A list containing the following components:
Plot: A plot object representing the faceted scatter plot.
evalue_stats: A tibble data frame with summary statistics for "ViralRefSeq_E" values.
identity_stats: A tibble data frame with summary statistics for "ViralRefSeq_ident" values.
contig_stats (optional): A tibble data frame with summary statistics for "contig_len" values, included only if VirusGatherer is used with conlen_bubble_plot = TRUE.
Sergej Ruff
VirusHunterGatherer is available here: https://github.com/lauberlab/VirusHunterGatherer.
path <- system.file("extdata", "virushunter.tsv", package = "Virusparies")
file <- ImportVirusTable(path)
# plot 1: default settings with an explicit E-value cutoff
plot <- VhgIdenFacetedScatterPlot(file, cutoff = 1e-5)
plot
# plot 2: custom theme, title, axis labels, legend position and colours
custom_plot <- VhgIdenFacetedScatterPlot(file,
                                         cutoff = 1e-4,
                                         theme_choice = "dark",
                                         title = "Custom Scatterplot",
                                         title_size = 18,
                                         title_face = "italic",
                                         title_colour = "orange",
                                         xlabel = "Custom X Label",
                                         ylabel = "Custom Y Label",
                                         axis_title_size = 14,
                                         legend_position = "right",
                                         true_colour = "green",
                                         false_colour = "purple")
custom_plot
# import gatherer files
path2 <- system.file("extdata", "virusgatherer.tsv", package = "Virusparies")
vg_file <- ImportVirusTable(path2)
# vgplot: virusgatherer plot with ViralRefSeq_taxonomy as custom grouping
vgplot <- VhgIdenFacetedScatterPlot(vg_file, groupby = "ViralRefSeq_taxonomy")
vgplot