plotVar {diversityForest}    R Documentation
… x on a categorical variable y
This function visualises the (estimated) distributions of a variable x for each of the categories of a categorical variable y. This makes it possible to study the dependency structure of y on x. Two types of visualisations are available: density plots and boxplots.
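The quantities shown by the two plot types can be sketched in base R. This is not how plotVar is implemented (plotVar returns a ggplot2 figure). It is a minimal sketch, assuming a numeric x and a factor y, that computes one kernel density estimate of x per category of y and the per-category boxplot statistics. The data are simulated purely for illustration.

```r
## Minimal base-R sketch, NOT the plotVar implementation; simulated data only
set.seed(1)
y <- factor(sample(c("A", "B", "C"), 300, replace = TRUE))
## Shift the mean of x by category so that the distribution of x differs across y
x <- rnorm(300, mean = c(A = 0, B = 1, C = 2)[as.character(y)])

## One kernel density estimate of x within each category of y
dens <- lapply(split(x, y), density)

## Per-category boxplot statistics (five-number summaries), computed without drawing
bp <- boxplot(x ~ y, plot = FALSE)

## Overlay the per-category densities, similar in spirit to a density panel
plot(NULL,
     xlim = range(sapply(dens, function(d) range(d$x))),
     ylim = c(0, max(sapply(dens, function(d) max(d$y)))),
     xlab = "x", ylab = "Density")
for (i in seq_along(dens)) lines(dens[[i]], col = i)
legend("topright", legend = names(dens), col = seq_along(dens),
       lty = 1, title = "y")
```

Clearly separated density curves (or boxes at different heights) suggest that the distribution of x differs between the categories of y.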
plotVar(
  x,
  y,
  plot_type = c("both", "density", "boxplot")[1],
  x_label = "",
  y_label = "",
  plot_title = ""
)
x: Metric variable or ordered categorical variable that has at least as many unique values as …
y: Factor variable with at least three categories.
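plot_type: Plot type, one of the following: "both" (the default), "density", "boxplot". If "density", only the density plot is shown; if "boxplot", only the boxplot is shown; if "both", both plots are shown.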
x_label: Optional. The label of the x-axis.
y_label: Optional. The label (heading) of the legend that differentiates the categories of y.
plot_title: Optional. The title of the plot.
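The documented requirements on y and plot_type can be expressed as a small input check. The helper below, check_plotVar_inputs, is hypothetical and not part of diversityForest; it is a sketch of the stated constraints only. The requirement on the number of unique values of x is truncated above, so it is deliberately not checked.

```r
## Hypothetical helper (NOT part of diversityForest) mirroring the documented
## constraints; the truncated requirement on unique values of x is not checked
check_plotVar_inputs <- function(x, y, plot_type = "both") {
  ## x: metric (numeric) variable or ordered categorical variable
  if (!(is.numeric(x) || is.ordered(x)))
    stop("x must be numeric or an ordered factor")
  ## y: factor with at least three categories
  if (!is.factor(y) || nlevels(y) < 3)
    stop("y must be a factor with at least three categories")
  ## match.arg() returns the matching choice and errors for any other value
  match.arg(plot_type, c("both", "density", "boxplot"))
}

## Usage: valid inputs return the selected plot type
check_plotVar_inputs(c(1.5, 2.5, 3.5, 4.5),
                     factor(c("a", "b", "c", "a")), "density")
```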
See the 'Details' section of plotMcl.
Returns a ggplot2 plot.
Author: Roman Hornung
Hornung, R., Hapfelmeier, A. (2024). Multi forests: Variable importance for multi-class outcomes. arXiv:2409.08925, <doi:10.48550/arXiv.2409.08925>.
Hornung, R. (2022). Diversity forests: Using split sampling to enable innovative complex split procedures in random forests. SN Computer Science 3(2):1, <doi:10.1007/s42979-021-00920-1>.
## Not run:
## Load package:
library("diversityForest")
## Load the "ctg" data set:
data(ctg)
## Set seed to make results reproducible (this is necessary because
## the rug plot produced by 'plotVar' does not show all observations, but
## only a random subset of 1000 observations):
set.seed(1234)
## Using a "density" plot and a "boxplot", visualise the (estimated)
## distributions of the variable "Mean" for each of the categories of the
## variable "Tendency":
plotVar(x = ctg$Mean, y = ctg$Tendency)
## Re-create this plot with labels:
plotVar(x = ctg$Mean, y = ctg$Tendency, x_label = "Mean of the histogram ('Mean')",
        y_label = "Histogram tendency ('Tendency')",
        plot_title = "Relationship between 'Mean' and 'Tendency'")
## Re-create this plot, but only show the "density" plot:
plotVar(x = ctg$Mean, y = ctg$Tendency, plot_type = "density",
        x_label = "Mean of the histogram ('Mean')",
        y_label = "Histogram tendency ('Tendency')",
        plot_title = "Relationship between 'Mean' and 'Tendency'")
## Use ggplot2 and RColorBrewer functions to change the line colors and
## the labels of the categories of "Tendency":
library("ggplot2")
library("RColorBrewer")
p <- plotVar(x = ctg$Mean, y = ctg$Tendency, plot_type = "density",
             x_label = "Mean of the histogram ('Mean')",
             y_label = "Histogram tendency ('Tendency')",
             plot_title = "Relationship between 'Mean' and 'Tendency'") +
  scale_color_manual(values = brewer.pal(n = 3, name = "Set2"),
                     labels = c("left asymmetric", "symmetric",
                                "right asymmetric")) +
  scale_linetype_manual(values = rep(1, 3),
                        labels = c("left asymmetric", "symmetric",
                                   "right asymmetric"))
p
## # Save as PDF:
## ggsave(file="mypathtofolder/FigureXY1.pdf", width=10, height=7)
## End(Not run)