clugen {clugenr} | R Documentation |
This is the main function of clugenr, and possibly the only function most users will need.
clugen(
num_dims,
num_clusters,
num_points,
direction,
angle_disp,
cluster_sep,
llength,
llength_disp,
lateral_disp,
allow_empty = FALSE,
cluster_offset = NA,
proj_dist_fn = "norm",
point_dist_fn = "n-1",
clusizes_fn = clusizes,
clucenters_fn = clucenters,
llengths_fn = llengths,
angle_deltas_fn = angle_deltas,
seed = NA
)
num_dims |
Number of dimensions. |
num_clusters |
Number of clusters to generate. |
num_points |
Total number of points to generate. |
direction |
Average direction of the cluster-supporting lines. Can be
a vector of length |
angle_disp |
Angle dispersion of cluster-supporting lines (radians). |
cluster_sep |
Average cluster separation in each dimension (vector of
length |
llength |
Average length of cluster-supporting lines. |
llength_disp |
Length dispersion of cluster-supporting lines. |
lateral_disp |
Cluster lateral dispersion, i.e., dispersion of points from their projection on the cluster-supporting line. |
allow_empty |
Allow empty clusters? |
cluster_offset |
Offset to add to all cluster centers (vector of length
|
proj_dist_fn |
Distribution of point projections along cluster-supporting lines, with three possible values:
|
point_dist_fn |
Controls how the final points are created from their projections on the cluster-supporting lines, with three possible values:
|
clusizes_fn |
Distribution of cluster sizes. By default, cluster sizes
are determined by the clusizes function, which uses the normal distribution
(\(\mu=\) |
clucenters_fn |
Distribution of cluster centers. By default, cluster
centers are determined by the clucenters function, which uses the uniform
distribution, and takes into account the |
llengths_fn |
Distribution of line lengths. By default, the lengths of
cluster-supporting lines are determined by the llengths function, which
uses the folded normal distribution (\(\mu=\) |
angle_deltas_fn |
Distribution of line angle differences with respect to
|
seed |
An integer used to initialize the PRNG, allowing for reproducible
results. If specified, |
If a custom function is given in the clusizes_fn parameter, the total number of generated points may differ from the value specified in the num_points parameter.
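As a minimal sketch of how this can happen, consider a custom size function that splits the points evenly. The function name equal_sizes is hypothetical, and its three-argument signature (num_clusters, num_points, allow_empty) is an assumption that this page does not confirm; check the documentation of clusizes before passing such a function as clusizes_fn.

```r
# Hypothetical custom size function. The three-argument signature is an
# assumption made for illustration, not something this page confirms.
equal_sizes <- function(num_clusters, num_points, allow_empty) {
  # Integer division drops the remainder, so the total can fall short
  # of the requested number of points.
  rep(num_points %/% num_clusters, num_clusters)
}

sizes <- equal_sizes(3, 1000, FALSE)
sizes       # three clusters of 333 points each
sum(sizes)  # 999, one fewer than the 1000 points requested
```

With a function like this, the generated data would hold 999 points even though 1000 were requested.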
The terms "average" and "dispersion" refer to measures of central tendency and statistical dispersion, respectively. Their exact meaning depends on the optional arguments.
A named list with the following elements:
points: A num_points x num_dims matrix with the generated points for all clusters.
clusters: A num_points factor vector indicating which cluster each point in points belongs to.
projections: A num_points x num_dims matrix with the point projections on the cluster-supporting lines.
sizes: A num_clusters x 1 vector with the number of points in each cluster.
centers: A num_clusters x num_dims matrix with the coordinates of the cluster centers.
directions: A num_clusters x num_dims matrix with the final direction of each cluster-supporting line.
angles: A num_clusters x 1 vector with the angles between the cluster-supporting lines and the main direction.
lengths: A num_clusters x 1 vector with the lengths of the cluster-supporting lines.
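The elements listed above can be inspected directly on the returned list. The sketch below reuses the arguments of the 2D example on this page and assumes the clugenr package is installed; the comments describe what the documented shapes imply, not verified output.

```r
library(clugenr)

# Same arguments as the 2D example on this page
x <- clugen(2, 5, 1000, c(1, 3), 0.5, c(10, 10), 8, 1.5, 2)

names(x)               # element names of the returned list
dim(x$points)          # rows are points, columns are dimensions
is.factor(x$clusters)  # cluster membership is documented as a factor
table(x$clusters)      # points per cluster, should agree with x$sizes
dim(x$centers)         # one row per cluster, one column per dimension
```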
This function is stochastic. For reproducibility, set a PRNG seed with set.seed or use the seed parameter.
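A short sketch of what reproducibility looks like in practice, assuming the clugenr package is installed: resetting the PRNG to the same seed before each call is expected to yield identical data. The seed value 123 is arbitrary.

```r
library(clugenr)

# Reset the PRNG to the same state before each call
set.seed(123)
a <- clugen(2, 5, 1000, c(1, 3), 0.5, c(10, 10), 8, 1.5, 2)

set.seed(123)
b <- clugen(2, 5, 1000, c(1, 3), 0.5, c(10, 10), 8, 1.5, 2)

# Same seed, same arguments: the generated points should match exactly
identical(a$points, b$points)
```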
# 2D example
x <- clugen(2, 5, 1000, c(1, 3), 0.5, c(10, 10), 8, 1.5, 2)
graphics::plot(x$points, col = x$clusters, xlab = "x", ylab = "y", asp = 1)
# 3D example
x <- clugen(3, 5, 1000, c(2, 3, 4), 0.5, c(15, 13, 14), 7, 1, 2)