bootclustrange {WeightedCluster} | R Documentation |
Cluster Quality Indices estimation by subsampling
Description
bootclustrange estimates the quality of a clustering from subsamples of the data to avoid computational overload.
Usage
bootclustrange(object, seqdata, seqdist.args = list(method = "LCS"),
R = 100, sample.size = 1000, parallel = FALSE,
progressbar = FALSE, sampling = "clustering",
strata = NULL)
## S3 method for class 'bootclustrange'
plot(x, stat = "noCH", legendpos = "bottomright",
norm = "none", withlegend = TRUE, lwd = 1,
col = NULL, ylab = "Indicators",
xlab = "N clusters", conf.int = 0.95,
ci.method = "perc", ci.alpha = 0.3,
line = "median", ...)
## S3 method for class 'bootclustrange'
print(x, digits = 2, bootstat = c("mean"), ...)
Arguments
object |
A |
seqdata |
State sequence object of class |
seqdist.args |
List of arguments passed to |
R |
Numeric. The number of subsamples to use. |
sample.size |
Numeric. The size of the subsamples. Values between 1000 and 10 000 are recommended. |
parallel |
Logical. Whether to initialize the parallel processing of the |
progressbar |
Logical. Whether to initialize a progressbar using the |
sampling |
Character. The sampling procedure to be used: |
strata |
An optional stratification variable. |
x |
A |
stat |
Character. The list of statistics to plot or "noCH" to plot all statistics except "CH" and "CHsq" or "all" for all statistics. See |
legendpos |
Character. Legend position, see |
norm |
Character. Normalization method applied to the statistics. One of "none" (no normalization), "range" (given as (value - min)/(max - min)), "zscore" (adjusted by mean and standard deviation) or "zscoremed" (adjusted by median and median of the difference to the median). |
withlegend |
Logical. If |
lwd |
Numeric. Line width, see |
col |
A vector of line colors, see |
xlab |
x axis label. |
ylab |
y axis label. |
conf.int |
Confidence level used to build the confidence interval (default: 0.95). |
ci.method |
Method used to build the confidence interval (only if bootstrap has been used, see R above). One of "none" (do not plot the confidence interval), "norm" (based on a normal approximation) or "perc" (default, based on percentiles). |
ci.alpha |
Alpha (transparency) value of the color used to plot the interval. |
line |
Which value should be plotted by the line? One of "mean" (average over all bootstraps) or "median" (default, median over all bootstraps). |
digits |
Number of digits to be printed. |
bootstat |
The summary statistic to use |
... |
Additional parameters passed to/from methods. |
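The norm and ci.method options of the plot method can be pictured with a few lines of base R. This is only an illustrative sketch on made-up numbers, not the package's internal code. Two points are assumptions: the "norm" interval is taken as mean plus or minus a normal quantile times the standard deviation of the bootstrap values, and "zscoremed" is taken to divide by the median of the absolute differences to the median.

```r
set.seed(1)

# Hypothetical bootstrap values of one quality index for one number of clusters
boot_vals <- rnorm(200, mean = 0.4, sd = 0.05)

conf.int <- 0.95
alpha <- 1 - conf.int
probs <- c(alpha / 2, 1 - alpha / 2)

# Percentile interval, in the spirit of ci.method = "perc"
ci_perc <- quantile(boot_vals, probs = probs, names = FALSE)

# Normal-approximation interval, in the spirit of ci.method = "norm"
# (assumption: the spread is the standard deviation of the bootstrap values)
ci_norm <- mean(boot_vals) + qnorm(probs) * sd(boot_vals)

# Hypothetical values of one index for 2 to 6 clusters
index <- c(0.31, 0.42, 0.40, 0.35, 0.33)

# norm = "range": rescale to the [0, 1] interval
norm_range <- (index - min(index)) / (max(index) - min(index))

# norm = "zscore": center by the mean, scale by the standard deviation
norm_zscore <- (index - mean(index)) / sd(index)

# norm = "zscoremed": center by the median, scale by the median of the
# absolute differences to the median (assumption, see above)
med <- median(index)
norm_zscoremed <- (index - med) / median(abs(index - med))
```

Normalizing is mainly useful when several statistics with different scales are drawn on the same plot.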
Details
bootclustrange estimates the quality of the clustering based on subsamples of the data to avoid computational overload. It randomly samples R times sample.size sequences from seqdata using the sampling procedure defined by the sampling argument. In each subsample, a distance matrix is computed using the selected sequences and the seqdist.args arguments, and the cluster quality indices are then estimated using as.clustrange.
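The subsampling logic described above can be sketched in base R. This is a conceptual stand-in, not the package's implementation: Euclidean distances on toy numeric data replace the sequence distances, a hand-written point biserial correlation replaces the indices computed by as.clustrange, and simple random sampling without replacement is assumed. All object names are hypothetical.

```r
set.seed(42)

# Toy stand-in for seqdata: 2000 cases described by two numeric features
n <- 2000
x <- cbind(rnorm(n, mean = rep(c(0, 3), each = n / 2)), rnorm(n))

# Toy stand-in for the clustering: one column per number of clusters
clustering <- data.frame(
  cluster2 = kmeans(x, centers = 2)$cluster,
  cluster3 = kmeans(x, centers = 3)$cluster
)

R <- 20             # number of subsamples
sample.size <- 200  # number of cases in each subsample

# Point biserial correlation: correlation between the pairwise distances and
# an indicator of whether the two cases belong to different clusters
pbc <- function(d, cl) {
  different <- as.numeric(dist(cl) > 0)
  cor(as.vector(d), different)
}

boot <- replicate(R, {
  idx <- sample(n, sample.size)  # draw one subsample of cases
  d <- dist(x[idx, ])            # distances within the subsample only
  sapply(clustering[idx, ], function(cl) pbc(d, cl))
})

# One row per clustering, one column per subsample
apply(boot, 1, median)
```

Only sample.size by sample.size distances are computed at each iteration, which is what keeps the cost manageable when the full distance matrix would be too large.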
The clustering can be specified either as a seqclararange object or a data.frame.
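A minimal usage sketch follows, assuming the TraMineR and WeightedCluster packages are installed. The bootclustrange, plot and print calls use the arguments listed in Usage. The seqclararange arguments are assumed from that function's own help page and should be checked there. The very small R and sample.size values are chosen only to keep the run short and are far below the recommended sizes.

```r
library(TraMineR)
library(WeightedCluster)

# Illustrative data shipped with TraMineR
data(biofam)
biofam.seq <- seqdef(biofam[1:100, 10:25], alphabet = 0:7)

# CLARA clustering for 2 and 3 clusters
# (argument names assumed from the seqclararange help page)
bfclara <- seqclararange(biofam.seq, R = 10, sample.size = 20, kvals = 2:3,
                         seqdist.args = list(method = "HAM"),
                         parallel = FALSE)

# Cluster quality indices estimated on subsamples
bCQI <- bootclustrange(bfclara, biofam.seq,
                       seqdist.args = list(method = "HAM"),
                       R = 10, sample.size = 20, parallel = FALSE)

# Summary of the bootstrapped values, then a plot with percentile intervals
print(bCQI, digits = 2, bootstat = "mean")
plot(bCQI, stat = "noCH", ci.method = "perc", line = "median")
```

Using the same seqdist.args for the clustering and for the quality estimation keeps the two steps consistent.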
Value
A clustrange object (see as.clustrange) with the bootstrapped values.
References
Studer, M., R. Sadeghi and L. Tochon (2024). Sequence Analysis for Large Databases. LIVES Working Papers 104 doi:10.12682/lives.2296-1658.2024.104
See Also
as.clustrange for the list of cluster quality indices that are computed, and seqclararange for an example of use.