clusterSP {sarp.snowprofile.alignment} | R Documentation |
Cluster snow profiles
Description
This function is the main gateway to sarp.snowprofile::snowprofile clustering.
Usage
clusterSP(
SPx = NULL,
k = 2,
type = c("hclust", "pam", "fanny", "kdba", "fast")[1],
distmat = NULL,
config = clusterSPconfig(type),
centers = "none",
keepSPx = TRUE,
keepDistmat = TRUE
)
Arguments
SPx |
a sarp.snowprofile::snowprofileSet to be clustered |
k |
desired number of clusters |
type |
clustering type including |
distmat |
a precomputed distance matrix of class dist. This results in much faster clustering for |
config |
a list providing the necessary hyperparameters. Use clusterSPconfig functions for convenience! |
centers |
compute and return |
keepSPx |
append the snowprofileSet to the output? |
keepDistmat |
append the distmat to the output? |
Details
There are several clustering approaches that can be applied to snow profiles. Most rely on computing a pairwise distance matrix between all profiles in a snowprofileSet. Current implementations with this approach rely on existing R functions:
- agglomerative hierarchical clustering (stats::hclust)
- partitioning around medoids (cluster::pam)
- fuzzy analysis clustering (cluster::fanny)
Since computing a pairwise distance matrix can be slow, the recommended way of testing different numbers of clusters $k$ is to precompute a single distance matrix with the distanceSP function and provide it as an argument to clusterSP.
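The recommendation above can be illustrated with a minimal sketch on toy data. It does not use snow profiles or distanceSP: the random 2-D points and the Euclidean dist() call are assumptions standing in for a snowprofileSet and its alignment-based distance matrix. The point is only that one dist object can be reused across several values of k and across the three underlying functions.

```r
library(cluster)  # provides pam() and fanny()

set.seed(1)
# Toy stand-in for a set of profiles: 20 points in 2-D forming two groups
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 3, sd = 0.3), ncol = 2))

# Compute the pairwise distance matrix once ...
d <- dist(x)

# ... and reuse it for several values of k and several algorithms
tree <- hclust(d, method = "average")
cl_hclust <- lapply(2:4, function(k) cutree(tree, k = k))
cl_pam <- lapply(2:4, function(k) pam(d, k = k, diss = TRUE)$clustering)
cl_fanny <- fanny(d, k = 2, diss = TRUE)$clustering
```

With clusterSP the same pattern applies by passing the precomputed matrix through the distmat argument, as in the Examples section.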
An alternative type of clustering, known as k-dimensional barycentric averaging (kdba), is conceptually similar to kmeans but specifically adapted to snow profiles (see clusterSPkdba). An initial clustering condition (which can be random or based on a 'sophisticated guess') is iteratively refined by assigning individual profiles to the most similar cluster and, at the end of every iteration, recomputing the cluster centroids. The cluster centroids are represented by the average snow profile of each cluster (see averageSP). Note that the results of kdba are sensitive to the initial conditions, which by default are estimated with the 'fast' method below.
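The assign-then-recompute loop described above can be sketched on toy data. This is not the implementation of clusterSPkdba: Euclidean distance stands in for the profile similarity, colMeans stands in for averageSP, and the random initial assignment is just one possible initial condition.

```r
set.seed(42)
# Toy data: two groups of 2-D points (stand-in for snow profiles)
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 3, sd = 0.3), ncol = 2))
k <- 2

# Initial clustering condition: a random, balanced assignment
clustering <- sample(rep_len(seq_len(k), nrow(x)))

for (iter in 1:20) {
  # Recompute each centroid as the average member of its cluster
  centroids <- t(sapply(seq_len(k), function(j) {
    colMeans(x[clustering == j, , drop = FALSE])
  }))
  # Distance of every point to every centroid (n x k matrix)
  d <- sapply(seq_len(k), function(j) {
    ctr <- matrix(centroids[j, ], nrow(x), ncol(x), byrow = TRUE)
    sqrt(rowSums((x - ctr)^2))
  })
  # Assign every point to its most similar (closest) centroid
  new_clustering <- max.col(-d, ties.method = "first")
  if (identical(new_clustering, clustering)) break  # assignments stable
  clustering <- new_clustering
}
```

Running the sketch with a different seed changes the initial assignment, which is a simple way to see the sensitivity to initial conditions mentioned above.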
Finally, a much faster method, 'fast', is available. It computes a pairwise distance matrix without aligning profiles, based instead on summary statistics such as snow height, height of new snow, presence or absence of weak layers and crusts, etc. The 'fast' clustering approach then applies partitioning around medoids to this 'fast' distance matrix.
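The idea behind the 'fast' approach can be sketched as follows. The summary table, its column names, and its values are made up for illustration, and scaling followed by a Euclidean dist() is an assumption; the package may weight or combine the statistics differently.

```r
library(cluster)  # provides pam()

# Hypothetical per-profile summary statistics (illustrative values only)
stats <- data.frame(
  hs         = c(95, 110, 102, 240, 255, 230),  # snow height (cm)
  hn         = c(5, 0, 10, 30, 25, 35),         # height of new snow (cm)
  weak_layer = c(1, 1, 1, 0, 0, 0),             # 1 = weak layer present
  crust      = c(0, 1, 0, 1, 1, 0)              # 1 = crust present
)

# Distance matrix from standardized summary statistics, no alignment needed
d_fast <- dist(scale(stats))

# Partitioning around medoids on the precomputed dissimilarities
cl <- pam(d_fast, k = 2, diss = TRUE)
cl$clustering  # cluster membership of each profile
cl$id.med      # indices of the medoid profiles
```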
More details here...
Value
a list of class clusterSP containing:
- clustering: vector of integers (from 1:k) indicating the cluster to which each point is allocated
- id.med: vector of indices for the medoid profiles of each cluster (if calculated)
- centroids: snowprofileSet containing the centroid profile for each cluster (if calculated)
- tree: object of class 'hclust' describing the tree output by hclust
- ...: all other outputs provided by the clustering algorithms (e.g., a membership matrix from fanny.object, pam.object, iteration history from clusterSPkdba)
- type: type of clustering as provided by input argument
- call: a copy of the clusterSP function call
- SPx: a copy of the input snowprofileSet (if keepSPx = TRUE)
- distmat: the pairwise distance matrix of class dist (if keepDistmat = TRUE and a matrix has been provided or computed)
Author(s)
fherla shorton
See Also
clusterSPconfig, clusterSPcenters, clusterSPkdba, plot.clusterSP
Examples
this_example_runs_too_long <- TRUE
if (!this_example_runs_too_long) { # exclude from cran checks
## Cluster with SPgroup2, which contains deposition date and p_unstable
SPx <- SPgroup2
config <- clusterSPconfig(simType = 'wsum_scaled', ddate = TRUE, pwls = TRUE)
## Hierarchical clustering with k = 2
cl_hclust <- clusterSP(SPx, k = 2, type = 'hclust', config = config)
plot(cl_hclust)
## Precompute a distance matrix and cluster with PAM for k = 2 and 3
distmat <- do.call('distanceSP', c(list(SPx), config$args_distance))
cl_pam2 <- clusterSP(SPx, k = 2, type = 'pam', config = config, distmat = distmat)
cl_pam3 <- clusterSP(SPx, k = 3, type = 'pam', config = config, distmat = distmat)
print(cl_pam2$clustering)
print(cl_pam3$clustering)
## kdba clustering
config_kdba <- clusterSPconfig(simType = 'layerwise', type = 'kdba')
cl_kdba <- clusterSP(SPx = SPgroup2, k = 2, type = 'kdba', config = config_kdba)
plot(cl_kdba)
}