hmm.clust {DBHC} | R Documentation |
Implementation of the DBHC algorithm, an HMM clustering algorithm that finds a mixture of discrete-output HMMs. The algorithm uses heuristics based on BIC to search for the optimal number of hidden states in each HMM and the optimal number of clusters.
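The BIC-based search described above trades goodness of fit against model complexity. A minimal sketch of the generic BIC formula follows; the function name `bic` and the numbers are hypothetical, and how DBHC itself counts parameters or observations is not stated here, so this is an illustration rather than the package's implementation.

```r
# Generic BIC: lower values indicate a better trade-off between
# fit (log-likelihood) and complexity (number of free parameters).
bic <- function(log_lik, n_params, n_obs) {
  -2 * log_lik + n_params * log(n_obs)
}

# Hypothetical numbers, for illustration only.
bic_small <- bic(log_lik = -1200, n_params = 8, n_obs = 2500)
bic_large <- bic(log_lik = -1195, n_params = 20, n_obs = 2500)

# The larger model fits slightly better, but not by enough to pay
# for its 12 extra parameters, so the smaller model is preferred.
print(bic_small < bic_large)
```

Under this convention, a heuristic search would keep growing a model (more hidden states or more clusters) only while the BIC keeps decreasing.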
hmm.clust(
  sequences,
  id = NULL,
  smoothing = 1e-04,
  eps = 0.001,
  init.size = 2,
  alphabet = NULL,
  K.max = NULL,
  log_space = FALSE,
  print = FALSE,
  seed.size = 3
)
sequences |
An stslist object of sequences with discrete observations. |
id |
A vector with ids that identify the sequences in sequences. |
smoothing |
Smoothing parameter for absolute discounting. |
eps |
A threshold epsilon for counting parameters. |
init.size |
The number of HMM states in an initial HMM. |
alphabet |
The alphabet of output labels; if not provided, the alphabet is taken from the sequences. |
K.max |
Maximum number of clusters; if not provided, the algorithm searches for the optimal number itself. |
log_space |
Logical, whether to use log-space computations when fitting the HMMs. |
print |
Logical, whether to print intermediate steps or not. |
seed.size |
Seed size, the number of sequences to be selected for a seed. |
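The smoothing and eps arguments can be pictured with a small base R sketch. It assumes one common form of absolute discounting (zero probabilities are raised to the smoothing value and the added mass is taken from the non-zero entries) and a simple count of probabilities above a threshold. The helper names `discount` and `count_above` are hypothetical, and the package's own routines may differ in detail.

```r
# Sketch of absolute discounting on one probability vector:
# zero entries are raised to `smoothing`, and the added mass is
# subtracted in equal parts from the non-zero entries.
# Assumes `smoothing` is small relative to the non-zero entries.
discount <- function(p, smoothing = 1e-04) {
  zero <- p == 0
  if (!any(zero) || all(zero)) return(p)
  p[zero] <- smoothing
  p[!zero] <- p[!zero] - smoothing * sum(zero) / sum(!zero)
  p
}

# Count the entries of a probability matrix that exceed `eps`.
count_above <- function(probs, eps = 0.001) {
  sum(probs > eps)
}

# Hypothetical 2 x 2 emission matrix with one zero entry.
emissions <- matrix(c(1, 0.5, 0, 0.5), nrow = 2)
smoothed <- t(apply(emissions, 1, discount))

print(rowSums(smoothed))     # rows still sum to 1 after smoothing
print(count_above(smoothed)) # entries treated as "real" parameters
```

With a threshold larger than the smoothing value, as in the defaults above, smoothed zeros stay below eps and are not counted as parameters.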
A list with components:
sequences
An stslist object of sequences with discrete observations.
id
A vector with ids that identify the sequences in sequences.
cluster
A vector with found cluster memberships for the sequences.
partition
A list object with the partition, a mixture of HMMs. Each element in the list is an hmm object.
memberships
A matrix with cluster memberships for each sequence.
n.clusters
Numerical, the found number of clusters.
sizes
A vector with the number of HMM states for each cluster model.
bic
A vector with the BICs for each cluster model.
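As a sketch of how the returned components fit together, the ids can be grouped by cluster membership. The list below is a mock with invented values that only reuses the documented component names id and cluster; it assumes both vectors are aligned, one entry per sequence.

```r
# Mock result using the documented component names; values are invented.
res_mock <- list(
  id = c("K-1", "K-2", "K-3", "K-4", "K-5"),
  cluster = c(1, 2, 1, 1, 2)
)

# Group the sequence ids by their cluster membership.
ids_by_cluster <- split(res_mock$id, res_mock$cluster)

# Number of sequences per cluster.
cluster_counts <- lengths(ids_by_cluster)

print(ids_by_cluster)
print(cluster_counts)
```

With a real result, replacing res_mock by the list returned from hmm.clust should give the same kind of grouping, provided the alignment assumption holds.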
## Simulated data
library(seqHMM)
output.labels <- c("H", "T")
# HMM 1
states.1 <- c("A", "B", "C")
transitions.1 <- matrix(c(0.8,0.1,0.1,0.1,0.8,0.1,0.1,0.1,0.8), nrow = 3)
rownames(transitions.1) <- states.1
colnames(transitions.1) <- states.1
emissions.1 <- matrix(c(0.5,0.75,0.25,0.5,0.25,0.75), nrow = 3)
rownames(emissions.1) <- states.1
colnames(emissions.1) <- output.labels
initials.1 <- c(1/3,1/3,1/3)
# HMM 2
states.2 <- c("A", "B")
transitions.2 <- matrix(c(0.75,0.25,0.25,0.75), nrow = 2)
rownames(transitions.2) <- states.2
colnames(transitions.2) <- states.2
emissions.2 <- matrix(c(0.8,0.6,0.2,0.4), nrow = 2)
rownames(emissions.2) <- states.2
colnames(emissions.2) <- output.labels
initials.2 <- c(0.5,0.5)
# Simulate
hmm.sim.1 <- simulate_hmm(n_sequences = 100,
                          initial_probs = initials.1,
                          transition_probs = transitions.1,
                          emission_probs = emissions.1,
                          sequence_length = 25)
hmm.sim.2 <- simulate_hmm(n_sequences = 100,
                          initial_probs = initials.2,
                          transition_probs = transitions.2,
                          emission_probs = emissions.2,
                          sequence_length = 25)
sequences <- rbind(hmm.sim.1$observations, hmm.sim.2$observations)
n <- nrow(sequences)
# Clustering algorithm
id <- paste0("K-", 1:n)
rownames(sequences) <- id
sequences <- sequences[sample(1:n, n),]
res <- hmm.clust(sequences, id = rownames(sequences))
#############################################################################
## Swiss Household Data
data("biofam", package = "TraMineR")
# Clustering algorithm
new.alphabet <- c("P", "L", "M", "LM", "C", "LC", "LMC", "D")
sequences <- seqdef(biofam[,10:25], alphabet = 0:7, states = new.alphabet)
## Not run:
res <- hmm.clust(sequences)
# Heatmaps
cluster <- 1 # display heatmaps for cluster 1
transition.heatmap(res$partition[[cluster]]$transition_probs,
                   res$partition[[cluster]]$initial_probs)
emission.heatmap(res$partition[[cluster]]$emission_probs)
## End(Not run)
## A smaller example, which takes less time to run
subset <- sequences[sample(1:nrow(sequences), 20, replace = FALSE),]
# Clustering algorithm, limiting number of clusters to 2
res <- hmm.clust(subset, K.max = 2)
# Number of clusters
print(res$n.clusters)
# Table of cluster memberships
table(res$memberships[,"cluster"])
# BIC for each cluster model
print(res$bic)
# Heatmaps
cluster <- 1 # display heatmaps for cluster 1
transition.heatmap(res$partition[[cluster]]$transition_probs,
                   res$partition[[cluster]]$initial_probs)
emission.heatmap(res$partition[[cluster]]$emission_probs)
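For the simulated data at the start of the examples, ids K-1 to K-100 belong to sequences generated by HMM 1 and the remaining ids to HMM 2, so the found clusters can be cross-tabulated against the generating model. The sketch below uses a hypothetical helper, `true_source`, and a mock id and cluster vector with invented values; with a real fit, the id and cluster components of the result would take their place, assuming they are aligned.

```r
# Hypothetical helper: map ids of the form "K-<i>" to the generating HMM.
# The first `n_first` ids are taken to come from HMM 1, the rest from HMM 2.
true_source <- function(id, n_first = 100) {
  index <- as.integer(sub("^K-", "", id))
  ifelse(index <= n_first, "HMM 1", "HMM 2")
}

# Mock stand-ins for res$id and res$cluster; values are invented.
mock_id <- c("K-150", "K-3", "K-101", "K-100")
mock_cluster <- c(2, 1, 2, 1)

# Cross-tabulate generating model against found cluster.
confusion <- table(source = true_source(mock_id), cluster = mock_cluster)
print(confusion)
```

Cluster labels are arbitrary, so a good recovery shows up as one dominant cell per row rather than as a match between label numbers.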