match.kmeans {matchFeat} | R Documentation |
This function matches collections of feature vectors in a one-to-one fashion using a k
-means-like method.
match.kmeans(x, unit = NULL, w = NULL, method = c("hungarian", "bruteforce"),
control = list())
x |
data: matrix of dimensions |
unit |
integer (= number of units) or vector mapping rows of |
w |
weights for the (squared Euclidean) loss function. Can be specified as single positive number, |
method |
method for linear assignment problem: |
control |
optional list of tuning parameters |
Given a set of n
units or datasets, each having m
unlabeled feature vectors, the one-to-one matching problem is to find a set of n
labels that produce the best match of feature vectors across units. The objective function to minimize is the sum of (weighted) squared Euclidean distances between all pairs of feature vectors having the same label. This amounts to minimizing the sum of the within-label variances.
The sample means and sample covariances of the matched feature vectors are calculated as a post-processing step.
If x
is a matrix, the rows should be sorted by increasing unit label and unit
should be a nondecreasing sequence of integers, for example (1,...,1,2,...,2,...,n,...,n)
with each integer 1,...,n
replicated m
times.
The argument w
can be specified as a vector of positive numbers (will be recycled to length p
if needed) or as a positive definite matrix of size (p,p)
.
The optional argument control
is a list with three fields: sigma
, starting point for the optimization ((m,n)
matrix of permutations; maxit
, maximum number of iterations; and equal.variance
, logical value that specifies whether the returned sample covariance matrices V
for matched features should be equal between labels/classes (TRUE) or label-specific (FALSE, default).
A list of class matchFeat
with components
best set of permutations for feature vectors ((m,n)
matrix)
associated clusters (= inverse permutations)
minimum objective value
sample mean for each class/label ((p,m)
matrix)
sample covariance for each class/label ((p,m)
matrix
function call
Degras (2022) "Scalable feature matching across large data collections."
doi:10.1080/10618600.2022.2074429
https://en.wikipedia.org/wiki/K-means_clustering
https://en.wikipedia.org/wiki/Assignment_problem
https://en.wikipedia.org/wiki/Hungarian_algorithm
match.2x
, match.bca
,
match.bca.gen
, match.gaussmix
,
match.rec
, match.template
## Generate data
m <- 3
n <- 10
p <- 5
mu <- matrix(rnorm(p*m),p,m)
sigma <- 0.1
x <- array(as.vector(mu) + rnorm(p*m*n,sigma), c(p,m,n))
## Match all feature vectors
result <- match.kmeans(x)
## Display results
result$objective # cost function
xmatched <- array(dim=dim(x))
## re-arranged (matched) feature vectors
for (i in 1:n){
xmatched[,,i] <- x[,result$sigma[,i],i]}