iof {nomclust} | R Documentation |
Calculates a proximity (dissimilarity) matrix based on the IOF similarity measure.
iof(data)
data |
A data.frame or a matrix with cases in rows and variables in columns. |
The IOF (Inverse Occurrence Frequency) measure was originally constructed for text mining tasks (Sparck-Jones, 1972) and was later adapted for categorical variables (Boriah et al., 2008). The measure assigns a higher weight to mismatches on less frequent values and a lower weight to mismatches on more frequent values.
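The weighting described above can be illustrated with a minimal base R sketch. It assumes the per-variable IOF similarity from Boriah et al. (2008): 1 for a match, and 1 / (1 + log f(x) * log f(y)) for a mismatch, where f() is the absolute frequency of a category in that variable. The conversion to a dissimilarity (1 / mean similarity - 1) is also an assumption made for illustration. The helper iof_sketch() and the data frame toy are hypothetical and are not part of nomclust; the package's own implementation may differ in detail.

```r
# Hypothetical helper (not part of nomclust): IOF-style dissimilarity
# between rows i and j of a categorical data set.
iof_sketch <- function(data, i, j) {
  data <- as.data.frame(data)
  m <- ncol(data)
  sim <- numeric(m)
  for (k in seq_len(m)) {
    x <- as.character(data[i, k])
    y <- as.character(data[j, k])
    if (x == y) {
      # matching categories get full similarity
      sim[k] <- 1
    } else {
      # absolute frequencies of all categories of the k-th variable
      freq <- table(as.character(data[[k]]))
      # assumed IOF mismatch similarity (Boriah et al., 2008)
      sim[k] <- 1 / (1 + log(freq[[x]]) * log(freq[[y]]))
    }
  }
  # assumed conversion of the mean similarity to a dissimilarity
  1 / mean(sim) - 1
}

# Toy data: "a" and "b" are frequent (4 cases each), "c" and "d" are rare (2 each)
toy <- data.frame(
  v1 = c("a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "d", "d")
)

d_frequent <- iof_sketch(toy, 1, 5)   # mismatch "a" vs "b"
d_rare     <- iof_sketch(toy, 9, 11)  # mismatch "c" vs "d"
print(c(frequent = d_frequent, rare = d_rare))
```

Under these assumptions, the mismatch between the two frequent categories yields a larger dissimilarity than the mismatch between the two rare ones, and identical rows yield a dissimilarity of zero.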
The function returns an object of class "dist".
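As a brief illustration of what can be done with an object of class "dist", the sketch below builds one by hand with as.dist() from base R and passes it on to standard tools. The matrix values and row labels are made up for the example and are not output of iof().

```r
# Hand-made symmetric dissimilarity matrix (illustrative values only)
labels <- c("r1", "r2", "r3")
m <- matrix(c(0, 1, 2,
              1, 0, 3,
              2, 3, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(labels, labels))

# Convert to a "dist" object, which stores only the lower triangle
d <- as.dist(m)
print(class(d))

# Expand back to a full square matrix for inspection
full <- as.matrix(d)
print(full)

# A "dist" object can be passed directly to hierarchical clustering
hc <- hclust(d, method = "average")
print(hc$merge)
```

The same calls (as.matrix(), hclust()) should apply to any "dist" object, including one returned by this function.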
Zdenek Sulc.
Contact: zdenek.sulc@vse.cz
Boriah S., Chandola V., Kumar V. (2008). Similarity measures for categorical data: A comparative evaluation.
In: Proceedings of the 8th SIAM International Conference on Data Mining, SIAM, p. 243-254.
Sparck-Jones K. (1972). A statistical interpretation of term specificity and its application in retrieval.
In: Journal of Documentation, 28(1), 11-21. Reprinted in: Journal of Documentation, 60(5) (2004), 493-502.
eskin, good1, good2, good3, good4, lin, lin1, of, sm, ve, vm.
# sample data
data(data20)

# dissimilarity matrix calculation
prox.iof <- iof(data20)