fs.pca {mt} | R Documentation |
Feature selection using PCA loadings.
fs.pca(x,thres=0.8, ...)
x |
A data frame or matrix containing the data set. |
thres |
Threshold on the cumulative proportion of variance explained by the principal components (PCs). |
... |
Additional arguments to |
Because the PCA loadings form a matrix (one column per PC), the Mahalanobis distance of the loadings is used to score the features. (Other scores, for example the sum of the absolute values of the loadings, or the square root of the loadings, could be used instead.)
Note that this feature selection method is unsupervised: it does not use class labels.
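As an illustration of scoring features by the Mahalanobis distance of their loadings, the following is a minimal sketch on toy data. It is not taken from the `mt` source: the use of `prcomp`, the centring without scaling, measuring distance from the centroid of the loadings, and the variable names `n_pc`, `ld`, `score` and `ord` are all assumptions made for the example.

```r
## Sketch only: score features by the Mahalanobis distance of their PCA loadings.
## Assumptions (not from the mt source): prcomp() for PCA, centred but unscaled
## data, distance measured from the centroid of the loading vectors.
set.seed(1)
x <- matrix(rnorm(60 * 10), nrow = 60,
            dimnames = list(NULL, paste0("V", 1:10)))  # toy data: 60 samples, 10 features
thres <- 0.8

pca <- prcomp(x, center = TRUE, scale. = FALSE)

## smallest number of PCs whose cumulative explained variance reaches thres
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
n_pc <- max(2, which(cum_var >= thres)[1])  # keep at least 2 PCs

## loadings: one row per feature, one column per retained PC
ld <- pca$rotation[, seq_len(n_pc), drop = FALSE]

## mahalanobis() returns the squared distance of each row from the centre
score <- mahalanobis(ld, center = colMeans(ld), cov = cov(ld))

## feature names ordered from largest to smallest score
ord <- colnames(x)[order(score, decreasing = TRUE)]

## one of the alternative scores mentioned above: sum of absolute loadings
alt <- rowSums(abs(ld))
```

The floor of two PCs is a choice made for this sketch so that the covariance matrix of the loadings has more than one dimension; the package may handle this case differently.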
A list with components:
fs.rank |
A vector of feature ranking scores. |
fs.order |
A vector of feature order from best to worst. |
stats |
A vector of measurements. |
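A short sketch of how the returned list might be used to keep only the top-ranked features. The `res` object below is a hand-made stand-in with the documented component name `fs.order`, not real output of `fs.pca`, and it assumes `fs.order` holds column names (column indices would work with the same subsetting).

```r
## Hypothetical stand-in for the value of fs.pca(); only fs.order is mocked.
set.seed(1)
x <- matrix(rnorm(20), nrow = 5,
            dimnames = list(NULL, paste0("V", 1:4)))   # toy data: 5 samples, 4 features
res <- list(fs.order = c("V2", "V3", "V4", "V1"))      # assumed: best to worst

## keep the two best features
top <- head(res$fs.order, 2)
x_sub <- x[, top, drop = FALSE]
dim(x_sub)
```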
Wanchang Lin
## prepare data set
data(abr1)
cls <- factor(abr1$fact$class)
dat <- abr1$pos
## dat <- abr1$pos[,110:1930]
## fill zeros with NAs
dat <- mv.zene(dat)
## missing values summary
mv <- mv.stats(dat, grp=cls)
mv ## View the missing value pattern
## filter missing value variables
## dim(dat)
dat <- dat[,mv$mv.var < 0.15]
## dim(dat)
## fill NAs with mean
dat <- mv.fill(dat,method="mean")
## log transformation
dat <- preproc(dat, method="log10")
## select class "1" and "2" for feature ranking
ind <- grepl("1|2", cls)
mat <- dat[ind,,drop=FALSE]
mat <- as.matrix(mat)
grp <- cls[ind, drop=TRUE]
## feature selection by PCA on the selected samples
## (unsupervised, so the class labels in grp are not used)
res <- fs.pca(mat)
names(res)