knncatimputeLarge {scrime} | R Documentation |
Imputes missing values in a high-dimensional matrix composed of categorical variables using k Nearest Neighbors.
knncatimputeLarge(data, mat.na = NULL, fac = NULL, fac.na = NULL,
nn = 3, distance = c("smc", "cohen", "snp1norm", "pcc"),
n.num = 100, use.weights = TRUE, verbose = FALSE)
data |
a numeric matrix consisting of integers between 1 and Each row of |
mat.na |
a numeric matrix containing missing values. Must have the same number of columns as data. |
fac |
a numeric or character vector of length |
fac.na |
a numeric or character vector of length |
nn |
an integer specifying k, the number of nearest neighbors. |
distance |
character string naming the distance measure used in the k nearest neighbors search. |
n.num |
an integer giving the number of rows of |
use.weights |
should weighted k nearest neighbors be used? |
verbose |
should more information about the progress of the imputation be printed? |
If mat.na = NULL, then a matrix of the same size as data in which the missing values have been replaced. If mat.na has been specified, then a matrix of the same size as mat.na in which the missing values have been replaced.
While knncatimpute considers all variables/rows when replacing missing values, knncatimputeLarge only considers the rows with no missing values when searching for the k nearest neighbors.
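The restriction described above can be illustrated with a minimal base R sketch. It is not the scrime implementation: the helper name impute_from_complete_rows is hypothetical, the distance is a plain simple matching distance (share of differing positions), and the vote is unweighted, so it corresponds only loosely to distance = "smc" with use.weights = FALSE.

```r
# Sketch only: impute categorical values row by row, searching for
# neighbors among the complete rows. NOT the scrime implementation.
impute_from_complete_rows <- function(mat, nn = 3) {
  has_na <- rowSums(is.na(mat)) > 0
  complete <- mat[!has_na, , drop = FALSE]   # candidate neighbors
  stopifnot(nrow(complete) >= nn)
  out <- mat
  for (i in which(has_na)) {
    observed <- !is.na(mat[i, ])
    # Simple matching distance: share of observed positions that differ.
    target <- matrix(mat[i, observed], nrow(complete), sum(observed),
                     byrow = TRUE)
    d <- rowMeans(complete[, observed, drop = FALSE] != target)
    neighbors <- complete[order(d)[seq_len(nn)], , drop = FALSE]
    for (j in which(!observed)) {
      counts <- table(neighbors[, j])
      # Unweighted majority vote; ties go to the smallest category.
      out[i, j] <- as.integer(names(counts)[which.max(counts)])
    }
  }
  out
}

set.seed(1)
toy <- matrix(sample(3, 200, TRUE), nrow = 20)
toy[cbind(c(2, 7, 7), c(3, 1, 9))] <- NA     # rows 2 and 7 become incomplete
imputed <- impute_from_complete_rows(toy, nn = 3)
sum(is.na(imputed))
```

Because only complete rows are candidates, the distance computation never has to deal with missing values on the neighbor side, which keeps the search cheap for very large matrices.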
Holger Schwender, holger.schwender@udo.edu
Schwender, H. and Ickstadt, K. (2008). Imputing Missing Genotypes with k Nearest Neighbors. Technical Report, SFB 475, Department of Statistics, University of Dortmund. Appears soon.
knncatimpute, gknn, smc, pcc
## Not run:
# Generate a data set consisting of 100 columns and 2000 rows (actually,
# knncatimputeLarge is made for much larger data sets), where the values
# are randomly drawn from the integers 1, 2, and 3.
# Afterwards, remove 20 of the observations randomly.
mat <- matrix(sample(3, 200000, TRUE), 2000)
mat[sample(200000, 20)] <- NA
# Apply knncatimputeLarge to mat to remove the missing values.
mat2 <- knncatimputeLarge(mat)
sum(is.na(mat))
sum(is.na(mat2))
# Now assume that the first 100 rows belong to SNPs from chromosome 1,
# the second 100 rows to SNPs from chromosome 2, and so on.
chromosome <- rep(1:20, each = 100)
# Apply knncatimputeLarge to mat chromosomewise, i.e. only consider
# the SNPs that belong to the same chromosome when replacing missing
# genotypes.
mat4 <- knncatimputeLarge(mat, fac = chromosome)
## End(Not run)
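The chromosome-wise call in the examples restricts imputation to rows that share the same value of fac. As a rough base R illustration of that grouping (a sketch on a smaller, hypothetical matrix, not what knncatimputeLarge does internally), the rows can be split by the grouping vector and the complete and incomplete rows counted per group:

```r
# Sketch only: show how a grouping vector partitions the rows and how
# many complete rows (potential neighbors) each group provides.
set.seed(2)
mat <- matrix(sample(3, 6000, TRUE), nrow = 60)
mat[sample(length(mat), 10)] <- NA
chromosome <- rep(1:3, each = 20)            # 20 rows per group

rows_by_group <- split(seq_len(nrow(mat)), chromosome)
summary_by_group <- t(sapply(rows_by_group, function(idx) {
  incomplete <- rowSums(is.na(mat[idx, , drop = FALSE])) > 0
  c(rows = length(idx), with_na = sum(incomplete),
    complete = sum(!incomplete))
}))
summary_by_group
```

A group with few complete rows leaves few candidate neighbors, so a check like this may be worth running before imputing group by group.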