autotune_missRanger {NADIA} | R Documentation |
Perform imputation using missRanger from the missRanger package.
Description
The function uses the missRanger package for data imputation. It uses the out-of-bag error (OOBerror; see the missForest documentation) to perform a random search.
Usage
autotune_missRanger(
df,
percent_of_missing = NULL,
maxiter = 10,
random.seed = 123,
mtry = NULL,
num.trees = 500,
verbose = FALSE,
col_0_1 = FALSE,
out_file = NULL,
pmm.k = 5,
optimize = TRUE,
iter = 10
)
Arguments
df |
data.frame. Data frame to impute, with column names and without the target column. |
percent_of_missing |
numeric vector. Vector containing the percent of missing data in each column, for example c(0,1,0,0,11.3,..) |
maxiter |
Maximum number of iterations for the missRanger algorithm. |
random.seed |
Random seed used in imputation. |
mtry |
Sample fraction used by missRanger. This parameter isn't optimized automatically. If NULL, the default value from the ranger package will be used. |
num.trees |
Number of trees. If optimize == TRUE, the parameter set seq(10,num.trees,iter) will be used. |
verbose |
If FALSE, the function doesn't print to the console. |
col_0_1 |
Decides whether to add an extra column indicating where imputation was done: 0 - value was in the dataset, 1 - value was imputed. Default FALSE. |
out_file |
Output log file location. If the file already exists, log messages will be appended to it. If NULL, no log will be produced. |
pmm.k |
Number of candidate non-missing values to sample from in the predictive mean matching step. Set to 0 to skip this step. If optimize == TRUE, the parameter set sample(1:pmm.k,iter) will be used. If pmm.k == 0, missRanger is equivalent to missForest. |
optimize |
If TRUE, internal optimization will be performed. |
iter |
Number of iterations for the random search. |
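The interplay of num.trees, pmm.k and iter during a random search can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's actual implementation: `toy_error` is a hypothetical stand-in for the out-of-bag error of an imputation run, and sampling with `replace = TRUE` is an assumption made so that the draw also works when iter is larger than the number of candidates.

```r
# Minimal random-search sketch in base R (illustrative only).
# toy_error() is a hypothetical stand-in for the OOB error that an
# imputation run would report; it is NOT part of NADIA or missRanger.
toy_error <- function(num.trees, pmm.k) {
  1 / num.trees + abs(pmm.k - 3) / 100
}

set.seed(123)
max_trees <- 500
max_pmm_k <- 5
iter <- 10

# Candidate values, following the parameter sets described above.
tree_candidates <- seq(10, max_trees, iter)
pmm_candidates <- 1:max_pmm_k

# Draw `iter` random parameter combinations (replace = TRUE is an assumption).
trials <- data.frame(
  num.trees = sample(tree_candidates, iter, replace = TRUE),
  pmm.k = sample(pmm_candidates, iter, replace = TRUE)
)

# Score each combination and keep the one with the lowest error.
trials$error <- mapply(toy_error, trials$num.trees, trials$pmm.k)
best <- trials[which.min(trials$error), ]
print(best)
```

In a real run, the scoring step would presumably be replaced by an imputation call whose out-of-bag error is recorded for each combination.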
Value
Returns a data.frame with imputed values.
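To make the 0/1 coding of the col_0_1 option concrete, here is a small base R sketch of how such indicators could be derived from the data before imputation. The layout (one indicator per original column) and the `_imputed` suffix are assumptions chosen for this example; the names and layout the function actually produces may differ.

```r
# Sketch: building 0/1 indicators from the pre-imputation data.
# One indicator per column and the "_imputed" suffix are hypothetical
# choices for this example, not the package's confirmed output format.
df <- data.frame(
  x = c(1.5, NA, 3.2, 4.8),
  y = factor(c("a", "b", NA, "a"))
)

# 1 where the original value was missing (and would be imputed), 0 otherwise.
indicator <- as.data.frame(is.na(df) * 1L)
names(indicator) <- paste0(names(df), "_imputed")

print(indicator)
```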
Author(s)
Michael Mayer (2019).
References
Michael Mayer (2019). missRanger: Fast Imputation of Missing Values. R package version 2.1.0. https://CRAN.R-project.org/package=missRanger
Examples
raw_data <- data.frame(
a = as.factor(sample(c("red", "yellow", "blue", NA), 1000, replace = TRUE)),
b = as.integer(1:1000),
c = as.factor(sample(c("YES", "NO", NA), 1000, replace = TRUE)),
d = runif(1000, 1, 10),
e = as.factor(sample(c("YES", "NO"), 1000, replace = TRUE)),
f = as.factor(sample(c("male", "female", "trans", "other", NA), 1000, replace = TRUE)))
# Preparing col_type
col_type <- c("factor", "integer", "factor", "numeric", "factor", "factor")
percent_of_missing <- 1:6
for (i in percent_of_missing) {
percent_of_missing[i] <- 100 * (sum(is.na(raw_data[, i])) / nrow(raw_data))
}
imp_data <- autotune_missRanger(raw_data[1:100,], percent_of_missing, optimize = FALSE)
# Check that all missing values were imputed
sum(is.na(imp_data)) == 0
# TRUE
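The percent of missing values per column can also be computed without a loop. The following base R sketch uses a smaller, hypothetical data frame (`demo_data`) so that it is self-contained. `unname()` is applied on the assumption that a plain, unnamed numeric vector is what the function expects.

```r
# Base R sketch: percent of missing values per column, without a loop.
# demo_data is a hypothetical data frame used only for this illustration.
set.seed(123)
demo_data <- data.frame(
  a = as.factor(sample(c("red", "yellow", "blue", NA), 1000, replace = TRUE)),
  b = as.integer(1:1000),
  d = runif(1000, 1, 10)
)

# is.na() gives a logical matrix; colMeans() gives the share of TRUE per column.
percent_of_missing <- unname(100 * colMeans(is.na(demo_data)))
print(percent_of_missing)
```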