autotune_VIM_Irmi {NADIA} | R Documentation |
Perform imputation using the VIM package and its irmi function
Description
The function uses IRMI (iterative robust model-based imputation) to impute missing data.
Usage
autotune_VIM_Irmi(
df,
col_type = NULL,
percent_of_missing = NULL,
eps = 5,
maxit = 100,
step = FALSE,
robust = FALSE,
init.method = "kNN",
force = FALSE,
col_0_1 = FALSE,
out_file = NULL
)
Arguments
df |
data.frame. Data frame to impute, with column names and without the target column. |
col_type |
character vector. Vector containing column type names. |
percent_of_missing |
numeric vector. Vector containing the percentage of missing data in each column, for example c(0,1,0,0,11.3,..) |
eps |
threshold for convergence |
maxit |
maximum number of iterations |
step |
stepwise model selection is applied when the parameter is set to TRUE |
robust |
if TRUE, robust regression methods will be applied (it's impossible to set step=TRUE and robust=TRUE at the same time) |
init.method |
Method for initialization of missing values (kNN or median) |
force |
if TRUE, the algorithm tries to find a solution in any case, possibly by using different robust methods automatically. (should be set FALSE for simulation) |
col_0_1 |
Decides whether to add bonus columns indicating where imputation was done: 0 means the value was present in the dataset, 1 means the value was imputed. Default FALSE. (Works only when returning one dataset.) |
out_file |
Output log file location. If the file already exists, log messages will be appended. If NULL, no log will be produced. |
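The col_type and percent_of_missing arguments can be derived from the data frame itself instead of being typed by hand. The following is a minimal base R sketch on a toy data frame. It assumes that the first class name of each column ("factor", "integer", "numeric") is an acceptable type label, which matches the labels used in the Examples section but is not otherwise confirmed here.

```r
# Toy data with missing values (illustrative only).
df <- data.frame(
  a = factor(c("red", NA, "blue", "red")),
  b = c(1L, 2L, NA, 4L),
  d = c(0.5, NA, NA, 2.0)
)

# First class name of each column, e.g. "factor", "integer", "numeric".
# Assumption: these labels are what col_type expects.
col_type <- vapply(df, function(x) class(x)[1], character(1), USE.NAMES = FALSE)

# Percentage of NA values per column, on a 0-100 scale.
percent_of_missing <- unname(100 * colMeans(is.na(df)))
```

Both vectors follow the column order of df, so they can be passed positionally alongside it.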
Details
Running time varies with the size and structure of the data. If the selected parameters fail, the function tries to run with default parameters. The most important parameter for both quality and reliability is eps.
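The fallback to default parameters can be pictured with a small wrapper around tryCatch. This is only a sketch of the general pattern, not the package's actual implementation: run_with_fallback and toy_impute are hypothetical names, and toy_impute is a stand-in that simply replaces NA with 0 so the block runs without VIM installed.

```r
# Hypothetical helper: call fun with the user-supplied arguments and, if that
# raises an error, retry with the function's own defaults.
run_with_fallback <- function(fun, data, ...) {
  tryCatch(
    fun(data, ...),
    error = function(e) {
      message("Selected parameters failed (", conditionMessage(e),
              "); retrying with defaults.")
      fun(data)
    }
  )
}

# Toy stand-in for an imputation routine (not irmi). Like the documented
# arguments, it rejects step = TRUE together with robust = TRUE.
toy_impute <- function(data, step = FALSE, robust = FALSE) {
  if (step && robust) stop("step and robust cannot both be TRUE")
  data[is.na(data)] <- 0
  data
}

# The invalid combination triggers the error handler, so defaults are used.
result <- run_with_fallback(toy_impute, c(1, NA, 3), step = TRUE, robust = TRUE)
```

With this pattern a failed parameter choice still yields an imputed result, at the cost of silently differing from what was requested, which is one reason to check the log when out_file is set.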
Value
Returns one data.frame with imputed values.
Author(s)
Alexander Kowarik, Matthias Templ (2016) doi:10.18637/jss.v074.i07
References
Alexander Kowarik, Matthias Templ (2016). Imputation with the R Package VIM. Journal of Statistical Software, 74(7), 1-16. doi:10.18637/jss.v074.i07
Examples
{
raw_data <- data.frame(
a = as.factor(sample(c("red", "yellow", "blue", NA), 1000, replace = TRUE)),
b = as.integer(1:1000),
c = as.factor(sample(c("YES", "NO", NA), 1000, replace = TRUE)),
d = runif(1000, 1, 10),
e = as.factor(sample(c("YES", "NO"), 1000, replace = TRUE)),
f = as.factor(sample(c("male", "female", "trans", "other", NA), 1000, replace = TRUE)))
# Preparing col_type
col_type <- c("factor", "integer", "factor", "numeric", "factor", "factor")
# Percentage of missing values in each column
percent_of_missing <- numeric(ncol(raw_data))
for (i in seq_along(percent_of_missing)) {
percent_of_missing[i] <- 100 * (sum(is.na(raw_data[, i])) / nrow(raw_data))
}
imp_data <- autotune_VIM_Irmi(raw_data, col_type, percent_of_missing)
# Check that all missing values were imputed
sum(is.na(imp_data)) == 0
# TRUE
}