autotune_VIM_regrImp {NADIA} | R Documentation |
Perform imputation using the VIM package and its regressionImp function.
Description
The function uses regression models to impute missing data.
Usage
autotune_VIM_regrImp(
df,
col_type = NULL,
percent_of_missing = NULL,
col_0_1 = FALSE,
robust = FALSE,
mod_cat = FALSE,
use_imputed = FALSE,
out_file = NULL
)
Arguments
df |
data.frame. Data frame to impute, with column names and without the target column. |
col_type |
Character vector with types of columns. |
percent_of_missing |
numeric vector. Vector containing the percentage of missing data in each column, for example c(0,1,0,0,11.3,..) |
col_0_1 |
Decides whether to add a bonus column indicating where imputation was done. 0 - value was in dataset, 1 - value was imputed. Default FALSE. (Works only when returning one dataset.) |
robust |
TRUE/FALSE if robust regression should be used. |
mod_cat |
TRUE/FALSE. If TRUE, for categorical variables the level with the highest prediction probability is selected; otherwise a level is sampled according to the probabilities. |
use_imputed |
TRUE/FALSE. If TRUE, already imputed columns will be used to impute the others. |
out_file |
Output log file location. If the file already exists, log messages will be appended. If NULL, no log will be produced. |
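The col_type and percent_of_missing arguments can be derived from the data frame itself. The following is a minimal base R sketch, assuming that col_type expects class names such as "factor", "integer" and "numeric" (as in the Examples section) and that percent_of_missing is on a 0-100 scale. The small data frame `df_small` is made up for illustration.

```r
# Illustrative data frame (not from the package)
df_small <- data.frame(
  a = factor(c("red", NA, "blue", "red")),
  b = 1:4,
  d = c(1.5, NA, 3.2, NA)
)

# First class of every column, as a plain character vector
col_type <- unname(vapply(df_small, function(x) class(x)[1], character(1)))

# Percentage of missing values per column (0-100 scale)
percent_of_missing <- unname(100 * colMeans(is.na(df_small)))
```

Both vectors follow the column order of the data frame, so they line up with df when passed to autotune_VIM_regrImp.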
Details
The function imputes one column per iteration to allow more control over the imputation. Each column with missing values can be imputed with a different formula. For every column to be imputed, one of four formulas is used:
1. col to impute ~ all columns without missing
2. col to impute ~ all numeric columns without missing
3. col to impute ~ first of columns without missing
4. col to impute ~ first of numeric columns without missing
For example, if formulas 1 and 2 can't be used, the algorithm will try formula 3. If none of the formulas can be used, the function stops and reports the error from the attempt with formula 4 or 3. In some cases, setting use_imputed to TRUE can solve this problem, but in general it lowers the quality of the imputation.
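The fallback between the four formulas can be sketched in base R as follows. This is only an illustration of the idea, not the package's internal code: the helpers `candidate_formulas` and `first_working` are hypothetical, and the sketch assumes the formulas are tried in the order 1 to 4.

```r
# Hypothetical helper (not part of NADIA): build the four candidate
# formulas for one column with missing values.
candidate_formulas <- function(df, target) {
  # Columns without missing values, excluding the column to impute
  complete <- setdiff(names(df)[colSums(is.na(df)) == 0], target)
  numeric_complete <- complete[vapply(df[complete], is.numeric, logical(1))]
  rhs <- list(
    complete,                  # 1. all columns without missing values
    numeric_complete,          # 2. all numeric columns without missing values
    head(complete, 1),         # 3. first column without missing values
    head(numeric_complete, 1)  # 4. first numeric column without missing values
  )
  # Drop candidates that have no predictors at all
  rhs <- Filter(function(cols) length(cols) > 0, rhs)
  lapply(rhs, function(cols) reformulate(cols, response = target))
}

# Hypothetical helper: try each formula in turn and return the first
# result that does not raise an error; re-signal the last error otherwise.
first_working <- function(formulas, fit) {
  last_error <- NULL
  for (f in formulas) {
    result <- tryCatch(fit(f), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    last_error <- result
  }
  if (is.null(last_error)) stop("no candidate formula available")
  stop(last_error)
}

# Usage on a made-up data frame
toy <- data.frame(
  a = factor(c("red", NA, "blue", "red")),
  b = 1:4,
  e = factor(c("YES", "NO", "YES", "NO"))
)
formulas <- candidate_formulas(toy, "a")
```

In practice `fit` would wrap the actual imputation call for one formula; here it is left as an argument so the sketch stays independent of any particular package.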
Value
Returns a single data.frame with imputed values.
Author(s)
Alexander Kowarik, Matthias Templ (2016) doi:10.18637/jss.v074.i07
References
Alexander Kowarik, Matthias Templ (2016). Imputation with the R Package VIM. Journal of Statistical Software, 74(7), 1-16. doi:10.18637/jss.v074.i07
Examples
{
raw_data <- data.frame(
a = as.factor(sample(c("red", "yellow", "blue", NA), 1000, replace = TRUE)),
b = as.integer(1:1000),
c = as.factor(sample(c("YES", "NO", NA), 1000, replace = TRUE)),
d = runif(1000, 1, 10),
e = as.factor(sample(c("YES", "NO"), 1000, replace = TRUE)),
f = as.factor(sample(c("male", "female", "trans", "other", NA), 1000, replace = TRUE)))
# Preparing col_type
col_type <- c("factor", "integer", "factor", "numeric", "factor", "factor")
percent_of_missing <- 1:6
for (i in percent_of_missing) {
percent_of_missing[i] <- 100 * (sum(is.na(raw_data[, i])) / nrow(raw_data))
}
imp_data <- autotune_VIM_regrImp(raw_data, col_type, percent_of_missing)
# Check that all missing values were imputed
sum(is.na(imp_data)) == 0
# TRUE
}