autotune_Amelia {NADIA} | R Documentation |
Perform imputation using the Amelia package and the EMB algorithm.
Description
The function uses EMB (Expectation-Maximization with Bootstrapping) to impute missing data. Its performance depends heavily on the data structure and the chosen parameters.
Usage
autotune_Amelia(
df,
col_type = NULL,
percent_of_missing = NULL,
col_0_1 = FALSE,
parallel = TRUE,
polytime = NULL,
splinetime = NULL,
intercs = FALSE,
empir = NULL,
verbose = FALSE,
return_one = TRUE,
m = 3,
out_file = NULL
)
Arguments
df |
data.frame. Data frame to impute, with column names and without the target column. |
col_type |
character vector. Vector containing the type name of each column, for example c("factor", "integer", "numeric"). |
percent_of_missing |
numeric vector. Vector containing the percentage of missing data in each column, for example c(0, 1, 0, 0, 11.3, ...). |
col_0_1 |
Decides whether to add extra columns indicating where imputation was performed: 0 - value was in the dataset, 1 - value was imputed. Default FALSE. (Works only when a single dataset is returned.) |
parallel |
If TRUE, parallel computation is used. |
polytime |
Parameter passed to the amelia function. |
splinetime |
Parameter passed to the amelia function. |
intercs |
Parameter passed to the amelia function. |
empir |
Parameter passed to the amelia function; the value of empir used in Amelia is empir*nrow(df). If empir is not set, empir=nrow(df)*0.015 is used. |
verbose |
If TRUE, the function prints messages to the console. |
return_one |
Decides whether a single dataset or an amelia object is returned. |
m |
Number of datasets generated by amelia. If return_one=TRUE, the first dataset is returned. |
out_file |
Location of the output log file. If the file already exists, log messages are appended to it. If NULL, no log is produced. |
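The col_type and percent_of_missing vectors can be derived from df instead of being typed by hand. The following is a minimal base R sketch, assuming one type name per column is wanted. The names describe_columns and example_df are hypothetical and not part of NADIA. The last line works through the default empir arithmetic as the argument description states it.

```r
# Hypothetical helper (not part of NADIA): derive the two descriptive
# vectors that autotune_Amelia() takes from a data frame.
describe_columns <- function(df) {
  list(
    # First class of each column, e.g. "factor", "integer", "numeric"
    col_type = unname(vapply(df, function(x) class(x)[1], character(1))),
    # Percentage (0-100) of NA values in each column
    percent_of_missing = unname(100 * colMeans(is.na(df)))
  )
}

# Small illustrative data frame (made up for this sketch)
example_df <- data.frame(
  a = factor(c("red", NA, "blue", "red")),
  b = 1:4,
  d = c(1.5, NA, NA, 4.0)
)

info <- describe_columns(example_df)
info$col_type            # "factor" "integer" "numeric"
info$percent_of_missing  # 25 0 50

# Default stated in the argument description when empir is not set
default_empir <- nrow(example_df) * 0.015
```

Taking only the first class of each column keeps the result at one entry per column even for columns with several classes, such as ordered factors.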
Value
Returns a single data.frame with imputed values, or an amelia object.
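When col_0_1 = TRUE, the returned data frame also reports where values were imputed (0 - value was in the dataset, 1 - value was imputed). The same information can be recovered from the original data, which may help when inspecting a result. The following is a minimal base R sketch. The function name imputation_mask and the "_where" column suffix are illustrative assumptions and may not match the names NADIA uses.

```r
# Hypothetical helper (not part of NADIA):
# 0 = value was present, 1 = value was missing and would be imputed.
imputation_mask <- function(df, suffix = "_where") {
  mask <- as.data.frame(lapply(df, function(x) as.integer(is.na(x))))
  names(mask) <- paste0(names(df), suffix)
  mask
}

# Small illustrative data frame (made up for this sketch)
raw <- data.frame(x = c(1, NA, 3), y = factor(c("a", "b", NA)))
mask <- imputation_mask(raw)
mask
```

Computing the mask from the raw data before imputation does not depend on which form of result the function returns.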
Author(s)
James Honaker, Gary King, Matthew Blackwell (2011).
References
James Honaker, Gary King, Matthew Blackwell (2011). Amelia II: A Program for Missing Data. Journal of Statistical Software, 45(7), 1-47. URL https://www.jstatsoft.org/v45/i07/.
Examples
{
raw_data <- data.frame(
a = as.factor(sample(c("red", "yellow", "blue", NA), 1000, replace = TRUE)),
b = as.integer(1:1000),
c = as.factor(sample(c("YES", "NO", NA), 1000, replace = TRUE)),
d = runif(1000, 1, 10),
e = as.factor(sample(c("YES", "NO"), 1000, replace = TRUE)),
f = as.factor(sample(c("male", "female", "trans", "other", NA), 1000, replace = TRUE)))
# Preparing col_type
col_type <- c("factor", "integer", "factor", "numeric", "factor", "factor")
percent_of_missing <- 1:6
for (i in percent_of_missing) {
percent_of_missing[i] <- 100 * (sum(is.na(raw_data[, i])) / nrow(raw_data))
}
imp_data <- autotune_Amelia(raw_data, col_type, percent_of_missing, parallel = FALSE)
# Check that all missing values were imputed
sum(is.na(imp_data)) == 0
# TRUE
}