autotune_VIM_hotdeck {NADIA}    R Documentation

Hot-deck imputation using the VIM package.

Description

Performs hot-deck imputation by calling the hotdeck function from the VIM package. This algorithm has no tunable parameters.

Usage

autotune_VIM_hotdeck(
  df,
  percent_of_missing = NULL,
  col_0_1 = FALSE,
  out_file = NULL
)

Arguments

df

data.frame. Data frame to impute, with column names and without the target column.

percent_of_missing

numeric vector. Vector containing the percentage of missing data in each column, for example c(0,1,0,0,11.3,..)

col_0_1

Decides whether to add an additional column indicating where imputation was performed: 0 - value was in the dataset, 1 - value was imputed. Default FALSE.

out_file

Output log file location. If the file already exists, log messages will be appended to it. If NULL, no log will be produced.
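
The percent_of_missing argument described above can be derived from the data itself. The following is a minimal sketch in base R, not part of the package's own examples; the toy data frame and its column names (x, y, z) are hypothetical and serve only as an illustration.

```r
# Hypothetical toy data; any data.frame with missing values works the same way.
df <- data.frame(
  x = c(1, NA, 3, 4),
  y = c("a", "b", NA, NA),
  z = 1:4
)

# is.na(df) returns a logical matrix; colMeans() gives the share of NA
# per column, which is scaled to a percentage. unname() drops the column
# names so that a plain numeric vector remains.
percent_of_missing <- unname(100 * colMeans(is.na(df)))
percent_of_missing
# [1] 25 50  0
```

The resulting vector has one entry per column, in column order, which matches the form shown in the argument description.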

Value

Returns a data.frame with imputed values.

Author(s)

Alexander Kowarik, Matthias Templ (2016) doi:10.18637/jss.v074.i07

References

Alexander Kowarik, Matthias Templ (2016). Imputation with the R Package VIM. Journal of Statistical Software, 74(7), 1-16. doi:10.18637/jss.v074.i07

Examples

{
  raw_data <- data.frame(
    a = as.factor(sample(c("red", "yellow", "blue", NA), 1000, replace = TRUE)),
    b = as.integer(1:1000),
    c = as.factor(sample(c("YES", "NO", NA), 1000, replace = TRUE)),
    d = runif(1000, 1, 10),
    e = as.factor(sample(c("YES", "NO"), 1000, replace = TRUE)),
    f = as.factor(sample(c("male", "female", "trans", "other", NA), 1000, replace = TRUE)))

  # Preparing col_type
  col_type <- c("factor", "integer", "factor", "numeric", "factor", "factor")

  # Percentage of missing values in each column
  percent_of_missing <- numeric(ncol(raw_data))
  for (i in seq_along(percent_of_missing)) {
    percent_of_missing[i] <- 100 * (sum(is.na(raw_data[, i])) / nrow(raw_data))
  }


  imp_data <- autotune_VIM_hotdeck(raw_data, percent_of_missing)

  # Check that all missing values were imputed
  sum(is.na(imp_data)) == 0
  # TRUE
}

[Package NADIA version 0.4.2 Index]