cv_filter {scorecardModelUtils}    R Documentation
The function returns a list of variables that can be dropped because they are highly correlated with another variable, judged by Cramer's V and Information Value (IV). If variables V1 and V2 have a Cramer's V greater than a user-defined threshold, the function recommends dropping the one with the lower IV. Once a variable has been dropped, it is no longer considered when deciding whether to drop any other variable.
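For reference, Cramer's V for two categorical variables is V = sqrt(chi2 / (n * (min(r, c) - 1))), where chi2 is Pearson's chi-squared statistic of their r x c contingency table and n is the number of observations. It ranges from 0 (no association) to 1 (perfect association). The helper `cramers_v` below is a hypothetical base-R sketch of that formula, not the code used by the package's `cv_table`; in particular, it assumes no continuity correction is applied.

```r
# Hypothetical helper (not part of scorecardModelUtils):
# Cramer's V = sqrt(chi2 / (n * (min(r, c) - 1))) for two categorical vectors.
cramers_v <- function(x, y) {
  tab <- table(x, y)                              # r x c contingency table
  chi2 <- suppressWarnings(                       # silence small-expected-count warnings
    chisq.test(tab, correct = FALSE)$statistic    # Pearson chi-squared, no Yates correction
  )
  n <- sum(tab)                                   # number of observations
  k <- min(nrow(tab), ncol(tab))                  # smaller table dimension
  unname(sqrt(chi2 / (n * (k - 1))))
}

# Usage: a factor compared with itself is perfectly associated
cramers_v(iris$Species, iris$Species)   # 1 (up to floating-point error)
```

A pair of variables whose V exceeds `threshold` is the kind of pair `cv_filter` resolves by keeping the higher-IV member.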
cv_filter(cv_table, iv_table, threshold)
cv_table: data frame of class cv_table with three columns (var_1, var_2, cv_value)
iv_table: data frame of class iv_table with two columns (Variable_name, iv)
threshold: Cramer's V value above which one of the two variables in a pair will be recommended to be dropped
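The `iv` column ranks variables by how well they separate the two classes of a binary target. A common definition is IV = sum over levels of (d1 - d0) * ln(d1 / d0), where d1 and d0 are the shares of the target's 1s and 0s that fall in each level. The function `information_value` below is a hypothetical sketch for a single categorical variable; it assumes a 0/1 target and that every level contains both classes (an empty cell makes the log term infinite), and it does not reproduce the binning that `iv_table` applies to numeric variables.

```r
# Hypothetical helper (not part of scorecardModelUtils): information value of
# a categorical predictor x against a binary 0/1 target y.
# Assumes every level of x contains both classes.
information_value <- function(x, y) {
  tab <- table(x, y)                        # rows: levels of x; columns: "0", "1"
  d0 <- tab[, "0"] / sum(tab[, "0"])        # share of the 0s falling in each level
  d1 <- tab[, "1"] / sum(tab[, "1"])        # share of the 1s falling in each level
  sum((d1 - d0) * log(d1 / d0))             # weight-of-evidence weighted sum
}

# Usage: level "a" holds mostly 0s and level "b" mostly 1s
information_value(c("a", "a", "a", "b", "b", "b"), c(0, 0, 1, 0, 1, 1))
```

Because (d1 - d0) and ln(d1 / d0) always share a sign, every term is non-negative, so a higher IV means a more informative variable.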
The function returns an object of class "cv_filter", a list containing the following components:
retain_var_list: list of variables remaining after the CV filter
dropped_var_list: list of variables that can be dropped based on the CV filter
dropped_var_tab: Cramer's V values for the dropped variables, as a data frame
threshold: the Cramer's V threshold supplied as the input parameter
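The selection rule from the description can be sketched in base R as follows. `cv_filter_sketch` is a hypothetical illustration, not the package source, and it rests on several assumptions the documentation does not state: pairs are visited in descending order of `cv_value`, an IV tie drops `var_2`, the retained list is every variable in `iv_table` that was not dropped, and every variable in `cv_table` also appears in `iv_table`.

```r
# Hypothetical re-implementation of the described rule (not the package code).
# Assumptions: pairs visited by descending cv_value; IV ties drop var_2.
cv_filter_sketch <- function(cv_table, iv_table, threshold) {
  iv <- setNames(iv_table$iv, as.character(iv_table$Variable_name))  # IV lookup by name
  pairs <- cv_table[cv_table$cv_value > threshold, ]                 # highly associated pairs only
  pairs <- pairs[order(-pairs$cv_value), ]
  dropped <- character(0)
  drop_rows <- integer(0)
  for (i in seq_len(nrow(pairs))) {
    v1 <- as.character(pairs$var_1[i])
    v2 <- as.character(pairs$var_2[i])
    # an already-dropped variable cannot cause another variable to be dropped
    if (v1 %in% dropped || v2 %in% dropped) next
    loser <- if (iv[[v1]] < iv[[v2]]) v1 else v2   # keep the higher-IV variable
    dropped <- c(dropped, loser)
    drop_rows <- c(drop_rows, i)
  }
  list(retain_var_list = setdiff(names(iv), dropped),
       dropped_var_list = dropped,
       dropped_var_tab = pairs[drop_rows, ],
       threshold = threshold)
}

# Usage with toy tables (made-up values for illustration only)
cv <- data.frame(var_1 = c("a", "a"), var_2 = c("b", "c"), cv_value = c(0.9, 0.8))
iv <- data.frame(Variable_name = c("a", "b", "c"), iv = c(0.2, 0.5, 0.1))
res <- cv_filter_sketch(cv, iv, threshold = 0.5)
res$dropped_var_list   # "a"
res$retain_var_list    # "b" "c"
```

In the toy input, a is dropped in favour of b; the pair (a, c) is then skipped because a is already dropped, so c is retained even though its IV is lower than a's.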
Arya Poddar <aryapoddar290990@gmail.com>
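library(scorecardModelUtils)
data <- iris
# reproduce the sampling behaviour of R versions before 3.6.0
suppressWarnings(RNGversion('3.5.0'))
set.seed(11)
# add a random binary target Y
data$Y <- sample(0:1, size = nrow(data), replace = TRUE)
# pairwise Cramer's V table for the listed variables
cv_tab_list <- cv_table(data, c("Species", "Sepal.Length"))
cv_tab <- cv_tab_list$cv_val_tab
# information value of each variable against Y
x <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
iv_table_list <- iv_table(base = data, target = "Y", num_var_name = x, cat_var_name = "Species")
iv_tab <- iv_table_list$iv_table
# for each pair with Cramer's V above 0.5, recommend dropping the lower-IV variable
cv_filter_list <- cv_filter(cv_table = cv_tab, iv_table = iv_tab, threshold = 0.5)
cv_filter_list$retain_var_list
cv_filter_list$dropped_var_list
cv_filter_list$dropped_var_tab
cv_filter_list$threshold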