residualize {rjaf} | R Documentation |
This function uses random forests and cross-validation to residualize outcomes, following Wu and Gagnon-Bartsch (2018). That is, outcomes predicted by the random forests are subtracted from the original outcomes. Doing so helps adjust for small imbalances in baseline covariates and removes part of the variation in outcomes that is common across treatment arms.
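The residualization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the rjaf implementation: `residualize_sketch` and the output column `resid` are hypothetical names, the forest is fit with `ranger` using its default settings, and fold assignment is a simple random split. Each row's prediction comes from a forest trained on the other folds, so no outcome is used to predict itself.

```r
library(ranger)

# Hypothetical helper for illustration only; not the rjaf implementation.
residualize_sketch <- function(data, y, vars, nfold = 5) {
  n <- nrow(data)
  # Randomly assign each row to one of nfold folds of near-equal size.
  fold <- sample(rep(seq_len(nfold), length.out = n))
  pred <- numeric(n)
  for (k in seq_len(nfold)) {
    held_out <- fold == k
    # Fit the forest on all rows outside fold k.
    fit <- ranger(x = data[!held_out, vars, drop = FALSE],
                  y = data[[y]][!held_out])
    # Predict outcomes for the held-out rows only.
    pred[held_out] <- predict(fit, data = data[held_out, vars, drop = FALSE])$predictions
  }
  # Residualized outcome: observed outcome minus out-of-fold prediction.
  data$resid <- data[[y]] - pred
  data
}

# Toy data (simulated here, not from the package).
set.seed(1)
toy <- data.frame(X1 = rnorm(200), X2 = rnorm(200), X3 = rnorm(200))
toy$Y <- toy$X1 + 0.5 * toy$X2 + rnorm(200)
toy_resid <- residualize_sketch(toy, "Y", paste0("X", 1:3))
```

Using out-of-fold predictions rather than in-sample fitted values keeps the residuals from being artificially small due to overfitting.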
residualize(data, y, vars, nfold = 5, fun.rf = "ranger")
data | input data used for training and estimation, where each row corresponds to an individual and the columns contain treatments, covariates, probabilities of treatment assignment, and observed outcomes. |
y | a character string giving the column name of the outcome. |
vars | a vector of character strings giving the column names of the covariates. |
nfold | number of folds used in cross-validation. The default value is 5. |
fun.rf | a character string specifying which random forest package to use. One of two options; the default is "ranger". |
data for training and estimation with residualized outcomes.
Wu, Edward and Johann A Gagnon-Bartsch (2018). The LOOP Estimator: Adjusting
for Covariates in Randomized Experiments. Evaluation Review, 42(4):458–488.
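library(rjaf)
library(dplyr)
library(magrittr)

# Load the example data
data(Example_data)
# Randomly draw half of the rows for training and estimation
Example_trainest <- Example_data %>% slice_sample(n = floor(0.5 * nrow(Example_data)))
y <- "Y"                  # outcome column
vars <- paste0("X", 1:3)  # covariate columns X1, X2, X3
# Residualize the outcome using 5-fold cross-validation and ranger forests
Example_resid <- residualize(Example_trainest, y, vars, nfold = 5, fun.rf = "ranger")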