hi_est {causaldrf} | R Documentation |
This function estimates the GPS function and estimates the ADRF. The GPS score is based on different treatment models. The treatment is linearly related to Xs.
hi_est(Y,
treat,
treat_formula,
outcome_formula,
data,
grid_val,
treat_mod,
link_function,
...)
Y |
is the the name of the outcome variable contained in |
treat |
is the name of the treatment variable contained in
|
treat_formula |
an object of class "formula" (or one that can be
coerced to that class) that regresses |
outcome_formula |
is the formula used for fitting the outcome surface. gps is one of the independent variables to use in the outcome_formula. ie.
or a variation
of this. Use |
data |
is a dataframe containing |
grid_val |
contains the treatment values to be evaluated. |
treat_mod |
a description of the error distribution to be used in the
model for treatment. Options include: |
link_function |
For |
... |
additional arguments to be passed to the outcome lm() function. |
Hirano (2004) (HI) introduced this imputation-type method that includes a GPS component. The idea is to fit a parametric observable (outcome) model, which includes the estimated GPS as a covariate, to impute missing potential outcomes.
The method requires several steps. First, a model is used to relate
treatment to the recorded covariates. For example,
T_i|\textbf{X}_i \sim \mathcal{N}(\textbf{X}_i^T \boldsymbol{\beta}, \sigma^2)
and then estimate the \boldsymbol{\beta}
parameters.
Next, the GPS for each unit is estimated
\hat{R}_i(t) = \frac{1}{\sqrt{2 \pi \hat{\sigma}^2} } e^{-\frac{(t - \textbf{X}_i^T \boldsymbol{\hat{\beta}})^2}{2 \hat{\sigma}^2}}
These GPS estimates are used in the outcome or observable model.
The outcome is modeled as a function of T_i
and \hat{R}_i
parametrically. For example,
E[Y_i | T_i, R_i] = \alpha_0 + \alpha_1 T_i + \alpha_2 T_i^2 + \alpha_3 \hat{R}_i + \alpha_4 \hat{R}_i^2 + \alpha_5 \hat{R}_i\cdot T_i
After collecting the estimated parameters in the outcome and treatment
models, plug-in the treatment values into the model to estimate the
missing potential outcomes of each individual at that treatment level.
For example, if we plug in T_i = t
into the estimated models, then
each unit will have a potential outcome estimated at treatment level
T_i= t
.
\hat{Y}_i(t) = \hat{\alpha}_0 + \hat{\alpha}_1 t + \hat{\alpha}_2 t^2 + \hat{\alpha}_3 \hat{R}_i(t) + \hat{\alpha}_4 \hat{R}_i^2(t) + \hat{\alpha}_5 \hat{R}_i(t) \cdot t
The next step is to aggregate these estimated potential outcomes
to get an average treatment effect at dose level T_i = t
.
The mean outcome at dose-level T_i = t
is given by:
\hat{\mu}(t) = \frac{1}{N}\sum_i^N \hat{\alpha}_0 + \hat{\alpha}_1 t + \hat{\alpha}_2 t^2 + \hat{\alpha}_3 \hat{R}_i(t) + \hat{\alpha}_4 \hat{R^2}_i(t) + \hat{\alpha}_5 \hat{R}_i(t) \cdot t
Different treatment levels are plugged into the previous equation
to estimate the missing potential outcomes. If many t
values
are evaluated, then it is possible to trace out an ADRF.
hi_est
returns an object of class "causaldrf",
a list that contains the following components:
param |
parameter estimates for a hi fit. |
t_mod |
the result of the treatment model fit. |
out_mod |
the result of the outcome model fit. |
call |
the matched call. |
Schafer, J.L., Galagate, D.L. (2015). Causal inference with a continuous treatment and outcome: alternative estimators for parametric dose-response models. Manuscript in preparation.
Hirano, Keisuke, Imbens, Guido W (2004). The propensity score with continuous treatments. Applied Bayesian modeling and causal inference from incomplete-data perspectives.
nw_est
, iw_est
, hi_est
, gam_est
,
add_spl_est
,
bart_est
, etc. for other estimates.
t_mod
, overlap_fun
to prepare the data
for use in the different estimates.
## Example from Schafer (2015).
example_data <- sim_data
hi_list <- hi_est(Y = Y,
treat = T,
treat_formula = T ~ B.1 + B.2 + B.3 + B.4 + B.5 + B.6 + B.7 + B.8,
outcome_formula = Y ~ T + I(T^2) + gps + I(gps^2) + T * gps,
data = example_data,
grid_val = seq(8, 16, by = 1),
treat_mod = "Normal")
sample_index <- sample(1:1000, 100)
plot(example_data$T[sample_index],
example_data$Y[sample_index],
xlab = "T",
ylab = "Y",
main = "hi estimate")
lines(seq(8, 16, by = 1),
hi_list$param,
lty = 2,
lwd = 2,
col = "blue")
legend('bottomright',
"hi estimate",
lty=2,
lwd = 2,
col = "blue",
bty='Y',
cex=1)
rm(example_data, hi_list, sample_index)
## Example from van der Wal, Willem M., and Ronald B. Geskus. (2011)
#Simulate data with continuous confounder and outcome, binomial exposure.
#Marginal causal effect of exposure on outcome: 10.
n <- 1000
simdat <- data.frame(l = rnorm(n, 10, 5))
a.lin <- simdat$l - 10
pa <- exp(a.lin)/(1 + exp(a.lin))
simdat$a <- rbinom(n, 1, prob = pa)
simdat$y <- 10*simdat$a + 0.5*simdat$l + rnorm(n, -10, 5)
simdat[1:5,]
temp_hi <- hi_est(Y = y,
treat = a,
treat_formula = a ~ l,
outcome_formula = y ~ gps,
data = simdat,
grid_val = c(0, 1),
treat_mod = "Binomial",
link_function = "logit")
temp_hi[[1]] # estimated coefficients