policy_eval {polle} | R Documentation |
policy_eval()
is used to estimate
the value of a given fixed policy
or a data-adaptive policy (e.g. a policy
learned from the data). policy_eval()
is also used to estimate the average
treatment effect among the subjects who would
receive treatment under the policy.
policy_eval(
policy_data,
policy = NULL,
policy_learn = NULL,
g_functions = NULL,
g_models = g_glm(),
g_full_history = FALSE,
save_g_functions = TRUE,
q_functions = NULL,
q_models = q_glm(),
q_full_history = FALSE,
save_q_functions = TRUE,
target = "value",
type = "dr",
cross_fit_type = "pooled",
variance_type = "pooled",
M = 1,
future_args = list(future.seed = TRUE),
name = NULL
)
## S3 method for class 'policy_eval'
coef(object, ...)
## S3 method for class 'policy_eval'
IC(x, ...)
## S3 method for class 'policy_eval'
vcov(object, ...)
## S3 method for class 'policy_eval'
print(
x,
digits = 4L,
width = 35L,
std.error = TRUE,
level = 0.95,
p.value = TRUE,
...
)
## S3 method for class 'policy_eval'
summary(object, ...)
## S3 method for class 'policy_eval'
estimate(
x,
labels = get_element(x, "name", check_name = FALSE),
level = 0.95,
...
)
## S3 method for class 'policy_eval'
merge(x, y, ..., paired = TRUE)
## S3 method for class 'policy_eval'
x + ...
policy_data |
Policy data object created by policy_data(). |
policy |
Policy object created by policy_def(). |
policy_learn |
Policy learner object created by policy_learn(). |
g_functions |
Fitted g-model objects, see nuisance_functions.
Preferably, use get_g_functions(). |
g_models |
List of action probability models/g-models for each stage,
created by g_glm(), g_rf(), or similar g-model constructors. |
g_full_history |
If TRUE, the full history is used to fit each g-model. If FALSE, the state/Markov type history is used to fit each g-model. |
save_g_functions |
If TRUE, the fitted g-functions are saved. |
q_functions |
Fitted Q-model objects, see nuisance_functions.
Only valid if the Q-functions are fitted using the same policy.
Preferably, use get_q_functions(). |
q_models |
Outcome regression models/Q-models created by
q_glm(), q_rf(), or similar Q-model constructors. |
q_full_history |
Similar to g_full_history. |
save_q_functions |
Similar to save_g_functions. |
target |
Character string. Either "value" or "subgroup". If "value",
the target parameter is the policy value.
If "subgroup", the target parameter
is the average treatment effect among
the subgroup of subjects that would receive
treatment under the policy, see details.
"subgroup" is only implemented for type = "dr". |
type |
Character string. Type of evaluation. Either "dr" (doubly robust),
"ipw" (inverse probability weighting), or "or" (outcome regression). |
cross_fit_type |
Character string.
Either "stacked" or "pooled", see details. (Only used if target = "subgroup" and M > 1.) |
variance_type |
Character string. Either "pooled" (default),
"stacked" or "complete", see details. (Only used if |
M |
Number of folds for cross-fitting. |
future_args |
Arguments passed to future.apply::future_apply(). |
name |
Character string. Name of the evaluation, used as a label for the estimate(s). |
object , x , y |
Objects of class "policy_eval". |
... |
Additional arguments. |
digits |
Integer. Number of printed digits. |
width |
Integer. Width of printed parameter name. |
std.error |
Logical. Should the standard error be printed? |
level |
Numeric. Level of confidence limits. |
p.value |
Logical. Should the p-value associated with the confidence level be printed? |
labels |
Name(s) of the estimate(s). |
paired |
Logical. If TRUE, the estimates in x and y are assumed to be based on the same data, and their influence curves are combined accordingly. |
Each observation has the sequential form
O = \{B, U_1, X_1, A_1, ..., U_K, X_K, A_K, U_{K+1}\},
for a possibly stochastic number of stages K. B is a vector of baseline covariates.
U_k is the reward at stage k (not influenced by the action A_k).
X_k is a vector of state covariates summarizing the state at stage k.
A_k is the categorical action within the action set \mathcal{A} at stage k.
The utility is given by the sum of the rewards, i.e.,
U = \sum_{k = 1}^{K+1} U_k.
A policy is a set of functions
d = \{d_1, ..., d_K\},
where d_k for k \in \{1, ..., K\} maps \{B, X_1, A_1, ..., A_{k-1}, X_k\} into the action set.
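A policy of this form can, for example, be specified directly with policy_def(). A minimal single-stage sketch, assuming the covariate Z from the single-stage data in the Examples section and that policy_def() accepts a function of the named state covariates:

# hypothetical dynamic policy: assign action 1 whenever Z is positive
p_dyn <- policy_def(function(Z) (Z > 0) * 1, name = "Z>0")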
Recursively define the Q-models (q_models):
Q^d_K(h_K, a_K) = E[U | H_K = h_K, A_K = a_K],
Q^d_k(h_k, a_k) = E[Q^d_{k+1}(H_{k+1}, d_{k+1}(B, X_1, A_1, ..., X_{k+1})) | H_k = h_k, A_k = a_k].
If q_full_history = TRUE, H_k = \{B, X_1, A_1, ..., A_{k-1}, X_k\},
and if q_full_history = FALSE, H_k = \{B, X_k\}.
The g-models (g_models) are defined as
g_k(h_k, a_k) = P(A_k = a_k | H_k = h_k).
If g_full_history = TRUE, H_k = \{B, X_1, A_1, ..., A_{k-1}, X_k\},
and if g_full_history = FALSE, H_k = \{B, X_k\}.
Furthermore, if g_full_history = FALSE and g_models is a single model,
it is assumed that g_1(h_1, a_1) = ... = g_K(h_K, a_K).
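As a sketch of how these options map onto the arguments (assuming the two-stage objects pd2 and pl2 constructed in the Examples section), a list of g-models fits one model per stage, and g_full_history controls which history each model is fitted on:

pe_g <- policy_eval(policy_data = pd2,
                    policy_learn = pl2,
                    g_models = list(g_glm(), g_glm()),  # one g-model per stage
                    g_full_history = TRUE,              # fit each g-model on the full history
                    q_models = q_glm())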
If target = "value"
and type = "or"
policy_eval()
returns the empirical estimate of
the value (coef
):
E\left[Q^d_1(H_1, d_1(\cdot))\right]
If target = "value"
and type = "ipw"
policy_eval()
returns the empirical estimates of
the value (coef
) and influence curve (IC
):
E\left[\left(\prod_{k=1}^K I\{A_k = d_k(\cdot)\}
g_k(H_k, A_k)^{-1}\right) U\right].
\left(\prod_{k=1}^K I\{A_k =
d_k(\cdot)\} g_k(H_k, A_k)^{-1}\right) U -
E\left[\left(\prod_{k=1}^K
I\{A_k = d_k(\cdot)\} g_k(H_k, A_k)^{-1}\right) U\right].
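The estimator is selected via the type argument. A minimal sketch, assuming the single-stage objects pd1 and pl1 constructed in the Examples section:

pe_or  <- policy_eval(pd1, policy = pl1, type = "or")   # outcome regression estimate
pe_ipw <- policy_eval(pd1, policy = pl1, type = "ipw")  # inverse probability weighting estimate
coef(pe_or)
coef(pe_ipw)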
If target = "value"
and
type = "dr"
policy_eval
returns the empirical estimates of
the value (coef
) and influence curve (IC
):
E[Z_1(d,g,Q^d)(O)],
Z_1(d, g, Q^d)(O) - E[Z_1(d,g, Q^d)(O)],
where
Z_1(d, g, Q^d)(O) = Q^d_1(H_1 , d_1(\cdot)) +
\sum_{r = 1}^K \prod_{j = 1}^{r}
\frac{I\{A_j = d_j(\cdot)\}}{g_{j}(H_j, A_j)}
\{Q_{r+1}^d(H_{r+1} , d_{r+1}(\cdot)) - Q_{r}^d(H_r , d_r(\cdot))\}.
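Since the doubly robust influence curve is centered, its empirical mean is approximately zero, which can be checked numerically. A sketch, again assuming pd1 and pl1 from the Examples section:

pe_dr <- policy_eval(pd1, policy = pl1, type = "dr")
coef(pe_dr)      # empirical mean of the doubly robust scores Z_1
mean(IC(pe_dr))  # approximately zero by construction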
If target = "subgroup"
, type = "dr"
, K = 1
,
and \mathcal{A} = \{0,1\}
, policy_eval()
returns the empirical estimates of the subgroup average
treatment effect (coef
) and influence curve (IC
):
E[Z_1(1,g,Q)(O) - Z_1(0,g,Q)(O) | d_1(\cdot) = 1],
\frac{1}{P(d_1(\cdot) = 1)} I\{d_1(\cdot) = 1\}
\Big\{Z_1(1,g,Q)(O) - Z_1(0,g,Q)(O) - E[Z_1(1,g,Q)(O)
- Z_1(0,g,Q)(O) | d_1(\cdot) = 1]\Big\}.
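A sketch of a subgroup evaluation under these restrictions (single stage, dichotomous action set), assuming the single-stage data pd1 from the Examples section and a policy learned by Q-learning:

pe_sub <- policy_eval(policy_data = pd1,
                      policy_learn = policy_learn(type = "ql"),
                      target = "subgroup",
                      type = "dr")
coef(pe_sub)  # average treatment effect among subjects assigned treatment by the policy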
When applying M-fold cross-fitting via the M argument, let
\mathcal{Z}_{1,m}(a) = \{Z_1(a, g_m, Q^d_m)(O) : O \in \mathcal{O}_m\},
where \mathcal{O}_m is the set of observations in fold m, and let
\mathcal{H}_{1,m} denote the corresponding set of stage-1 histories.
If target = "subgroup", type = "dr", K = 1, \mathcal{A} = \{0, 1\},
and cross_fit_type = "pooled", policy_eval() returns the estimate
\frac{1}{N^{-1} \sum_{i = 1}^N I\{d_1(H_i) = 1\}} N^{-1}
\sum_{m=1}^M \sum_{(Z, H) \in \mathcal{Z}_{1,m} \times \mathcal{H}_{1,m}}
I\{d_1(H) = 1\} \left\{Z(1) - Z(0)\right\}.
If cross_fit_type = "stacked", the returned estimate is
M^{-1} \sum_{m = 1}^M
\frac{1}{n^{-1} \sum_{h \in \mathcal{H}_{1,m}} I\{d_1(h) = 1\}} n^{-1}
\sum_{(Z, H) \in \mathcal{Z}_{1,m} \times \mathcal{H}_{1,m}}
I\{d_1(H) = 1\} \left\{Z(1) - Z(0)\right\},
where, for ease of notation, we let the integer n denote the number of observations in each fold.
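The pooled and stacked cross-fitted subgroup estimators are selected via cross_fit_type. A sketch, assuming pd1 from the Examples section and 2-fold cross-fitting:

pe_pooled  <- policy_eval(pd1, policy_learn = policy_learn(type = "ql"),
                          target = "subgroup", M = 2,
                          cross_fit_type = "pooled")
pe_stacked <- policy_eval(pd1, policy_learn = policy_learn(type = "ql"),
                          target = "subgroup", M = 2,
                          cross_fit_type = "stacked")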
policy_eval()
returns an object of class "policy_eval".
The object is a list containing the following elements:
coef |
Numeric vector. The estimated target parameter: policy value or subgroup average treatment effect. |
IC |
Numeric matrix. Estimated influence curve associated with coef. |
type |
Character string. The type of evaluation ("dr", "ipw", "or"). |
target |
Character string. The target parameter ("value" or "subgroup"). |
id |
Character vector. The IDs of the observations. |
name |
Character vector. Names for each element in coef. |
coef_ipw |
(only if type = "dr") Numeric vector. Estimate of the target parameter based on inverse probability weighting. |
coef_or |
(only if type = "dr") Numeric vector. Estimate of the target parameter based on outcome regression. |
policy_actions |
data.table::data.table with keys id and stage. Actions associated with the policy for every observation and stage. |
policy_object |
(only if the policy is learned via policy_learn and M = 1) The fitted policy object, see policy_learn. |
g_functions |
(only if save_g_functions = TRUE) The fitted g-functions, see nuisance_functions. |
g_values |
The fitted g-function values. |
q_functions |
(only if save_q_functions = TRUE) The fitted Q-functions, see nuisance_functions. |
q_values |
The fitted Q-function values. |
Z |
(only if target = "subgroup") Matrix of doubly robust scores Z_1(a, g, Q)(O) for each action a in the action set. |
subgroup_indicator |
(only if target = "subgroup") Logical vector indicating the subgroup of observations assigned treatment by the policy (d_1(\cdot) = 1). |
cross_fits |
(only if M > 1) List of the fold-specific (cross-fitted) evaluations. |
folds |
(only if M > 1) The folds used for cross-fitting. |
cross_fit_type |
Character string. |
variance_type |
Character string. |
The following S3 generic functions are available for an object of
class policy_eval
:
get_g_functions()
Extract the fitted g-functions.
get_q_functions()
Extract the fitted Q-functions.
get_policy()
Extract the fitted policy object.
get_policy_functions()
Extract the fitted policy function for a given stage.
get_policy_actions()
Extract the (fitted) policy actions.
plot.policy_eval()
Plot diagnostics.
van der Laan, Mark J., and Alexander R. Luedtke.
"Targeted learning of the mean outcome under an optimal dynamic treatment rule."
Journal of Causal Inference 3.1 (2015): 61-95.
doi:10.1515/jci-2013-0022.
Tsiatis, Anastasios A., et al. Dynamic
treatment regimes: Statistical methods for precision medicine. Chapman and
Hall/CRC, 2019. doi:10.1201/9780429192692.
Victor Chernozhukov, Denis
Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey,
James Robins, Double/debiased machine learning for treatment and structural
parameters, The Econometrics Journal, Volume 21, Issue 1, 1 February 2018,
Pages C1–C68, doi:10.1111/ectj.12097.
lava::IC, lava::estimate.default.
library("polle")
### Single stage:
d1 <- sim_single_stage(5e2, seed=1)
pd1 <- policy_data(d1,
action = "A",
covariates = list("Z", "B", "L"),
utility = "U")
pd1
# defining a static policy (A=1):
pl1 <- policy_def(1)
# evaluating the policy:
pe1 <- policy_eval(policy_data = pd1,
policy = pl1,
g_models = g_glm(),
q_models = q_glm(),
name = "A=1 (glm)")
# summarizing the estimated value of the policy:
# (equivalent to summary(pe1)):
pe1
coef(pe1) # value coefficient
sqrt(vcov(pe1)) # value standard error
# getting the g-function and Q-function values:
head(predict(get_g_functions(pe1), pd1))
head(predict(get_q_functions(pe1), pd1))
# getting the fitted influence curve (IC) for the value:
head(IC(pe1))
# evaluating the policy using random forest nuisance models:
set.seed(1)
pe1_rf <- policy_eval(policy_data = pd1,
policy = pl1,
g_models = g_rf(),
q_models = q_rf(),
name = "A=1 (rf)")
# merging the two estimates (equivalent to pe1 + pe1_rf):
(est1 <- merge(pe1, pe1_rf))
coef(est1)
head(IC(est1))
### Two stages:
d2 <- sim_two_stage(5e2, seed=1)
pd2 <- policy_data(d2,
action = c("A_1", "A_2"),
covariates = list(L = c("L_1", "L_2"),
C = c("C_1", "C_2")),
utility = c("U_1", "U_2", "U_3"))
pd2
# defining a policy learner based on cross-fitted doubly robust Q-learning:
pl2 <- policy_learn(
type = "drql",
control = control_drql(qv_models = list(q_glm(~C_1),
q_glm(~C_1+C_2))),
full_history = TRUE,
L = 2) # number of folds for cross-fitting
# evaluating the policy learner using 2-fold cross fitting:
pe2 <- policy_eval(type = "dr",
policy_data = pd2,
policy_learn = pl2,
q_models = q_glm(),
g_models = g_glm(),
M = 2, # number of folds for cross-fitting
name = "drql")
# summarizing the estimated value of the policy:
pe2
# getting the cross-fitted policy actions:
head(get_policy_actions(pe2))
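# Additional sketches (not part of the original example), based on the
# S3 generic functions listed above:
# confidence limits via the estimate() method:
estimate(pe2, level = 0.95)
# extracting the learned policy object and the fitted Q-function values:
get_policy(pe2)
head(predict(get_q_functions(pe2), pd2))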