policy_eval {polle}                                              R Documentation

Policy Evaluation

Description

policy_eval() is used to estimate the value of a given fixed policy or a data-adaptive policy (e.g., a policy learned from the data). policy_eval() can also be used to estimate the average treatment effect among the subjects who would receive treatment under the policy.

Usage

policy_eval(
  policy_data,
  policy = NULL,
  policy_learn = NULL,
  g_functions = NULL,
  g_models = g_glm(),
  g_full_history = FALSE,
  save_g_functions = TRUE,
  q_functions = NULL,
  q_models = q_glm(),
  q_full_history = FALSE,
  save_q_functions = TRUE,
  target = "value",
  type = "dr",
  cross_fit_type = "pooled",
  variance_type = "pooled",
  M = 1,
  future_args = list(future.seed = TRUE),
  name = NULL
)

## S3 method for class 'policy_eval'
coef(object, ...)

## S3 method for class 'policy_eval'
IC(x, ...)

## S3 method for class 'policy_eval'
vcov(object, ...)

## S3 method for class 'policy_eval'
print(
  x,
  digits = 4L,
  width = 35L,
  std.error = TRUE,
  level = 0.95,
  p.value = TRUE,
  ...
)

## S3 method for class 'policy_eval'
summary(object, ...)

## S3 method for class 'policy_eval'
estimate(
  x,
  labels = get_element(x, "name", check_name = FALSE),
  level = 0.95,
  ...
)

## S3 method for class 'policy_eval'
merge(x, y, ..., paired = TRUE)

## S3 method for class 'policy_eval'
x + ...

Arguments

policy_data

Policy data object created by policy_data().

policy

Policy object created by policy_def().

policy_learn

Policy learner object created by policy_learn().

g_functions

Fitted g-model objects, see nuisance_functions. Preferably, use g_models.

g_models

List of action probability models/g-models for each stage created by g_empir(), g_glm(), g_rf(), g_sl() or similar functions. Only used for evaluation if g_functions is NULL. If a single model is provided and g_full_history is FALSE, a single g-model is fitted across all stages. If g_full_history is TRUE, the single model is reused at every stage.

g_full_history

If TRUE, the full history is used to fit each g-model. If FALSE, the state/Markov type history is used to fit each g-model.

save_g_functions

If TRUE, the fitted g-functions are saved.

q_functions

Fitted Q-model objects, see nuisance_functions. Only valid if the Q-functions are fitted using the same policy. Preferably, use q_models.

q_models

Outcome regression models/Q-models created by q_glm(), q_rf(), q_sl() or similar functions. Only used for evaluation if q_functions is NULL. If a single model is provided, the model is reused at every stage.

q_full_history

Similar to g_full_history.

save_q_functions

Similar to save_g_functions.

target

Character string. Either "value" or "subgroup". If "value", the target parameter is the policy value. If "subgroup", the target parameter is the average treatement effect among the subgroup of subjects that would receive treatment under the policy, see details. "subgroup" is only implemented for type = "dr" in the single-stage case with a dichotomous action set.

type

Character string. Type of evaluation. Either "dr" (doubly robust), "ipw" (inverse propensity weighting), or "or" (outcome regression).

cross_fit_type

Character string. Either "stacked", or "pooled", see details. (Only used if M > 1 and target = "subgroup")

variance_type

Character string. Either "pooled" (default), "stacked" or "complete", see details. (Only used if M > 1)

M

Number of folds for cross-fitting.

future_args

Arguments passed to future.apply::future_apply().

name

Character string used to name/label the estimate in the output.

object, x, y

Objects of class "policy_eval".

...

Additional arguments.

digits

Integer. Number of printed digits.

width

Integer. Width of printed parameter name.

std.error

Logical. Should the standard error be printed?

level

Numeric. Level of confidence limits.

p.value

Logical. Should the p-value for the associated confidence level be printed?

labels

Name(s) of the estimate(s).

paired

TRUE indicates that the estimates are based on the same data sample.

Details

Each observation has the sequential form

O = \{B, U_1, X_1, A_1, ..., U_K, X_K, A_K, U_{K+1}\},

for a possibly stochastic number of stages K.

The utility is given by the sum of the rewards, i.e., U = \sum_{k = 1}^{K+1} U_k.

A policy is a set of functions

d = \{d_1, ..., d_K\},

where d_k for k\in \{1, ..., K\} maps \{B, X_1, A_1, ..., A_{k-1}, X_k\} into the action set.

Recursively define the Q-models (q_models):

Q^d_K(h_K, a_K) = E[U|H_K = h_K, A_K = a_K]

Q^d_k(h_k, a_k) = E[Q^d_{k+1}(H_{k+1}, d_{k+1}(B, X_1, A_1, ..., X_{k+1})) | H_k = h_k, A_k = a_k].

If q_full_history = TRUE, H_k = \{B, X_1, A_1, ..., A_{k-1}, X_k\}, and if q_full_history = FALSE, H_k = \{B, X_k\}.

The g-models (g_models) are defined as

g_k(h_k, a_k) = P(A_k = a_k|H_k = h_k).

If g_full_history = TRUE, H_k = \{B, X_1, A_1, ..., A_{k-1}, X_k\}, and if g_full_history = FALSE, H_k = \{B, X_k\}. Furthermore, if g_full_history = FALSE and g_models is a single model, it is assumed that g_1(h_1, a_1) = ... = g_K(h_K, a_K).
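
As an illustration (a minimal sketch, not part of the package examples below), a list of stage-specific g-models can be supplied instead of a single shared model; the setup mirrors the two-stage simulation used in the examples, with a static policy reused at both stages:

d <- sim_two_stage(5e2, seed = 1)
pd <- policy_data(d,
                  action = c("A_1", "A_2"),
                  covariates = list(L = c("L_1", "L_2"),
                                    C = c("C_1", "C_2")),
                  utility = c("U_1", "U_2", "U_3"))
policy_eval(policy_data = pd,
            policy = policy_def(1, reuse = TRUE, name = "A=1"),
            g_models = list(g_glm(), g_glm()), # one g-model per stage
            q_models = q_glm())                # a single Q-model reused at every stage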

If target = "value" and type = "or" policy_eval() returns the empirical estimate of the value (coef):

E\left[Q^d_1(H_1, d_1(\cdot))\right].

If target = "value" and type = "ipw" policy_eval() returns the empirical estimates of the value (coef) and influence curve (IC):

E\left[\left(\prod_{k=1}^K I\{A_k = d_k(\cdot)\} g_k(H_k, A_k)^{-1}\right) U\right].

\left(\prod_{k=1}^K I\{A_k = d_k(\cdot)\} g_k(H_k, A_k)^{-1}\right) U - E\left[\left(\prod_{k=1}^K I\{A_k = d_k(\cdot)\} g_k(H_k, A_k)^{-1}\right) U\right].

If target = "value" and type = "dr" policy_eval returns the empirical estimates of the value (coef) and influence curve (IC):

E[Z_1(d,g,Q^d)(O)],

Z_1(d, g, Q^d)(O) - E[Z_1(d,g, Q^d)(O)],

where

Z_1(d, g, Q^d)(O) = Q^d_1(H_1 , d_1(\cdot)) + \sum_{r = 1}^K \prod_{j = 1}^{r} \frac{I\{A_j = d_j(\cdot)\}}{g_{j}(H_j, A_j)} \{Q_{r+1}^d(H_{r+1} , d_{r+1}(\cdot)) - Q_{r}^d(H_r , d_r(\cdot))\}.
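
As a minimal sketch of the three evaluation types (mirroring the single-stage simulation from the examples below; g_models and q_models are left at their g_glm()/q_glm() defaults):

d <- sim_single_stage(5e2, seed = 1)
pd <- policy_data(d, action = "A", covariates = list("Z", "B", "L"), utility = "U")
p <- policy_def(1)
policy_eval(policy_data = pd, policy = p, type = "or")  # outcome regression
policy_eval(policy_data = pd, policy = p, type = "ipw") # inverse probability weighting
policy_eval(policy_data = pd, policy = p, type = "dr")  # doubly robust (default)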

If target = "subgroup", type = "dr", K = 1, and \mathcal{A} = \{0,1\}, policy_eval() returns the empirical estimates of the subgroup average treatment effect (coef) and influence curve (IC):

E[Z_1(1,g,Q)(O) - Z_1(0,g,Q)(O) | d_1(\cdot) = 1],

\frac{1}{P(d_1(\cdot) = 1)} I\{d_1(\cdot) = 1\} \Big\{Z_1(1,g,Q)(O) - Z_1(0,g,Q)(O) - E[Z_1(1,g,Q)(O) - Z_1(0,g,Q)(O) | d_1(\cdot) = 1]\Big\}.
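
A minimal sketch of subgroup evaluation (the dynamic policy and the threshold Z > 0 are purely illustrative):

d <- sim_single_stage(5e2, seed = 1)
pd <- policy_data(d, action = "A", covariates = list("Z", "B", "L"), utility = "U")
# illustrative single-stage policy: treat (A = 1) when Z is positive
p <- policy_def(function(Z) (Z > 0) * 1, name = "Z > 0")
policy_eval(policy_data = pd,
            policy = p,
            target = "subgroup", # subgroup average treatment effect
            type = "dr")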

When M-fold cross-fitting is applied via the M argument, let

\mathcal{Z}_{1,m}(a) = \{Z_1(a, g_m, Q_m^d)(O): O\in \mathcal{O}_m \}.

If target = "subgroup", type = "dr", K = 1, \mathcal{A} = \{0,1\}, and cross_fit_type = "pooled", policy_eval() returns the estimate

\frac{1}{N^{-1} \sum_{i = 1}^N I\{d_1(H_i) = 1\}} N^{-1} \sum_{m=1}^M \sum_{(Z, H) \in \mathcal{Z}_{1,m} \times \mathcal{H}_{1,m}} I\{d_1(H) = 1\} \left\{Z(1)-Z(0)\right\}.

If cross_fit_type = "stacked", the returned estimate is

M^{-1} \sum_{m = 1}^M \frac{1}{n^{-1} \sum_{h \in \mathcal{H}_{1,m}} I\{d_1(h) = 1\}} n^{-1} \sum_{(Z, H) \in \mathcal{Z}_{1,m} \times \mathcal{H}_{1,m}} I\{d_1(H) = 1\} \left\{Z(1)-Z(0)\right\},

where for ease of notation we let the integer n denote the number of observations in each fold.
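
A minimal sketch of cross-fitted subgroup evaluation with data-adaptive nuisance models (again using the illustrative Z > 0 policy):

d <- sim_single_stage(5e2, seed = 1)
pd <- policy_data(d, action = "A", covariates = list("Z", "B", "L"), utility = "U")
p <- policy_def(function(Z) (Z > 0) * 1)
set.seed(1)
policy_eval(policy_data = pd,
            policy = p,
            target = "subgroup",
            type = "dr",
            g_models = g_rf(),
            q_models = q_rf(),
            M = 2,                      # 2-fold cross-fitting
            cross_fit_type = "stacked", # or "pooled" (the default)
            variance_type = "stacked")  # or "pooled" (the default) / "complete"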

Value

policy_eval() returns an object of class "policy_eval". The object is a list containing the following elements:

coef

Numeric vector. The estimated target parameter: policy value or subgroup average treatment effect.

IC

Numeric matrix. Estimated influence curve associated with coef.

type

Character string. The type of evaluation ("dr", "ipw", "or").

target

Character string. The target parameter ("value" or "subgroup").

id

Character vector. The IDs of the observations.

name

Character vector. Names for each element in coef.

coef_ipw

(only if type = "dr") Numeric vector. Estimate of coef based solely on inverse probability weighting.

coef_or

(only if type = "dr") Numeric vector. Estimate of coef based solely on outcome regression.

policy_actions

data.table::data.table with keys id and stage. Actions associated with the policy for every observation and stage.

policy_object

(only if policy = NULL and M = 1) The policy object returned by policy_learn(), see policy_learn.

g_functions

(only if M = 1) The fitted g-functions. Object of class "nuisance_functions".

g_values

The fitted g-function values.

q_functions

(only if M = 1) The fitted Q-functions. Object of class "nuisance_functions".

q_values

The fitted Q-function values.

Z

(only if target = "subgroup") Matrix with the doubly robust stage 1 scores for each action.

subgroup_indicator

(only if target = "subgroup") Logical matrix identifying subjects in the subgroup. Each column represents a different subgroup threshold.

cross_fits

(only if M > 1) List containing the "policy_eval" object for every (validation) fold.

folds

(only if M > 1) The (validation) folds used for cross-fitting.

cross_fit_type

Character string. The cross_fit_type used for the evaluation.

variance_type

Character string. The variance_type used for the evaluation.

S3 generics

The following S3 generic functions are available for an object of class policy_eval:

get_g_functions()

Extract the fitted g-functions.

get_q_functions()

Extract the fitted Q-functions.

get_policy()

Extract the fitted policy object.

get_policy_functions()

Extract the fitted policy function for a given stage.

get_policy_actions()

Extract the (fitted) policy actions.

plot.policy_eval()

Plot diagnostics.

References

van der Laan, Mark J., and Alexander R. Luedtke. "Targeted learning of the mean outcome under an optimal dynamic treatment rule." Journal of Causal Inference 3.1 (2015): 61-95. doi:10.1515/jci-2013-0022.

Tsiatis, Anastasios A., et al. Dynamic treatment regimes: Statistical methods for precision medicine. Chapman and Hall/CRC, 2019. doi:10.1201/9780429192692.

Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, James Robins, Double/debiased machine learning for treatment and structural parameters, The Econometrics Journal, Volume 21, Issue 1, 1 February 2018, Pages C1–C68, doi:10.1111/ectj.12097.

See Also

lava::IC, lava::estimate.default.

Examples

library("polle")
### Single stage:
d1 <- sim_single_stage(5e2, seed=1)
pd1 <- policy_data(d1,
                   action = "A",
                   covariates = list("Z", "B", "L"),
                   utility = "U")
pd1

# defining a static policy (A=1):
pl1 <- policy_def(1)

# evaluating the policy:
pe1 <- policy_eval(policy_data = pd1,
                   policy = pl1,
                   g_models = g_glm(),
                   q_models = q_glm(),
                   name = "A=1 (glm)")

# summarizing the estimated value of the policy:
# (equivalent to summary(pe1)):
pe1
coef(pe1) # value coefficient
sqrt(vcov(pe1)) # value standard error

# getting the g-function and Q-function values:
head(predict(get_g_functions(pe1), pd1))
head(predict(get_q_functions(pe1), pd1))

# getting the fitted influence curve (IC) for the value:
head(IC(pe1))

# evaluating the policy using random forest nuisance models:
set.seed(1)
pe1_rf <- policy_eval(policy_data = pd1,
                      policy = pl1,
                      g_models = g_rf(),
                      q_models = q_rf(),
                      name = "A=1 (rf)")

# merging the two estimates (equivalent to pe1 + pe1_rf):
(est1 <- merge(pe1, pe1_rf))
coef(est1)
head(IC(est1))

### Two stages:
d2 <- sim_two_stage(5e2, seed=1)
pd2 <- policy_data(d2,
                   action = c("A_1", "A_2"),
                   covariates = list(L = c("L_1", "L_2"),
                                     C = c("C_1", "C_2")),
                   utility = c("U_1", "U_2", "U_3"))
pd2

# defining a policy learner based on cross-fitted doubly robust Q-learning:
pl2 <- policy_learn(
   type = "drql",
   control = control_drql(qv_models = list(q_glm(~C_1),
                                           q_glm(~C_1+C_2))),
   full_history = TRUE,
   L = 2) # number of folds for cross-fitting

# evaluating the policy learner using 2-fold cross fitting:
pe2 <- policy_eval(type = "dr",
                   policy_data = pd2,
                   policy_learn = pl2,
                   q_models = q_glm(),
                   g_models = g_glm(),
                   M = 2, # number of folds for cross-fitting
                   name = "drql")
# summarizing the estimated value of the policy:
pe2

# getting the cross-fitted policy actions:
head(get_policy_actions(pe2))
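
# the policy object and the nuisance models are only saved when M = 1.
# A sketch of refitting without cross-fitting and extracting the learned policy
# (get_policy(x)(pd) applies the extracted policy to a policy_data object):
pe2_nocf <- policy_eval(policy_data = pd2,
                        policy_learn = pl2,
                        g_models = g_glm(),
                        q_models = q_glm(),
                        name = "drql (no cross-fitting)")
head(get_policy(pe2_nocf)(pd2)) # policy actions for every observation and stage
get_g_functions(pe2_nocf)       # fitted g-functions
get_q_functions(pe2_nocf)       # fitted Q-functions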

[Package polle version 1.5 Index]