library("polle")
### Single stage:
d1 <- sim_single_stage(5e2, seed=1)
pd1 <- policy_data(d1,
                   action = "A",
                   covariates = list("Z", "B", "L"),
                   utility = "U")
pd1
# defining a static policy (A=1):
pl1 <- policy_def(1)
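# a dynamic policy can instead be defined as a function of the stage
# history; a minimal sketch, assuming the policy function may reference
# the covariate Z by name (the threshold 0 is arbitrary):
pl1_dyn <- policy_def(function(Z) (Z > 0) * 1, name = "Z>0")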
# evaluating the policy:
pe1 <- policy_eval(policy_data = pd1,
                   policy = pl1,
                   g_models = g_glm(),
                   q_models = q_glm(),
                   name = "A=1 (glm)")
# summarizing the estimated value of the policy:
# (equivalent to summary(pe1)):
pe1
coef(pe1) # value coefficient
sqrt(vcov(pe1)) # value standard error
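# a Wald-type 95% confidence interval for the value can be computed
# directly from these two quantities (a sketch, not a polle helper):
coef(pe1) + qnorm(c(0.025, 0.975)) * c(sqrt(vcov(pe1)))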
# getting the g-function and Q-function values:
head(predict(get_g_functions(pe1), pd1))
head(predict(get_q_functions(pe1), pd1))
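# the g-functions model the probability of each action given the
# history, and the Q-functions model the expected utility given the
# history and a given action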
# getting the fitted influence curve (IC) for the value:
head(IC(pe1))
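# the IC should average to (approximately) zero, and its empirical
# variance over n reproduces the variance estimate; a quick check,
# assuming IC() returns one column per estimate:
mean(IC(pe1))
var(IC(pe1)) / nrow(IC(pe1)) # compare with vcov(pe1)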
# evaluating the policy using random forest nuisance models:
set.seed(1)
pe1_rf <- policy_eval(policy_data = pd1,
                      policy = pl1,
                      g_models = g_rf(),
                      q_models = q_rf(),
                      name = "A=1 (rf)")
# merging the two estimates (equivalent to pe1 + pe1_rf):
(est1 <- merge(pe1, pe1_rf))
coef(est1)
head(IC(est1))
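# the merged standard errors can likewise be recovered from the
# influence curves; a sketch, assuming one IC column per estimate:
sqrt(diag(var(IC(est1))) / nrow(IC(est1))) # compare with sqrt(diag(vcov(est1)))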
### Two stages:
d2 <- sim_two_stage(5e2, seed=1)
pd2 <- policy_data(d2,
                   action = c("A_1", "A_2"),
                   covariates = list(L = c("L_1", "L_2"),
                                     C = c("C_1", "C_2")),
                   utility = c("U_1", "U_2", "U_3"))
pd2
# defining a policy learner based on cross-fitted doubly robust Q-learning:
pl2 <- policy_learn(
  type = "drql",
  control = control_drql(qv_models = list(q_glm(~C_1),
                                          q_glm(~C_1 + C_2))),
  full_history = TRUE,
  L = 2) # number of folds for cross-fitting
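# other learner types share the same interface; a sketch of a plain
# Q-learning variant (assuming type = "ql" with default controls is
# supported by the installed version of polle):
pl2_ql <- policy_learn(type = "ql")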
# evaluating the policy learner using 2-fold cross-fitting:
pe2 <- policy_eval(type = "dr",
policy_data = pd2,
policy_learn = pl2,
q_models = q_glm(),
g_models = g_glm(),
M = 2, # number of folds for cross-fitting
name = "drql")
# summarizing the estimated value of the policy:
pe2
# getting the cross-fitted policy actions:
head(get_policy_actions(pe2))
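# for reference, the learned policy can be compared against a static
# policy applied at every stage; a sketch, assuming reuse = TRUE
# recycles the single policy definition across both stages:
pe2_static <- policy_eval(policy_data = pd2,
                          policy = policy_def(1, reuse = TRUE, name = "A=1"),
                          g_models = g_glm(),
                          q_models = q_glm())
pe2 + pe2_static # merged summary, as with merge() above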