dcee: Distal Causal Excursion Effect (DCEE) Estimation

Description

Fits distal causal excursion effects in micro-randomized trials using a **two-stage** estimator: (i) learn nuisance outcome regressions $\mu_a(H_t)$ with a specified learner (parametric/ML), optionally with cross-fitting; (ii) solve estimating equations for the distal excursion effect parameters ($\beta$).

This wrapper standardizes inputs and delegates computation to [dcee_helper_2stage_estimation()].

Usage

dcee(
  data,
  id,
  outcome,
  treatment,
  rand_prob,
  moderator_formula,
  control_formula,
  availability = NULL,
  control_reg_method = c("gam", "lm", "rf", "ranger", "sl", "sl.user-specified-library",
    "set_to_zero"),
  cross_fit = FALSE,
  cf_fold = 10,
  weighting_function = NULL,
  verbose = TRUE,
  ...
)

Value

An object of class `"dcee_fit"` with components:

call

The matched call to dcee().

fit

A list returned by the two–stage helper with elements:

beta_hat

Named numeric vector of distal causal excursion effect estimates $\beta$. Names are "Intercept" and the moderator names (if any) from moderator_formula.

beta_se

Named numeric vector of standard errors for beta_hat (same order/names).

beta_varcov

Variance–covariance matrix of beta_hat (square matrix; row/column names match names(beta_hat)).

conf_int

Matrix of large-sample (normal) Wald 95% confidence intervals for beta_hat; columns are "2.5 %" and "97.5 %".

conf_int_tquantile

Matrix of small-sample (t-quantile) 95% confidence intervals for beta_hat; columns are "2.5 %" and "97.5 %"; degrees of freedom are provided in $df of the "dcee_fit" object.

regfit_a0

Stage-1 nuisance regression fit for $\mu_0(H_t)$ (outcome model among A=0), or NULL when control_reg_method = "set_to_zero". Note: when cross_fit = TRUE, this is the learner object from the last fold and is provided for inspection only (do not use for out-of-fold prediction).

regfit_a1

Stage-1 nuisance regression fit for $\mu_1(H_t)$ (outcome model among A=1); same caveats as regfit_a0 regarding cross_fit.

df

Small-sample degrees of freedom used for t-based intervals: number of unique subjects minus length(fit$beta_hat).
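
As an illustration of how the two interval types relate, the sketch below rebuilds both from a vector of estimates, their standard errors, and the degrees of freedom. It assumes the usual "estimate plus or minus critical value times standard error" form; the numbers and the `n_subjects` variable are hypothetical and are not output from `dcee()`, and the package's internal code may differ.

```r
# Hypothetical estimates standing in for fit$beta_hat and fit$beta_se
beta_hat <- c(Intercept = 0.50, Z = -0.20)
beta_se  <- c(Intercept = 0.10, Z = 0.08)

# Degrees of freedom as described above: unique subjects minus number of betas
n_subjects <- 50                      # hypothetical number of unique subjects
df <- n_subjects - length(beta_hat)

# Critical values for 95% intervals
z_crit <- qnorm(0.975)                # large-sample (normal) Wald interval
t_crit <- qt(0.975, df = df)          # small-sample (t-quantile) interval

conf_int <- cbind(
  "2.5 %"  = beta_hat - z_crit * beta_se,
  "97.5 %" = beta_hat + z_crit * beta_se
)
conf_int_tquantile <- cbind(
  "2.5 %"  = beta_hat - t_crit * beta_se,
  "97.5 %" = beta_hat + t_crit * beta_se
)

print(conf_int)
print(conf_int_tquantile)
```

Because the t critical value exceeds the normal one for any finite `df`, the t-quantile intervals are always at least as wide as the Wald intervals, and the gap shrinks as the number of subjects grows.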

Arguments

data: A data.frame in long format.
id: Character scalar: column name for subject identifier.
outcome: Character scalar: column name for proximal/distal outcome.
treatment: Character scalar: column name for binary treatment {0,1}.
rand_prob: Character scalar: column name for randomization probability giving $P(A_t=1\mid H_t)$ (must lie in (0,1)).
moderator_formula: RHS-only formula of moderators of the excursion effect (e.g., `~ 1`, `~ Z`, or `~ Z1 + Z2`).
control_formula: RHS-only formula of covariates for learning nuisance outcome regressions. When `control_reg_method = "gam"`, `s(x)` terms are allowed (e.g., `~ x1 + s(x2)`). For SuperLearner methods, variables are extracted from this formula to build the design matrix `X`.
availability: Optional character scalar: column name for availability indicator (0/1). If `NULL`, availability is taken as 1 for all rows.
control_reg_method: One of `"gam"`, `"lm"`, `"rf"`, `"ranger"`, `"sl"`, `"sl.user-specified-library"`, `"set_to_zero"`. See Details.
cross_fit: Logical; if `TRUE`, perform K-fold cross-fitting by subject id.
cf_fold: Integer; number of folds if `cross_fit = TRUE` (default 10).
weighting_function: Either a single numeric constant applied to all rows, or a character column name in `data` giving decision-point weights $\omega_t$.
verbose: Logical; print minimal preprocessing messages (default `TRUE`).
...: Additional arguments passed through to the chosen learner (e.g., `num.trees`, `mtry` for random forests; `sl.library` when `control_reg_method = "sl.user-specified-library"`).
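
To make "cross-fitting by subject id" concrete, the sketch below assigns folds at the subject level so that all decision points of one subject land in the same fold. It is a minimal base-R illustration on a toy data set; the column names `userid` and `t` and the variable `K` are assumptions for the example, and this is not necessarily how `dcee()` forms its folds internally.

```r
set.seed(1)

# Toy long-format data: 6 subjects, 4 decision points each (hypothetical)
dat <- data.frame(
  userid = rep(1:6, each = 4),
  t      = rep(1:4, times = 6)
)

K <- 3                                   # plays the role of cf_fold
ids <- unique(dat$userid)

# Give each subject one fold label, balanced across folds, in random order
fold_of_id <- sample(rep(seq_len(K), length.out = length(ids)))
names(fold_of_id) <- ids

# Map the subject-level fold label back to every row of that subject
dat$fold <- unname(fold_of_id[as.character(dat$userid)])

# Each row of this table has exactly one nonzero entry
print(table(userid = dat$userid, fold = dat$fold))
```

Splitting by subject rather than by row keeps a subject's correlated observations out of both the training and prediction sets of the same fold.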

Details

**Learners.**

- `gam` uses mgcv and supports `s(.)` terms in `control_formula`.
- `lm` uses `stats::lm`.
- `rf` uses randomForest; `ranger` uses ranger.
- `sl` and `sl.user-specified-library` use SuperLearner. With `sl`, the library `sl.library = c("SL.mean", "SL.glm", "SL.earth")` is used. With `sl.user-specified-library`, supply your own `sl.library = c("SL.mean", ...)` via `...`.
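
A call with a user-specified SuperLearner library might look like the sketch below. It reuses the example data and column names from the Examples section; the particular library `c("SL.mean", "SL.glm")` and the object names `my_library` and `fit_sl_user` are illustrative choices only. The call runs only when both MRTAnalysis and SuperLearner are installed.

```r
# Candidate learners for the user-specified library (illustrative choice)
my_library <- c("SL.mean", "SL.glm")

if (requireNamespace("MRTAnalysis", quietly = TRUE) &&
    requireNamespace("SuperLearner", quietly = TRUE)) {
  library(SuperLearner)
  data(data_distal_continuous, package = "MRTAnalysis")

  fit_sl_user <- MRTAnalysis::dcee(
    data = data_distal_continuous,
    id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
    moderator_formula = ~1,
    control_formula = ~ X + Z,
    availability = "avail",
    control_reg_method = "sl.user-specified-library",
    sl.library = my_library   # passed through `...` to the learner
  )
  summary(fit_sl_user)
}
```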

**Notes.**

- Treatment must be coded 0/1; `rand_prob` must lie strictly in (0,1).
- `control_formula = ~ 1` is only valid with `control_reg_method = "set_to_zero"`.
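
The coding requirements in the notes can be checked before fitting. The helper below is a hypothetical pre-flight check written for this illustration (`check_mrt_inputs` is not part of MRTAnalysis), and the toy data frame is made up; `dcee()` may perform its own validation with different messages.

```r
# Hypothetical helper: verify treatment coding and randomization probabilities
check_mrt_inputs <- function(data, treatment, rand_prob) {
  a <- data[[treatment]]
  p <- data[[rand_prob]]

  # Treatment must be coded 0/1 (NA values fail this check)
  if (!all(a %in% c(0, 1))) {
    stop("treatment must be coded 0/1")
  }
  # Randomization probability must lie strictly inside (0, 1)
  if (!isTRUE(all(p > 0 & p < 1))) {
    stop("rand_prob must lie strictly in (0, 1)")
  }
  invisible(TRUE)
}

# Toy data with valid coding
toy <- data.frame(A = c(0, 1, 1, 0), prob_A = c(0.5, 0.5, 0.3, 0.7))
check_mrt_inputs(toy, treatment = "A", rand_prob = "prob_A")

# A probability of exactly 1 violates the open-interval requirement
bad <- toy
bad$prob_A[1] <- 1
res <- try(check_mrt_inputs(bad, "A", "prob_A"), silent = TRUE)
inherits(res, "try-error")
```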

References

Qian, T. (2025). Distal causal excursion effects: modeling long-term effects of time-varying treatments in micro-randomized trials. *Biometrics*, 81(4), ujaf134.

Examples

data(data_distal_continuous, package = "MRTAnalysis")

## Fast example: marginal effect with linear nuisance (CRAN-friendly)
fit_lm <- dcee(
    data = data_distal_continuous,
    id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
    moderator_formula = ~1, # marginal (no moderators)
    control_formula = ~X, # simple linear nuisance
    availability = "avail",
    control_reg_method = "lm",
    cross_fit = FALSE
)
summary(fit_lm)
summary(fit_lm, show_control_fit = TRUE) # show Stage-1 fit info

## Moderated effect with GAM nuisance (allows smooth terms); may be slower
# \donttest{
fit_gam <- dcee(
    data = data_distal_continuous,
    id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
    moderator_formula = ~Z, # test moderation by Z
    control_formula = ~ s(X) + Z, # smooth in nuisance via mgcv::gam
    availability = "avail",
    control_reg_method = "gam",
    cross_fit = TRUE, cf_fold = 5
)
summary(fit_gam, lincomb = c(0, 1)) # linear combo: the Z coefficient
summary(fit_gam, show_control_fit = TRUE) # show Stage-1 fit info
# }

## Optional: SuperLearner (runs only if installed)
# \donttest{
if (requireNamespace("SuperLearner", quietly = TRUE)) {
  library(SuperLearner)
  fit_sl <- dcee(
      data = data_distal_continuous,
      id = "userid", outcome = "Y", treatment = "A", rand_prob = "prob_A",
      moderator_formula = ~1,
      control_formula = ~ X + Z,
      availability = "avail",
      control_reg_method = "sl",
      cross_fit = FALSE
  )
  summary(fit_sl)
}
# }
