est_predictiveness_cv: Estimate a nonparametric predictiveness functional using cross-fitting

Description

Compute cross-fitted nonparametric estimates of the chosen measure of predictiveness.

Usage

est_predictiveness_cv(
  fitted_values,
  y,
  full_y = NULL,
  folds,
  type = "r_squared",
  C = rep(1, length(y)),
  Z = NULL,
  folds_Z = folds,
  ipc_weights = rep(1, length(C)),
  ipc_fit_type = "external",
  ipc_eif_preds = rep(1, length(C)),
  ipc_est_type = "aipw",
  scale = "identity",
  na.rm = FALSE,
  ...
)

Value

The estimated measure of predictiveness.

Arguments

fitted_values: fitted values from a regression function using the observed data; a list of length V, where each object is a set of predictions on the validation data, or a vector of the same length as y.
y: the observed outcome.
full_y: the observed outcome (from the entire dataset, for cross-fitted estimates).
folds: the cross-validation folds for the observed data.
type: the measure of predictiveness to estimate (defaults to "r_squared", for R-squared-based variable importance).
C: the indicator of coarsening (1 denotes observed, 0 denotes unobserved).
Z: either NULL (if no coarsening) or a matrix-like object containing the fully observed data.
folds_Z: either the cross-validation folds for the observed data (no coarsening) or a vector of folds for the fully observed data Z.
ipc_weights: weights for inverse probability of coarsening (IPC) weighted estimation (e.g., inverse weights from a two-phase sample). Assumed to be already inverted (i.e., ipc_weights = 1 / [estimated probability weights]).
ipc_fit_type: if "external", then use ipc_eif_preds; if "SL", fit a SuperLearner to determine the correction to the efficient influence function.
ipc_eif_preds: if ipc_fit_type = "external", the fitted values from a regression of the full-data EIF on the fully observed covariates/outcome; otherwise, not used.
ipc_est_type: IPC correction, either "ipw" (for classical inverse probability weighting) or "aipw" (for augmented inverse probability weighting; the default).
scale: if doing an IPC correction, then the scale that the correction should be computed on (e.g., "identity"; or "logit" to logit-transform, apply the correction, and back-transform).
na.rm: logical; should NAs be removed in computation? (defaults to FALSE)
...: other arguments to SuperLearner, if ipc_fit_type = "SL".
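
As a rough illustration of the inputs described above, the following base R sketch builds a vector of folds and a list of fitted values of length V (one set of validation-fold predictions per fold), then averages fold-specific R-squared values by hand. It is not the package's implementation: the simulated data, the use of lm() as the regression, and the names dat, fold_r2, and cv_r2 are assumptions made for this example only.

```r
set.seed(20)
n <- 300
V <- 5

# Simulated data (hypothetical): one covariate, linear outcome model
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

# Cross-validation folds: each observation is assigned to one of V folds
folds <- sample(rep(seq_len(V), length.out = n))

# fitted_values: a list of length V; element v holds predictions on
# validation fold v from a model trained on the remaining folds
fitted_values <- lapply(seq_len(V), function(v) {
  train <- dat[folds != v, ]
  valid <- dat[folds == v, ]
  fit <- lm(y ~ x, data = train)
  unname(predict(fit, newdata = valid))
})

# Fold-specific R-squared: 1 - MSE / variance, with the outcome mean
# taken from the entire dataset (the role played by full_y)
fold_r2 <- vapply(seq_len(V), function(v) {
  y_v <- y[folds == v]
  mse <- mean((y_v - fitted_values[[v]])^2)
  1 - mse / mean((y_v - mean(y))^2)
}, numeric(1))

# Cross-fitted estimate: average of the fold-specific estimates
cv_r2 <- mean(fold_r2)
```

Objects shaped like fitted_values, y, and folds here are the kind of inputs the arguments above describe; the hand-computed cv_r2 is only meant to convey the idea of averaging validation-fold estimates.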

Details

See the paper by Williamson, Gilbert, Simon, and Carone for more details on the mathematics behind this function and the definition of the parameter of interest. If sample-splitting is also requested (recommended, since inference then remains valid even if the variable has zero true importance), the prediction functions are trained as if \(2K\)-fold cross-validation were run, but each is evaluated on only \(K\) of the resulting sets, with the full and reduced nuisance regressions evaluated on independent sets.
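
The fold layout under sample-splitting can be sketched in base R as follows. This is only an illustration under stated assumptions: assigning odd-numbered folds to the full regression and even-numbered folds to the reduced regression is a choice made here for clarity, and the names folds_2k, full_eval, and reduced_eval are hypothetical; the package may allocate folds differently.

```r
set.seed(4747)
n <- 200
K <- 5

# Assign each observation to one of 2K folds, as in 2K-fold cross-validation
folds_2k <- sample(rep(seq_len(2 * K), length.out = n))

# Assumption for illustration: the full regression is evaluated on the
# odd-numbered folds and the reduced regression on the even-numbered folds
full_eval <- seq(1, 2 * K, by = 2)
reduced_eval <- seq(2, 2 * K, by = 2)

# For evaluation fold k, training uses the other 2K - 1 folds
train_idx <- lapply(seq_len(2 * K), function(k) which(folds_2k != k))

# The two sets of evaluation observations do not overlap
in_full <- folds_2k %in% full_eval
in_reduced <- folds_2k %in% reduced_eval
```

Each regression is therefore evaluated on K folds, and no observation contributes to the evaluation of both the full and the reduced regression.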