Compute nonparametric estimates of the chosen measure of predictiveness.
est_predictiveness_cv(
  fitted_values,
  y,
  full_y = NULL,
  folds,
  type = "r_squared",
  C = rep(1, length(y)),
  Z = NULL,
  folds_Z = folds,
  ipc_weights = rep(1, length(C)),
  ipc_fit_type = "external",
  ipc_eif_preds = rep(1, length(C)),
  ipc_est_type = "aipw",
  scale = "identity",
  na.rm = FALSE,
  ...
)
Returns the estimated measure of predictiveness.
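As a rough illustration of what is estimated when type = "r_squared", the sketch below computes a plug-in R-squared (1 - MSE / Var(y)) within each validation fold and averages over folds. The helper cv_r_squared is hypothetical and not part of the package, and averaging fold-specific estimates is an assumption about the construction, not a description of the package internals. It also accepts both forms of fitted_values described below: a per-fold list, or a vector aligned with y that is split into folds with base R's split().

```r
# Hypothetical helper (not part of the package): cross-validated R-squared
# computed as the average of fold-specific plug-in estimates.
cv_r_squared <- function(fitted_values, y, folds) {
  # accept a vector aligned with y by splitting it into per-fold predictions
  if (!is.list(fitted_values)) {
    fitted_values <- split(fitted_values, folds)
  }
  # list element v is assumed to match the v-th sorted fold label
  fold_ids <- sort(unique(folds))
  fold_ests <- vapply(seq_along(fold_ids), function(v) {
    y_v <- y[folds == fold_ids[v]]
    f_v <- fitted_values[[v]]
    mse <- mean((y_v - f_v)^2)
    var_y <- mean((y_v - mean(y_v))^2)
    1 - mse / var_y
  }, numeric(1))
  mean(fold_ests)
}

y <- c(1, 2, 3, 4, 5, 6)
folds <- c(1, 1, 1, 2, 2, 2)
cv_r_squared(y, y, folds)                          # perfect predictions: returns 1
cv_r_squared(list(rep(2, 3), rep(5, 3)), y, folds) # fold means: returns 0
```

Predicting the fold mean gives an R-squared of zero in every fold, which is why the second call returns 0.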
fitted_values: fitted values from a regression function fit to the observed data; either a list of length V, where each element is a set of predictions on the validation data, or a vector of the same length as y.
y: the observed outcome.
full_y: the observed outcome from the entire dataset (for cross-fitted estimates).
folds: the cross-validation folds for the observed data.
type: the measure of predictiveness to estimate (defaults to "r_squared", for R-squared-based variable importance).
C: the indicator of coarsening (1 denotes observed, 0 denotes unobserved).
Z: either NULL (if there is no coarsening) or a matrix-like object containing the fully observed data.
folds_Z: either the cross-validation folds for the observed data (if there is no coarsening) or a vector of folds for the fully observed data Z.
ipc_weights: weights for inverse probability of coarsening (IPC) weighted estimation (e.g., inverse weights from a two-phase sample). The weights are assumed to be already inverted (i.e., ipc_weights = 1 / [estimated probability weights]).
ipc_fit_type: if "external", then use ipc_eif_preds; if "SL", fit a SuperLearner to determine the correction to the efficient influence function.
ipc_eif_preds: if ipc_fit_type = "external", the fitted values from a regression of the full-data EIF on the fully observed covariates/outcome; otherwise, not used.
ipc_est_type: the type of IPC correction, either "ipw" (for classical inverse probability weighting) or "aipw" (for augmented inverse probability weighting; the default).
scale: if doing an IPC correction, the scale on which the correction should be computed (e.g., "identity"; or "logit" to logit-transform, apply the correction, and back-transform).
na.rm: logical; should NA values be removed in computation? (defaults to FALSE)
...: other arguments to SuperLearner, if ipc_fit_type = "SL".
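The IPC arguments can be illustrated with a generic one-step correction, assuming coarsening at random and assuming the full-data EIF values phi are already available for the fully observed units. For "ipw" the observed-data influence function is C * ipc_weights * phi; "aipw" adds the augmentation term -(C * ipc_weights - 1) * ipc_eif_preds; and with scale = "logit" the correction is applied to logit(est) via the delta method before back-transforming. The function ipc_correct and its arguments are hypothetical; this is a minimal sketch of the standard technique, not the package's implementation.

```r
# Hypothetical sketch of an IPC-corrected one-step estimate (not the package's code).
# est: initial estimate; phi: full-data EIF values (used only where C == 1);
# eif_preds: estimates of E[phi | Z], the role played by ipc_eif_preds.
ipc_correct <- function(est, phi, C, ipc_weights, eif_preds,
                        ipc_est_type = c("aipw", "ipw"),
                        scale = c("identity", "logit")) {
  ipc_est_type <- match.arg(ipc_est_type)
  scale <- match.arg(scale)
  # coarsened units have no EIF value; they are multiplied by C = 0 anyway
  phi_obs <- ifelse(C == 1, phi, 0)
  # inverse-probability-weighted observed-data influence function
  obs_eif <- C * ipc_weights * phi_obs
  if (ipc_est_type == "aipw") {
    # augmentation term based on the regression of the EIF on Z
    obs_eif <- obs_eif - (C * ipc_weights - 1) * eif_preds
  }
  if (scale == "identity") {
    return(est + mean(obs_eif))
  }
  # logit scale: delta method, d/dp logit(p) = 1 / (p * (1 - p))
  eif_logit <- obs_eif / (est * (1 - est))
  plogis(qlogis(est) + mean(eif_logit))
}

# two units, the second coarsened, both with inverse weight 2
ipc_correct(0, c(1, NA), c(1, 0), c(2, 2), c(0.5, 0.5))  # returns 1
```

Computing the correction on the logit scale keeps the back-transformed estimate inside (0, 1), which is the motivation for the "logit" option.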
See the paper by Williamson, Gilbert, Simon, and Carone for more details on the mathematics behind this function and the definition of the parameter of interest. If sample-splitting is also requested (recommended, since in this case inferences will be valid even if the variable has zero true importance), then the prediction functions are trained as if \(2K\)-fold cross-validation were run but are evaluated on only \(K\) sets, with the sets used for the full and reduced nuisance regressions independent of one another.
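The fold bookkeeping behind sample-splitting can be sketched in base R: observations are assigned to \(2K\) folds, so each prediction function is trained as in \(2K\)-fold cross-validation, and the \(2K\) folds are then divided into two disjoint groups of \(K\), one used to evaluate the full regression and the other the reduced regression. The random assignment of folds to the two halves is an assumption made for illustration; the package may assign them differently.

```r
# Illustration only: 2K cross-validation folds split into two disjoint groups of K.
set.seed(20)
n <- 20
K <- 2
# assign each observation to one of 2K folds with balanced sizes
cv_folds <- sample(rep(seq_len(2 * K), length.out = n))
# randomly assign K of the 2K folds to each half of the sample split
fold_half <- sample(rep(1:2, each = K))
# the full regression is evaluated on half 1, the reduced regression on half 2
full_eval_obs <- which(cv_folds %in% which(fold_half == 1))
reduced_eval_obs <- which(cv_folds %in% which(fold_half == 2))
```

Because the two evaluation sets share no observations, the estimated predictiveness of the full and reduced regressions is computed on independent data.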