bioLeak (version 0.2.0)

impute_guarded: Leakage-safe data imputation via guarded preprocessing

Description

Fits imputation parameters on the training data only, then applies the same guarded transformation to the test data, so that no test-set information influences the imputed values. This function is a thin wrapper around the guarded preprocessing used by fit_resample(). The output is the transformed feature matrix used by the guarded pipeline, in which categorical variables are one-hot encoded.
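
The fit-on-train, apply-to-test idea can be illustrated with a minimal base R sketch. This is not bioLeak's internal code: the helper fill_with() and the variable names are hypothetical, and the sketch covers only plain median imputation on numeric columns (no winsorization or one-hot encoding).

```r
# Illustrative sketch only; fill_with() is a hypothetical helper, not part of bioLeak.
train <- data.frame(x = c(1, 2, NA, 4), y = c(NA, 1, 1, 0))
test  <- data.frame(x = c(NA, 5), y = c(1, NA))

# Fit step: per-column medians are computed from the training set only.
train_medians <- vapply(train, median, numeric(1), na.rm = TRUE)

# Apply step: fill missing values using the stored training medians.
fill_with <- function(df, fills) {
  for (col in names(fills)) {
    missing <- is.na(df[[col]])
    df[[col]][missing] <- fills[[col]]
  }
  df
}

train_filled <- fill_with(train, train_medians)
test_filled  <- fill_with(test, train_medians)  # test medians are never computed
```

Because the test set is filled with statistics stored from the training set, its own distribution never enters the fitted parameters.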

Usage

impute_guarded(
  train,
  test,
  method = c("median", "knn", "missForest", "none"),
  constant_value = 0,
  k = 5,
  seed = 123,
  winsor = TRUE,
  winsor_thresh = 3,
  parallel = FALSE,
  return_outliers = FALSE,
  vars = NULL
)

Value

A LeakImpute object with imputed data and guard state.

Arguments

train

data frame (training set)

test

data frame (test set)

method

one of "median", "knn", "missForest", or "none"

constant_value

unused; retained for backward compatibility

k

number of neighbors for kNN imputation (if method = "knn")

seed

unused; retained for backward compatibility. Set seed before calling this function if reproducibility is needed.

winsor

logical; apply MAD-based winsorization before imputation

winsor_thresh

numeric; MAD cutoff (default = 3)

parallel

logical; unused (kept for compatibility)

return_outliers

logical; unused (outlier flags not returned)

vars

optional character vector; impute only selected variables
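
For readers unfamiliar with MAD-based winsorization (the winsor and winsor_thresh arguments), the following base R sketch shows one common formulation. It assumes the threshold is a multiple of the scaled MAD around the training median; the package's exact rule may differ, and winsorize_mad() is a hypothetical helper, not a bioLeak function.

```r
# Illustrative sketch only; winsorize_mad() is hypothetical, not part of bioLeak.
# Assumption: values are clipped to center +/- thresh * MAD.
winsorize_mad <- function(x, center, spread, thresh = 3) {
  lower <- center - thresh * spread
  upper <- center + thresh * spread
  pmin(pmax(x, lower), upper)  # NA values pass through unchanged
}

train_x <- c(1, 2, 3, 4, 100)
test_x  <- c(2, 250, NA)

# Limits are estimated from the training data only.
center <- median(train_x, na.rm = TRUE)
spread <- mad(train_x, na.rm = TRUE)  # stats::mad() scales by 1.4826 by default

train_w <- winsorize_mad(train_x, center, spread, thresh = 3)
test_w  <- winsorize_mad(test_x, center, spread, thresh = 3)
```

As with imputation, the clipping limits come from the training set and are reused unchanged for the test set.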

See Also

fit_resample(), predict_guard()

Examples

train <- data.frame(x = c(1, 2, NA, 4), y = c(NA, 1, 1, 0))
test <- data.frame(x = c(NA, 5), y = c(1, NA))
imp <- impute_guarded(train, test, method = "median", winsor = FALSE)
imp$train
imp$test
