Fills in missing values (NA) in numeric data using a specified imputation method. Provides a unified interface to univariate, multivariate, ensemble, and diagnostic imputation approaches. The function automatically handles method-specific parameters and error recovery.
impute_missings(
  x,
  method = "rf_missForest",
  ImputationRepetitions = 10,
  seed = NULL,
  x_orig = NULL
)
Returns a data frame with the same dimensions and column names as the input x,
but with missing values filled in according to the specified method. If imputation
fails, returns a data frame with all values set to NA.
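Because a failed imputation is reported as an all-NA data frame rather than as an error, calling code may want to test for that case explicitly. The following is a minimal base R sketch of such a check; the helper name imputation_failed is hypothetical and not part of the package, and the two small data frames only stand in for results returned by impute_missings.

```r
# Hypothetical helper (not part of the package): TRUE when every cell is NA,
# which is how a failed imputation is described above.
imputation_failed <- function(result) {
  all(is.na(as.matrix(result)))
}

# Stand-ins for a successful and a failed result
ok     <- data.frame(a = c(1, 2), b = c(3, 4))
failed <- data.frame(a = c(NA_real_, NA_real_), b = c(NA_real_, NA_real_))

imputation_failed(ok)
imputation_failed(failed)
```

A check like this can be placed directly after the call to impute_missings, for example to fall back to a simpler method.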
x - Data frame or matrix containing numeric data with missing values (NA). All columns must be numeric.
method - Character string specifying which imputation method to use.
Default is "rf_missForest". See Details for all available methods.
ImputationRepetitions - Integer. Number of repetitions for methods ending
with "_repeated". These methods perform multiple imputations and
return the median across repetitions for increased stability. Default is 10.
Ignored for non-repeated methods.
seed - Integer. Random seed for reproducibility. If not supplied, the current system seed is read. Setting this parameter explicitly is recommended. To reproduce results from compare_imputation_methods, use the same seed as was set there.
x_orig - Data frame or matrix. Original complete data, required only for
poisoned and calibrating methods (used for validation/benchmarking).
Must have the same dimensions as x. Default is NULL.
Jorn Lotsch, Alfred Ultsch
This function provides access to multiple imputation algorithms through a single
interface. Simply specify the desired method name via the method parameter.
Available Methods:
Univariate methods (replace each missing value independently):
"median" - Column median
"mean" - Column mean
"mode" - Column mode (most frequent value)
"rSample" - Random sample from observed values
Bagging methods (bootstrap aggregating with decision trees):
"bag" - Single bagged tree imputation
"bag_repeated" - Repeated bagging with median aggregation
Random forest methods (ensemble of decision trees):
"rf_mice" - Random forest via mice package
"rf_mice_repeated" - Repeated RF via mice
"rf_missForest" - Random forest via missForest package (recommended)
"rf_missForest_repeated" - Repeated RF via missForest
"miceRanger" - Random forest via miceRanger package
"miceRanger_repeated" - Repeated RF via miceRanger
Tree-based methods:
"cart" - Classification and regression trees
"cart_repeated" - Repeated CART with median aggregation
Regression methods:
"linear" - Lasso regression (L1-regularized linear model)
"pmm" - Predictive mean matching
"pmm_repeated" - Repeated PMM with median aggregation
k-Nearest neighbors methods:
"knn3", "knn5", "knn7", "knn9", "knn10" -
k-NN with specified number of neighbors
Multiple imputation methods:
"ameliaImp" - Single imputation via Amelia II
"ameliaImp_repeated" - Multiple imputations via Amelia II
"miImp" - Multiple imputation via mi package
Poisoned methods (require x_orig, for validation only):
"plus" - Add systematic positive offset
"plusminus" - Add alternating positive/negative offset
"factor" - Multiply by constant factor
Calibrating methods (require x_orig, for benchmarking):
"tinyNoise_0.000001" through "tinyNoise_1" - Add small
random noise with specified magnitude (available magnitudes: 0.000001,
0.00001, 0.0001, 0.001, 0.01, 0.05, 0.1, 0.2, 0.5, 1)
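The role of x_orig in the poisoned and calibrating methods can be illustrated with a small base R sketch. It assumes that such a method takes the true values from x_orig at the missing positions and distorts them in a known way. The offset size, the Gaussian noise, and all variable names below are illustrative assumptions, not the package's actual implementation.

```r
set.seed(42)

# Complete reference data and a copy with 30 cells set to NA
x_orig <- as.matrix(iris[, 1:4])
x <- x_orig
x[sample(length(x), 30)] <- NA
miss <- is.na(x)

# "plus"-like distortion: true value plus a constant offset
# (the offset of 1 is an arbitrary choice for illustration)
offset <- 1
x_plus <- x
x_plus[miss] <- x_orig[miss] + offset

# "tinyNoise"-like distortion: true value plus small random noise
# (Gaussian noise with sd = 0.001 is an assumption)
noise_sd <- 0.001
x_noise <- x
x_noise[miss] <- x_orig[miss] + rnorm(sum(miss), sd = noise_sd)

# Mean absolute error on the filled-in cells for each variant
mae_plus  <- mean(abs(x_plus[miss]  - x_orig[miss]))
mae_noise <- mean(abs(x_noise[miss] - x_orig[miss]))
```

Under these assumptions the noise variant sets a near-ideal reference error and the offset variant a deliberately poor one, which is why both need the complete data.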
Repeated methods:
Methods ending with "_repeated" perform multiple independent imputations
and return the median value across all repetitions. This typically provides
more stable and robust results but requires more computation time. The number
of repetitions is controlled by the ImputationRepetitions parameter.
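The median aggregation described above can be sketched in base R as follows. The single-run imputer impute_once is a hypothetical stand-in (random draws from the observed values of each column) and is not the package's code; only the repeat-then-take-the-median step is the point of the example.

```r
set.seed(1)

# Toy data: numeric matrix with 20 cells set to NA
x <- as.matrix(iris[, 1:4])
x[sample(length(x), 20)] <- NA

# Hypothetical single-run imputer: fill each NA with a random draw
# from the observed values of the same column
impute_once <- function(m) {
  for (j in seq_len(ncol(m))) {
    miss <- is.na(m[, j])
    obs <- m[!miss, j]
    m[miss, j] <- obs[sample.int(length(obs), sum(miss), replace = TRUE)]
  }
  m
}

# Run the imputation several times and stack the results into a 3-D array
n_rep <- 10
runs <- replicate(n_rep, impute_once(x), simplify = FALSE)
stacked <- simplify2array(runs)

# Cell-wise median across repetitions; observed cells are unchanged
x_imputed <- apply(stacked, c(1, 2), median)
dimnames(x_imputed) <- dimnames(x)
```

Each missing cell receives the median of its n_rep candidate values, which dampens the run-to-run variation of a stochastic imputer.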
Method selection guidance:
For quick results: Use "median" or "mean"
For moderate missing data: Use "rf_missForest" or "knn5"
For high-quality results: Use "rf_missForest_repeated" or "pmm_repeated"
For systematic comparison: Use compare_imputation_methods
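To show what the quick univariate options amount to, here is a minimal base R sketch of column-median imputation. The function impute_column_median is hypothetical and only illustrates the idea; it is not the code that impute_missings runs for method = "median".

```r
# Small example with one missing value per column
x <- data.frame(a = c(1, NA, 3, 10), b = c(NA, 2, 2, 4))

# Hypothetical helper: replace each NA by the median of the observed
# values in the same column
impute_column_median <- function(df) {
  df[] <- lapply(df, function(col) {
    col[is.na(col)] <- median(col, na.rm = TRUE)
    col
  })
  df
}

x_median <- impute_column_median(x)
x_median
```

Each column is treated independently, so relationships between variables are ignored. That makes the approach fast, and it is the reason the multivariate methods are suggested for higher-quality results.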
Lotsch J, Ultsch A. (2025). A model-agnostic framework for dataset-specific selection of missing value imputation methods in pain-related numerical data. Can J Pain (in minor revision)
compare_imputation_methods
# Load example data
data_iris <- iris[,1:4]
# Add some missing values (5% per column)
set.seed(42)
for(i in 1:4) data_iris[sample(1:nrow(data_iris), 0.05*nrow(data_iris)), i] <- NA
# Simple univariate imputation with median
data_iris_imputed_median <- impute_missings(
  data_iris,
  method = "median"
)
# Show data
head(data_iris_imputed_median)