Original data frame or tibble (with missing values)
features
Vector of features ranked by correlation, as returned by flatten_mat().
batch
Numeric. Batch size.
pmm_k
Integer. Number of neighbors considered in imputation. Defaults to 5.
n_trees
Integer. Number of trees used in imputation. Defaults to 15.
seed
Integer. Seed to be set for reproducibility.
save
Logical. Should the list of individual imputed batches be saved as an .rds file in the working directory? Defaults to FALSE.
Details
Step 1. Group the data by dividing row_number() by the batch size (batch, set by the user) using integer division.
Step 2. Pass the grouped data through group_split() to return a list of batches.
Step 3. Impute each batch individually and record the run time.
Step 4. Combine the imputed batches into a single completed (unlisted/joined) data frame.
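The four steps can be illustrated with a minimal sketch. It is not the package's implementation: the feature names, the toy data, and the helper impute_mean() are hypothetical, and column-mean imputation is only a placeholder for the tree-based imputation controlled by pmm_k and n_trees. Subtracting 1 from row_number() before the integer division is also a choice made here so that every batch except possibly the last holds exactly batch features.

```r
library(dplyr)

# Hypothetical ranked feature names; in practice these would come from flatten_mat().
ranked <- tibble(feature = paste0("x", 1:7))
batch <- 3  # batch size

# Steps 1-2: integer division on the row index assigns a group, then split into a list.
batches <- ranked %>%
  mutate(group = (row_number() - 1) %/% batch) %>%
  group_split(group)

batch_sizes <- vapply(batches, nrow, integer(1))

# Toy data: 10 rows, 7 features, one missing value in each feature.
set.seed(123)
m <- matrix(rnorm(70), ncol = 7, dimnames = list(NULL, ranked$feature))
m[cbind(1:7, 1:7)] <- NA
toy <- as_tibble(m)

# Placeholder imputer (assumption): replaces NA with the column mean.
impute_mean <- function(df) {
  mutate(df, across(everything(), ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))
}

# Step 3: impute each batch of columns separately and record elapsed time.
imputed <- vector("list", length(batches))
elapsed <- numeric(length(batches))
for (i in seq_along(batches)) {
  cols <- batches[[i]]$feature
  elapsed[i] <- system.time(imputed[[i]] <- impute_mean(toy[, cols]))[["elapsed"]]
}

# Step 4: join the imputed batches back into one completed data frame.
completed <- bind_cols(imputed)
```

With 7 features and a batch size of 3, the split yields three batches of 3, 3, and 1 features, and completed has the same columns as toy with no missing values remaining.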
References
Waggoner, P. D. (2023). A batch process for high dimensional imputation. Computational Statistics, 1-22. doi: <10.1007/s00180-023-01325-9>
Stekhoven, D. J., & Bühlmann, P. (2012). MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112-118. doi: <10.1093/bioinformatics/btr597>