The get_all_performance_boot function evaluates different methods of imputing missing values in a dataset. The evaluation uses bootstrapping to make the results more robust.
get_all_performance_boot(data, to_impute, regressors, nb = 1)
It returns a data frame of performance measures, with rows = imputation methods and columns = performance metrics averaged over the bootstrap samples.
data: data frame with rows = observations and columns = quantitative variables
to_impute: string, name of the variable containing the NAs to impute
regressors: vector of strings with the names of the variables used by the 1st and 4th imputation methods (linear regression and hot deck)
nb: number of bootstrap samples
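To illustrate what averaging over nb bootstrap samples can look like, here is a minimal sketch in base R. It is not the package's implementation: the helper name boot_rmse_mean_imputation, the frac argument, and the masking scheme (hiding a random share of known values and scoring the imputed values against them) are all assumptions made for illustration.

```r
data("airquality")
set.seed(1)

# Hypothetical helper (not part of the package): averages the RMSE of
# mean imputation over nb bootstrap resamples.
boot_rmse_mean_imputation <- function(data, to_impute, nb = 1, frac = 0.2) {
  # Keep only rows where the target is known, so errors can be measured
  complete <- data[!is.na(data[[to_impute]]), , drop = FALSE]
  rmse <- numeric(nb)
  for (b in seq_len(nb)) {
    # Resample rows with replacement
    boot <- complete[sample(nrow(complete), replace = TRUE), , drop = FALSE]
    observed <- boot[[to_impute]]
    # Hide a random subset of known values (assumed masking scheme)
    hidden <- sample(length(observed), size = ceiling(frac * length(observed)))
    masked <- observed
    masked[hidden] <- NA
    # Mean imputation: every hidden value is predicted by the observed mean
    predicted <- rep(mean(masked, na.rm = TRUE), length(hidden))
    rmse[b] <- sqrt(mean((predicted - observed[hidden])^2))
  }
  mean(rmse)
}

boot_rmse_mean_imputation(airquality, "Ozone", nb = 100)
```

A larger nb reduces the variability of the averaged metric at the cost of run time.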
The function calculates the following performance metrics for each imputation method, where \(P_i\) and \(O_i\) denote the predicted (imputed) and observed values:
- \( R^2 = \left[ \frac{\frac{1}{N}\sum_{i=1}^N (P_i - \bar{P})(O_i - \bar{O})}{\sigma_{P}\,\sigma_{O}} \right]^2 \),
- \( RMSE = \left( \frac{1}{N}\sum_{i=1}^N (P_i - O_i)^2 \right)^{1/2} \)
and
- \( MAE = \frac{1}{N}\sum_{i=1}^N |P_i - O_i| \)
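The three formulas above can be sketched in base R as follows. The helper performance_metrics is hypothetical and not part of the package; it assumes that \(\sigma_P\) and \(\sigma_O\) are population standard deviations (computed with 1/N), in which case \(R^2\) equals the squared Pearson correlation.

```r
# Hypothetical helper (not part of the package): computes R2, RMSE and MAE
# from predicted values P and observed values O.
performance_metrics <- function(P, O) {
  stopifnot(length(P) == length(O), length(P) > 1)
  N <- length(P)
  # Population (1/N) standard deviations, matching the 1/N in the formula
  sd_P <- sqrt(mean((P - mean(P))^2))
  sd_O <- sqrt(mean((O - mean(O))^2))
  r2   <- (sum((P - mean(P)) * (O - mean(O))) / N / (sd_P * sd_O))^2
  rmse <- sqrt(mean((P - O)^2))
  mae  <- mean(abs(P - O))
  c(R2 = r2, RMSE = rmse, MAE = mae)
}

O <- c(1, 2, 3, 4)       # observed values
P <- c(1.5, 2, 2.5, 4)   # predicted (imputed) values
performance_metrics(P, O)
```

Lower RMSE and MAE and a higher \(R^2\) indicate that the imputed values are closer to the observed ones.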
Supported Imputation Methods:
1. Linear Regression Imputation (lm_imputation): it uses a linear regression model to predict and impute missing values
2. Median Imputation (median_imputation): it replaces missing values with the median of observed values
3. Mean Imputation (mean_imputation): it replaces missing values with the mean of observed values
4. Hot Deck Imputation (hot_deck_imputation): it replaces missing values with similar observed values
5. Expectation-Maximization Imputation (EM_imputation): it uses the Expectation-Maximization algorithm to estimate and impute missing values
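As an illustration of methods 1 to 3, the sketch below imputes the Ozone variable of airquality with base R only. The helpers impute_mean, impute_median and impute_lm are hypothetical names introduced here; the package's own lm_imputation, median_imputation and mean_imputation may have different signatures and behave differently.

```r
data("airquality")

# Hypothetical helpers for illustration only.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)    # fill NAs with the observed mean
  x
}

impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)  # fill NAs with the observed median
  x
}

impute_lm <- function(data, to_impute, regressors) {
  # Fit target ~ regressors on the rows where the target is observed
  fit <- lm(reformulate(regressors, response = to_impute), data = data)
  x <- data[[to_impute]]
  missing <- is.na(x)
  # Predict the missing entries from their regressor values
  x[missing] <- predict(fit, newdata = data[missing, , drop = FALSE])
  x
}

ozone_mean   <- impute_mean(airquality$Ozone)
ozone_median <- impute_median(airquality$Ozone)
ozone_lm     <- impute_lm(airquality, "Ozone", c("Wind", "Temp"))
```

Mean and median imputation fill every gap with the same value, whereas the regression fills each gap with a value that depends on that row's regressors. This sketch assumes the regressors themselves have no missing values, which holds for Wind and Temp in airquality.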
Evaluate different methods of imputing missing values using bootstrapping and calculate performance metrics for each method
OECD/European Union/EC-JRC (2008), Handbook on Constructing Composite Indicators: Methodology and User Guide, OECD Publishing, Paris, <https://doi.org/10.1787/9789264043466-en>
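data("airquality")
# Use Wind and Temp (columns 3 and 4) as regressors
regressors <- colnames(airquality[, c(3, 4)])
# Evaluate the imputation methods for Ozone over 100 bootstrap samples
suppressWarnings(get_all_performance_boot(data = airquality, to_impute = "Ozone", regressors = regressors, nb = 100))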