xtrain
Vector with predicted values in the training data set.

xtest
Vector with predicted values in the test data set.

ytrain
Vector with observed response in the training data set.

k
Number of nearest neighbours to choose from. Set <code>k = 0</code> if no predictive mean matching is to be done.

seed

This function is used internally, but it may help others implement 
predictive mean matching efficiently on top of any prediction-based 
missing value imputation. It works as follows: for each predicted value 
in the vector <code>xtest</code>, the <code>k</code> closest predicted 
values in the vector <code>xtrain</code> are identified by k-nearest 
neighbour search. One of those neighbours is then picked at random, 
and its corresponding observed value in <code>ytrain</code> is returned.
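
The matching step described above can be sketched in a few lines of base R. This is only an illustrative sketch, not the package's implementation: the function name <code>pmm_sketch</code> is hypothetical, the neighbour search is a brute-force sort rather than a dedicated k-nearest neighbour routine, <code>ytrain</code> is assumed to be numeric, and <code>k</code> is assumed to be at least 1.

```r
# Hypothetical helper illustrating predictive mean matching.
# Assumptions: numeric ytrain, k >= 1, brute-force neighbour search.
pmm_sketch <- function(xtrain, xtest, ytrain, k = 1L, seed = NULL) {
  stopifnot(length(xtrain) == length(ytrain), k >= 1L)
  if (!is.null(seed)) set.seed(seed)

  # Never ask for more neighbours than there are training values.
  k <- min(k, length(xtrain))

  vapply(xtest, function(x) {
    # Indices of the k training predictions closest to x.
    nearest <- order(abs(xtrain - x))[seq_len(k)]
    # Pick one neighbour at random and return its observed response.
    as.numeric(ytrain[nearest[sample.int(k, 1L)]])
  }, FUN.VALUE = numeric(1))
}

# With k = 1 the closest training prediction to 0.3 is 0.2,
# whose observed response is 0.
pmm_sketch(xtrain = c(0.2, 0.2, 0.8), xtest = 0.3, ytrain = c(0, 0, 1), k = 1)
```

Because every returned value is taken from <code>ytrain</code>, a 0-1 coded response can only ever be imputed with 0 or 1, whatever the raw predictions look like.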

Alternative implementation of the beautiful 'MissForest' algorithm used to impute mixed-type data sets by chaining random forests, introduced by Stekhoven, D.J. and Buehlmann, P. (2012) <doi:10.1093/bioinformatics/btr597>. Under the hood, it uses the lightning fast random forest package 'ranger'. Between the iterative model fitting steps, predictive mean matching can optionally be applied. First, this avoids imputation with values not already present in the original data (such as a value of 0.3334 in a 0-1 coded variable). Second, predictive mean matching tries to raise the variance in the resulting conditional distributions to a realistic level. This makes it possible, for example, to perform multiple imputation by repeating the call to missRanger().

pmm: Predictive Mean Matching

Description

Usage

Arguments

Value

Examples