yaImpute (version 1.0-21)

varSelection: Select variables for imputation models

Description

Computes grmsd (generalized root mean square distance) as variables are added to (method="addVars") or removed from (method="delVars") a k-NN imputation model. When adding variables, the function keeps the variables that strengthen the imputation; when deleting, it removes the variables that weaken the imputation the least. The measure of model strength is the distance between imputed and observed Y-variables.

Usage

varSelection(x,y,method="addVars",yaiMethod="msn",wts=NULL,nboot=20,trace=FALSE,
  useParallel=if (.Platform$OS.type == "windows") FALSE else TRUE,...)

Arguments

x
a set of X-Variables as used in yai.
y
a set of Y-Variables as used in yai.
method
if addVars, the X-Variables are added, and if delVars, they are deleted (see Details).
yaiMethod
passed as method to yai.
wts
passed as argument wts to grmsd, which is used to score the alternative variable sets.
nboot
the number of bootstrap samples used at each variable selection step (see Details). When nboot is zero, no bootstrapping is done.
trace
if TRUE, information at each step is output.
useParallel
if TRUE, function mclapply from package parallel is used to run the bootstraps when it is available. If it is not available, lapply is used (which is the only option on Windows).
...
passed to yai.

Value

A list of class varSel with these tags:

  • call: the call.
  • grmsd: a 2-column matrix of the mean and std dev of the mean Mahalanobis distances associated with adding or removing the variables stored as the rownames. When nboot<2, the std dev are NA.
  • allgrmsd: a list of the grmsd values that correspond to each bootstrap replication. The data in grmsd are based on these vectors of information.
  • method: the value of argument method.

Details

This function tracks the effect on generalized root mean square distance (see grmsd) when variables are added or deleted one at a time. When adding variables, the function starts with none and keeps the single variable that provides the smallest grmsd. When deleting variables, the function starts with all X-Variables and deletes them one at a time such that those that remain provide the smallest grmsd. The function uses the following steps:

  1. Function yai is run for all the Y-variables and candidate X-variable(s). The result is passed to impute.yai to get imputed values of Y-variables. That result is passed to grmsd to compute a mean Mahalanobis distance for the case where the candidate variable is included (or deleted, depending on method). However, these steps are done once for each bootstrap replication and the resulting values are averaged to provide an average mean Mahalanobis distance over the bootstraps.
  2. Step one is done for each candidate X-variable, forming a vector of grmsd values, one corresponding to the case where each candidate is added or deleted.
  3. When variables are being added (method="addVars"), the variable that is related to the smallest grmsd is kept. When variables are being deleted (method="delVars"), the variable that is related to the largest grmsd is deleted.
  4. Once a variable has been added or deleted, the function proceeds to select another variable for addition or deletion by considering all remaining variables.
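
The steps above can be sketched in base R for the addVars case. This is an illustration only, not the package's implementation: the helpers score_vars and forward_select are hypothetical names, and yai, impute.yai and grmsd are replaced by a plain Euclidean 1-nearest-neighbour imputation scored with root mean square error on scaled Y-variables.

```r
# Hypothetical stand-in for the yai -> impute.yai -> grmsd chain (step 1).
# Scores a set of X columns by imputing each row's Y from its nearest
# neighbour in X-space and measuring the error on scaled Y.
score_vars <- function(x, y, vars, rows = seq_len(nrow(x))) {
  xs <- scale(as.matrix(x[rows, vars, drop = FALSE]))
  ys <- scale(as.matrix(y[rows, , drop = FALSE]))
  d <- as.matrix(dist(xs))
  # A row may not impute itself, including bootstrap copies of itself.
  d[outer(rows, rows, "==")] <- Inf
  nn <- apply(d, 1, which.min)          # nearest neighbour of each row
  sqrt(mean((ys - ys[nn, , drop = FALSE])^2))
}

# Hypothetical greedy forward selection (steps 2 to 4, addVars case).
forward_select <- function(x, y, nboot = 5) {
  remaining <- colnames(x)
  kept <- character(0)
  path <- numeric(0)
  while (length(remaining) > 0) {
    # Step 2: one bootstrap-averaged score per candidate variable.
    scores <- sapply(remaining, function(v) {
      mean(replicate(nboot, {
        rows <- sample(nrow(x), replace = TRUE)
        score_vars(x, y, c(kept, v), rows)
      }))
    })
    # Step 3: keep the candidate with the smallest score.
    best <- names(which.min(scores))
    kept <- c(kept, best)
    path <- c(path, scores[[best]])
    # Step 4: continue with the remaining candidates.
    remaining <- setdiff(remaining, best)
  }
  setNames(path, kept)   # scores in the order the variables were added
}

set.seed(1)
sel <- forward_select(iris[, 1:2], iris[, 3:4])
sel
```

The names of the returned vector give the order in which the variables entered the model, which mirrors the rownames of the grmsd matrix described under Value.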

See Also

yai, impute.yai, bestVars and grmsd

Examples

data(iris)

set.seed(12345)

x <- iris[,1:2]  # Sepal.Length Sepal.Width 
y <- iris[,3:4]  # Petal.Length Petal.Width 

vsel <- varSelection(x=x,y=y,nboot=5,useParallel=FALSE)
vsel

bestVars(vsel)

plot(vsel)
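
The components listed under Value can also be inspected directly. The following is a minimal sketch that repeats the call used in this example and assumes the returned object carries the tags call, grmsd, allgrmsd and method exactly as documented; no output is shown because the values depend on the bootstrap samples.

```r
library(yaImpute)

data(iris)
set.seed(12345)

vsel <- varSelection(x = iris[, 1:2], y = iris[, 3:4],
                     nboot = 5, useParallel = FALSE)

class(vsel)            # the object is documented to be of class varSel
vsel$method            # the value of argument method
vsel$grmsd             # mean and std dev of grmsd for each selection step
rownames(vsel$grmsd)   # variables associated with each step
length(vsel$allgrmsd)  # list of per-bootstrap grmsd values
```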