impute_proxy: Impute by variable derivation

Description

Impute missing values by a constant, by copying another variable, or by computing transformations from other variables.

Usage

impute_proxy(dat, formula, add_residual = c("none", "observed", "normal"), ...)
impute_const(dat, formula, add_residual = c("none", "observed", "normal"), ...)

Arguments

dat: [data.frame], with variables to be imputed and their predictors.
formula: [formula] imputation model description (see Model Specification).
add_residual: [character] Type of residual to add. "normal" means that the imputed value is drawn from N(mu,sd) where mu and sd are estimated from the model's residuals (mu should equal zero in most cases). If add_residual = "observed", residuals are drawn (with replacement) from the model's residuals. Ignored for non-numeric predicted variables.
...: Currently unused
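
The three add_residual options can be illustrated with a small base R sketch. The helper draw_residual below is hypothetical and is not part of the package; it only mirrors the behaviour described above, assuming the residuals are supplied as a numeric vector.

```r
# Hypothetical helper, NOT part of the package: illustrates the three
# add_residual options as described in the argument list.
draw_residual <- function(residuals, n, type = c("none", "observed", "normal")) {
  type <- match.arg(type)
  residuals <- residuals[!is.na(residuals)]
  switch(type,
    # no residual is added
    none     = rep(0, n),
    # resample the empirical residuals with replacement
    observed = residuals[sample.int(length(residuals), n, replace = TRUE)],
    # draw from N(mu, sd), with mu and sd estimated from the residuals
    normal   = rnorm(n, mean = mean(residuals), sd = sd(residuals))
  )
}

set.seed(1)
res    <- c(-0.4, 0.1, 0.3, 0.0)
r_none <- draw_residual(res, 3, "none")
r_obs  <- draw_residual(res, 3, "observed")
r_norm <- draw_residual(res, 3, "normal")
```

With "observed", every drawn value is one of the supplied residuals; with "normal", the values are new random draws.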

Model Specification

Formulas are of the form

IMPUTED_VARIABLES ~ MODEL_SPECIFICATION [ | GROUPING_VARIABLES ]

The left-hand-side of the formula object lists the variable or variables to be imputed.
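
As a rough illustration of how such a formula breaks down into its three parts, the following base R sketch takes one apart by hand. It is not the package's parser, and the variable names (imputed_vars, model_spec, group_vars) are made up for this example.

```r
# A formula with two imputed variables, a model specification and one
# grouping variable.
f <- Sepal.Length + Sepal.Width ~ median(Petal.Length, na.rm = TRUE) | Species

lhs <- f[[2]]   # IMPUTED_VARIABLES
rhs <- f[[3]]   # MODEL_SPECIFICATION [ | GROUPING_VARIABLES ]

imputed_vars <- all.vars(lhs)

# Because `|` binds tighter than `~`, a grouped formula has a call to `|`
# at the top of its right-hand side.
has_groups <- is.call(rhs) && identical(rhs[[1]], as.name("|"))
model_spec <- if (has_groups) rhs[[2]] else rhs
group_vars <- if (has_groups) all.vars(rhs[[3]]) else character(0)
```

Here imputed_vars holds both left-hand-side names, model_spec is the unevaluated median(...) call, and group_vars is "Species".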

For impute_const, the MODEL_SPECIFICATION is a single value and GROUPING_VARIABLES are ignored.

For impute_proxy, the MODEL_SPECIFICATION is a variable, or an expression in terms of variables in the dataset, that must evaluate to either a single number or a vector of length nrow(dat).
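
The "single number or vector of length nrow(dat)" rule can be sketched in base R as follows. The function fill_from is a hypothetical stand-in written for this illustration, not the package's code: it evaluates an expression in the data, recycles a length-one result, and fills only the missing entries.

```r
dat <- data.frame(x = c(1, NA, 3, NA), y = c(10, 20, 30, 40))

# Hypothetical helper: fill missing values of `var` from an expression
# evaluated in terms of the columns of `dat`.
fill_from <- function(dat, var, expr) {
  value <- eval(expr, dat)
  # a single number is recycled to one value per row
  if (length(value) == 1) value <- rep(value, nrow(dat))
  stopifnot(length(value) == nrow(dat))
  miss <- is.na(dat[[var]])
  dat[[var]][miss] <- value[miss]   # observed values are left untouched
  dat
}

by_constant <- fill_from(dat, "x", quote(mean(x, na.rm = TRUE)))  # single number
by_vector   <- fill_from(dat, "x", quote(y / 10))                 # length nrow(dat)
```

In the first call both gaps receive the mean of the observed x values; in the second call each gap receives the value computed for its own row.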

If grouping variables are specified, the data set is split according to the values of those variables, and model estimation and imputation occur independently for each group.
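
A minimal base R sketch of what independent, per-group imputation means, using group mean imputation on a toy data set. This is only an illustration of the idea and does not reproduce how the package splits the data.

```r
dat <- data.frame(
  g = c("a", "a", "a", "b", "b", "b"),
  x = c(1, 3, NA, 10, NA, 20)
)

# Split the row indices by group, then fill each group's gaps using only
# the observed values within that group.
for (idx in split(seq_len(nrow(dat)), dat$g)) {
  miss <- idx[is.na(dat$x[idx])]
  dat$x[miss] <- mean(dat$x[idx], na.rm = TRUE)
}
```

The missing value in group "a" is filled with the mean of 1 and 3, and the one in group "b" with the mean of 10 and 20; neither group influences the other.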

Grouping using dplyr::group_by is also supported. If groups are defined in both the formula and using dplyr::group_by, the data is grouped by the union of grouping variables. Any missing value in one of the grouping variables results in an error.

Examples

irisNA <- iris
irisNA[1:3,1] <- irisNA[3:7,2] <- NA

# impute a constant 

a <- impute_const(irisNA, Sepal.Width ~ 7)
head(a)

a <- impute_proxy(irisNA, Sepal.Width ~ 7)
head(a)

# copy a value from another variable (where available)
a <- impute_proxy(irisNA, Sepal.Width ~ Sepal.Length)
head(a)

# group mean imputation
a <- impute_proxy(irisNA
  , Sepal.Length ~ mean(Sepal.Length,na.rm=TRUE) | Species)
head(a)

# random hot deck imputation
a <- impute_proxy(irisNA, Sepal.Length ~ mean(Sepal.Length, na.rm=TRUE)
, add_residual = "observed")

# ratio imputation (but use impute_lm for that)
a <- impute_proxy(irisNA, 
  Sepal.Length ~ mean(Sepal.Length,na.rm=TRUE)/mean(Sepal.Width,na.rm=TRUE) * Sepal.Width)
