
Impute missing values by a constant, by copying another variable, or by computing transformations of other variables.
impute_proxy(dat, formula, add_residual = c("none", "observed", "normal"), ...)
impute_const(dat, formula, add_residual = c("none", "observed", "normal"), ...)
dat: [data.frame] with the variables to be imputed and their predictors.
formula: [formula] imputation model description (see the description of the formula syntax below).
add_residual: [character] Type of residual to add. The default, "none", adds no residual. "normal" means that a residual drawn from N(mu, sd) is added to the predicted value, where mu and sd are estimated from the model's residuals (mu should equal zero in most cases). If add_residual = "observed", residuals are drawn (with replacement) from the model's residuals. Ignored for non-numeric predicted variables.
...: Currently unused.
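The difference between the three add_residual options can be illustrated with a small base-R sketch. This is not simputation's internal code: add_residual_sketch() is a hypothetical helper that takes a numeric vector y with missing values and a vector pred of model predictions, computes residuals on the observed records, and perturbs the predictions for the missing records accordingly.

```r
# Hypothetical helper (NOT part of simputation): mimics the idea behind
# add_residual for a numeric variable y and model predictions pred.
add_residual_sketch <- function(y, pred, add_residual = c("none", "observed", "normal")) {
  add_residual <- match.arg(add_residual)
  miss <- is.na(y)
  res <- y[!miss] - pred[!miss]          # residuals on the observed records
  n_miss <- sum(miss)
  noise <- switch(add_residual,
    none     = rep(0, n_miss),                                       # prediction only
    observed = res[sample.int(length(res), n_miss, replace = TRUE)], # resample residuals
    normal   = rnorm(n_miss, mean = mean(res), sd = sd(res))         # draw from N(mu, sd)
  )
  y[miss] <- pred[miss] + noise
  y
}

set.seed(1)
y    <- c(1, 4, NA, 6, NA, 9)
pred <- rep(mean(y, na.rm = TRUE), length(y))  # a mean "model": 5 for every record
add_residual_sketch(y, pred, "none")           # 1 4 5 6 5 9
add_residual_sketch(y, pred, "observed")
add_residual_sketch(y, pred, "normal")
```

With a mean model, "observed" can only produce values that were actually observed (here 1, 4, 6 or 9), while "normal" produces arbitrary real numbers around the prediction.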
Formulas are of the form
IMPUTED_VARIABLES ~ MODEL_SPECIFICATION [ | GROUPING_VARIABLES ]
The left-hand side of the formula object lists the variable or variables to be imputed.
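Because R's `|` operator binds more tightly than `~`, a formula such as `y ~ model | group` is parsed as `y ~ (model | group)`. The base-R sketch below shows how the three parts of the formula can be pulled apart. parse_imputation_formula() is a hypothetical helper for illustration; simputation's own parser may work differently.

```r
# Hypothetical helper: split IMPUTED_VARIABLES ~ MODEL_SPECIFICATION | GROUPING_VARIABLES
parse_imputation_formula <- function(formula) {
  stopifnot(inherits(formula, "formula"), length(formula) == 3)
  imputed <- all.vars(formula[[2]])      # variables on the left-hand side
  rhs <- formula[[3]]
  groups <- character(0)
  # `model | group` is the call `|`(model, group)
  if (is.call(rhs) && identical(rhs[[1]], as.name("|"))) {
    groups <- all.vars(rhs[[3]])
    rhs <- rhs[[2]]
  }
  list(imputed = imputed, model = rhs, groups = groups)
}

p <- parse_imputation_formula(Sepal.Length ~ mean(Sepal.Length, na.rm = TRUE) | Species)
p$imputed  # "Sepal.Length"
p$model    # mean(Sepal.Length, na.rm = TRUE)
p$groups   # "Species"
```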
For impute_const, the MODEL_SPECIFICATION is a single value and GROUPING_VARIABLES are ignored.
For impute_proxy, the MODEL_SPECIFICATION is a variable or expression in terms of variables in the dataset that must evaluate to either a single number or a vector of length nrow(dat).
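The length requirement on the MODEL_SPECIFICATION can be illustrated with a small base-R sketch. proxy_impute_sketch() is a hypothetical helper, not part of simputation, and it ignores grouping and add_residual: it only evaluates the right-hand side inside the data, recycles a single value to every row, and fills the NA cells of the left-hand-side variables.

```r
# Hypothetical helper (NOT simputation's implementation): proxy imputation
# without grouping or residuals.
proxy_impute_sketch <- function(dat, formula) {
  vars  <- all.vars(formula[[2]])
  value <- eval(formula[[3]], dat, environment(formula))  # evaluate RHS in the data
  if (!length(value) %in% c(1L, nrow(dat))) {
    stop("model must evaluate to a single value or a vector of length nrow(dat)")
  }
  value <- rep_len(value, nrow(dat))    # recycle a single value to every row
  for (v in vars) {
    miss <- is.na(dat[[v]])
    dat[[v]][miss] <- value[miss]       # only missing cells are overwritten
  }
  dat
}

irisNA <- iris
irisNA[1:3, 1] <- NA
irisNA[3:7, 2] <- NA
a <- proxy_impute_sketch(irisNA, Sepal.Width ~ Sepal.Length)
a$Sepal.Width[3:7]  # row 3 stays NA because its Sepal.Length is missing too
```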
If grouping variables are specified, the data set is split according to the values of those variables, and model estimation and imputation occur independently for each group.
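Splitting the data and estimating independently per group can be sketched in base R by splitting the row indices, estimating inside each group, and writing the results back in place. group_mean_impute_sketch() is a hypothetical helper that hard-codes a group mean as the model; it is not how simputation is implemented.

```r
# Hypothetical helper: group mean imputation, estimated separately per group.
group_mean_impute_sketch <- function(dat, var, group) {
  idx_by_group <- split(seq_len(nrow(dat)), dat[[group]])  # row indices per group
  for (idx in idx_by_group) {
    x <- dat[[var]][idx]
    x[is.na(x)] <- mean(x, na.rm = TRUE)   # estimate uses this group's rows only
    dat[[var]][idx] <- x
  }
  dat
}

irisNA <- iris
irisNA[1:3, 1] <- NA
a <- group_mean_impute_sketch(irisNA, "Sepal.Length", "Species")
a$Sepal.Length[1:3]  # mean Sepal.Length of the 47 remaining setosa rows
```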
Grouping using dplyr::group_by is also supported. If groups are defined both in the formula and using dplyr::group_by, the data is grouped by the union of the grouping variables. Any missing value in one of the grouping variables results in an error.
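The union rule and the error on missing grouping values can be sketched in base R as follows. make_groups_sketch() is a hypothetical helper; the argument grouped_by stands in for the names a grouped tibble would report (for example via dplyr::group_vars()), and the sketch is not simputation's code.

```r
# Hypothetical helper: combine formula groups with group_by() groups and
# refuse to continue if any grouping variable contains NA.
make_groups_sketch <- function(dat, formula_groups, grouped_by = character(0)) {
  groups <- union(formula_groups, grouped_by)
  if (length(groups) == 0) return(factor(rep("all", nrow(dat))))
  has_na <- vapply(dat[groups], anyNA, logical(1))
  if (any(has_na)) {
    stop("missing values in grouping variable(s): ",
         paste(groups[has_na], collapse = ", "))
  }
  interaction(dat[groups], drop = TRUE)   # one level per observed combination
}

df <- data.frame(x  = c(1, NA, 3, 4),
                 g1 = c("a", "a", "b", "b"),
                 g2 = c("u", "v", "u", "u"))
make_groups_sketch(df, "g1", c("g1", "g2"))  # groups by both g1 and g2
```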
library(simputation)

irisNA <- iris
irisNA[1:3, 1] <- irisNA[3:7, 2] <- NA

# impute a constant
a <- impute_const(irisNA, Sepal.Width ~ 7)
head(a)
a <- impute_proxy(irisNA, Sepal.Width ~ 7)
head(a)

# copy a value from another variable (where available)
a <- impute_proxy(irisNA, Sepal.Width ~ Sepal.Length)
head(a)

# group mean imputation
a <- impute_proxy(irisNA, Sepal.Length ~ mean(Sepal.Length, na.rm = TRUE) | Species)
head(a)

# random hot deck imputation (set a seed for reproducible draws)
set.seed(1)
a <- impute_proxy(irisNA, Sepal.Length ~ mean(Sepal.Length, na.rm = TRUE),
  add_residual = "observed")
head(a)

# ratio imputation (but use impute_lm for that)
a <- impute_proxy(irisNA,
  Sepal.Length ~ mean(Sepal.Length, na.rm = TRUE) / mean(Sepal.Width, na.rm = TRUE) * Sepal.Width)
head(a)
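The "random hot deck" label on the mean-plus-observed-residual example follows from simple arithmetic: each residual of the mean model is r_j = y_j - mean(y), so mean(y) + r_j = y_j, which is an actually observed value. The base-R sketch below checks this numerically; it reproduces the idea only and does not call simputation.

```r
# Mean model plus a resampled observed residual returns an observed value.
set.seed(42)
y <- iris$Sepal.Length
y[1:3] <- NA
observed <- y[!is.na(y)]
mu <- mean(observed)
r  <- observed - mu                                # residuals of the mean model
draw <- mu + r[sample.int(length(r), 3, replace = TRUE)]
# compare with a tolerance because mu + (y_j - mu) may differ from y_j in the last bits
all(sapply(draw, function(d) any(abs(d - observed) < 1e-12)))  # TRUE
```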