
Set NA as the reference level for factor variables and impute missing values for numeric variables. This is useful for building model matrices for regularized regression and for handling missing values, as in Taddy 2019.
naref(x, impute=FALSE, pzero=0.5)
A data frame where the factor and character columns have been converted to factors with reference level NA, and if impute=TRUE the missing values in numeric columns have been imputed and a flag for missingness has been added. See details.
x: A data frame.
impute: Logical, whether to impute missing values in numeric columns.
pzero: If impute==TRUE, then if more than pzero of the values in a column are zero do zero imputation, else do mean imputation.
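The zero-versus-mean rule can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: impute_value is a hypothetical helper, and computing the share of zeros among the observed (non-missing) values is an assumption, since the text does not say which denominator is used.

```r
# Hypothetical helper (not part of the package): picks the fill value
# for one numeric column following the documented pzero rule.
impute_value <- function(v, pzero = 0.5) {
  obs <- v[!is.na(v)]
  # Assumption: the share of zeros is measured among observed values only.
  if (mean(obs == 0) > pzero) 0 else mean(obs)
}

sparse <- c(0, 0, 0, 5, NA)  # 3 of 4 observed values are zero
dense  <- c(1, 2, 3, NA)     # no zeros among observed values

fill_sparse <- impute_value(sparse)  # zero imputation applies
fill_dense  <- impute_value(dense)   # mean imputation applies
```

With the default pzero=0.5, a mostly-zero column is filled with 0, while any other column is filled with the mean of its observed values.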
Matt Taddy mataddy@gmail.com
For every factor or character column in x, naref sets NA as the reference level. Columns of class character are first converted to factors via factor(x). If impute=TRUE, each numeric column that contains missing values is converted to two columns: one with the suffix .x that contains the imputed values, and another with the suffix .miss, a binary variable indicating whether the original value was missing. Numeric columns are returned unchanged if impute=FALSE or if they do not contain any missing values.
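The two transformations described here can be reproduced by hand in base R, which may help when reading the output. This is a sketch under assumptions rather than the package's own code: the variable names are illustrative, mean imputation is used for the numeric column, and the b.x and b.miss column names follow the suffix convention stated in the text.

```r
# Factor column: make NA an explicit level and put it first, so that it
# becomes the reference level when a model matrix is built.
f <- factor(c("a", NA, "b"))
f_na <- factor(f, levels = c(NA, levels(f)), exclude = NULL)
lev <- levels(f_na)  # NA is now the first level

# Numeric column: split into imputed values plus a missingness flag.
b <- c(1, NA, 3)
out <- data.frame(
  b.x    = ifelse(is.na(b), mean(b, na.rm = TRUE), b),  # imputed values
  b.miss = as.integer(is.na(b))                         # 1 where original was NA
)
```

Because NA is the reference level, every observed factor level gets its own indicator in the model matrix, which is the behaviour wanted under regularization, where the choice of reference level affects the fit.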
Matt Taddy, 2019. "Business Data Science", McGraw-Hill
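library(gamlr)  # naref is provided by the gamlr package
# Outer parentheses print x after assignment
( x <- data.frame(a=factor(c(1,2,3)),b=c(1,NA,3)) )
naref(x, impute=TRUE)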