healthcareai (version 1.2.4)

imputeDF: Perform imputation on a dataframe

Description

This function performs imputation on a data frame. For numeric columns, missing values are replaced with the column mean; for factor columns, they are replaced with the most frequent value.
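The default rule can be illustrated in base R. This is a minimal sketch, not the healthcareai implementation: `default_impute_value` is a hypothetical helper, and the tie-breaking behaviour (first level wins) is an assumption.

```r
# Hypothetical helper, NOT part of healthcareai: computes the default
# imputation value for a single column, following the rule described above.
default_impute_value <- function(x) {
  if (is.numeric(x)) {
    mean(x, na.rm = TRUE)              # column mean, ignoring NAs
  } else {
    counts <- table(x)                 # table() excludes NAs by default
    names(counts)[which.max(counts)]   # most frequent value (assumed: ties go to the first level)
  }
}

df <- data.frame(a = c(1, 2, 3, NA),
                 b = factor(c("Y", "N", "Y", NA)))

# One default value per column, returned as a named list
vals <- lapply(df, default_impute_value)
print(vals)
```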

Usage

imputeDF(df, imputeVals = NULL)

Arguments

df

A dataframe of values with NAs.

imputeVals

A list of values to be used for imputation. If the list is unnamed, it must be the same length as the number of columns in `df`, and the order of values in `imputeVals` must match the order of columns in `df`. If the list is named, names are matched to the column names of `df`, and you can provide values for a subset of the columns in `df` (see Details).

Value

A list. The first element, df, is the imputed data frame. The second element, imputeVals, is a list of the imputation values used.

Details

If `imputeVals` is a named list covering only a subset of the columns in `df`, an imputation value is calculated for each column that has no value provided. If you wish to provide custom imputation values for some columns and leave NAs in others, supply the value NA in `imputeVals` for the columns that should keep their NAs.
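The matching rules for `imputeVals` can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: `impute_sketch` and its argument `impute_vals` are hypothetical names, and only the behaviour described above (positional matching for unnamed lists, name matching for named lists, NA meaning "leave this column alone") is modelled.

```r
# Hypothetical sketch, NOT the healthcareai implementation.
impute_sketch <- function(df, impute_vals = NULL) {
  # Default rule: mean for numeric columns, most frequent value otherwise
  default_value <- function(x) {
    if (is.numeric(x)) return(mean(x, na.rm = TRUE))
    counts <- table(x)
    names(counts)[which.max(counts)]
  }

  if (is.null(impute_vals)) impute_vals <- list()
  if (length(impute_vals) > 0 && is.null(names(impute_vals))) {
    # Unnamed list: values are matched to columns by position
    stopifnot(length(impute_vals) == ncol(df))
    names(impute_vals) <- names(df)
  }

  used <- list()
  for (col in names(df)) {
    # Use the supplied value if there is one, otherwise calculate a default
    val <- if (col %in% names(impute_vals)) {
      impute_vals[[col]]
    } else {
      default_value(df[[col]])
    }
    if (!is.na(val)) {
      # Add the value as a factor level if the column does not have it yet
      if (is.factor(df[[col]]) && !(val %in% levels(df[[col]]))) {
        levels(df[[col]]) <- c(levels(df[[col]]), val)
      }
      df[[col]][is.na(df[[col]])] <- val
    }
    used[[col]] <- val   # a supplied NA is recorded and the column is left untouched
  }
  list(df = df, imputeVals = used)
}

df <- data.frame(a = c(1, 2, 3, NA),
                 b = factor(c("Y", "N", "Y", NA)),
                 c = c(11, NA, 31, 43))

# Custom value for a, default for b, keep the NAs in c
out <- impute_sketch(df, list(a = 10, c = NA))
print(out$df)
```

In this sketch, column a receives the supplied value, column b falls back to its most frequent level, and column c keeps its NA because NA was supplied for it.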

References

http://healthcareai-r.readthedocs.io

See Also

healthcareai

Examples

# NOT RUN {
# Impute a single column
df <- data.frame(a=c(1,2,3,NA), b=c('Y','N','Y',NA),
   c=c(11,21,31,43), d=c('Y','N','N',NA))
df <- df['a'] # keep a one-column data frame; df[,1] would return a vector instead
out <- imputeDF(df)
dfOut <- out$df # imputed data frame
imputeVals <- out$imputeVals # imputed values
print(dfOut)

# Impute an entire data frame
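# stringsAsFactors = TRUE keeps b and d as factors (R >= 4.0 defaults to character)
df <- data.frame(a=c(1,2,3,NA), b=c('Y','N','Y',NA),
   c=c(11,21,31,43), d=c('Y','N','N',NA), stringsAsFactors = TRUE)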
out <- imputeDF(df)
dfOut <- out$df # imputed data frame
imputeVals <- out$imputeVals # imputed values
print(dfOut)

# To impute using your own values (one per column)
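# stringsAsFactors = TRUE keeps b and d as factors (R >= 4.0 defaults to character)
df <- data.frame(a=c(1,2,3,NA), b=c('Y','N','Y',NA),
   c=c(11,21,31,43), d=c('Y','N','N',NA), stringsAsFactors = TRUE)
myValues <- list(2, 'Y', 26.5, 'N') # unnamed list: same order as the columns a, b, c, d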
out <- imputeDF(df, myValues)
dfOut <- out$df # imputed data frame
print(dfOut)
# }