categories
stores all the categorical values that are present in the factors and character vectors of a data frame; numeric and integer vectors are ignored. It is a preprocessing step for the dummy function. Use it when you want to compute dummies only for the categorical values that were present in another data set. This is especially useful in predictive modeling, when the new (test) data contain more or different categories than the training data. Storing the training categories keeps the dummies computed on the new data aligned with the columns the model was trained on.
Usage: categories(x, p = "all")
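Below is a minimal base-R sketch of the collection step described above. It assumes that "categorical" means factor or character columns and that only values actually present in the data are kept, not unused factor levels. `collect_categories` is a hypothetical name. It is not the dummy package's implementation, and the real `categories()` return value may be structured differently.

```r
# Hypothetical helper (not part of the dummy package): collect the distinct
# values of every factor or character column; other column types are skipped.
collect_categories <- function(x) {
  stopifnot(is.data.frame(x))
  is_cat <- vapply(x, function(col) is.factor(col) || is.character(col),
                   logical(1))
  # One sorted character vector of present values per categorical column
  lapply(x[is_cat], function(col) sort(unique(as.character(col))))
}

train <- data.frame(xvar = factor(c("a", "b", "b", "c")),
                    num  = c(1.5, 2.0, 3.1, 4.2),   # numeric: ignored
                    var3 = c("val1", "val2", "val3", "val3"),
                    stringsAsFactors = FALSE)
cats <- collect_categories(train)
str(cats)
```

The numeric column `num` is dropped from the result, so only `xvar` and `var3` carry stored categories.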
See also: dummy
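The stored categories are meant to feed the dummy function. The following base-R sketch illustrates what restricting dummies to stored categories achieves. It is not how the dummy package computes dummies. The helper `encode_with_categories` and its `variable_category` column naming are hypothetical.

```r
# Hypothetical helper (not the dummy package's API): build 0/1 indicator
# columns for new data, one per category stored from the training data.
encode_with_categories <- function(newdata, cats) {
  cols <- list()
  for (var in names(cats)) {
    values <- as.character(newdata[[var]])
    for (level in cats[[var]]) {
      # Values never seen in training match no stored level, so they get 0
      cols[[paste(var, level, sep = "_")]] <- as.integer(values == level)
    }
  }
  as.data.frame(cols)
}

# Hypothetical stored categories, e.g. as collected from a training set
train_cats <- list(xvar = c("a", "b", "c"),
                   var3 = c("val1", "val2", "val3"))

# New data: row 3 holds categories ("d", "val4") never seen in training
new_df <- data.frame(xvar = factor(c("a", "b", "d")),
                     var3 = c("val1", "val3", "val4"),
                     stringsAsFactors = FALSE)

enc <- encode_with_categories(new_df, train_cats)
print(enc)
```

The result always has the same six columns, whatever categories appear in the new data. A row with only unseen categories is all zeros, which is what makes the encoding usable with a model fitted on the training columns.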
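library(dummy)

# Create toy training data: two factors and one character vector
(traindata <- data.frame(xvar = as.factor(c("a", "b", "b", "c")),
                         yvar = as.factor(c(1, 1, 2, 3)),
                         var3 = c("val1", "val2", "val3", "val3"),
                         stringsAsFactors = FALSE))
# New data containing categories ("d", 4, 5, "val4") absent from traindata
(newdata <- data.frame(xvar = as.factor(c("a", "b", "b", "c", "d", "d")),
                       yvar = as.factor(c(1, 1, 2, 3, 4, 5)),
                       var3 = c("val1", "val2", "val3", "val3", "val4", "val4"),
                       stringsAsFactors = FALSE))

# Store the categories of the training data for different values of p
categories(x = traindata, p = "all")
categories(x = traindata, p = 2)
categories(x = traindata, p = c(2, 1, 3))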
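This page does not describe `p`. Assume it selects the most frequent categories: one integer for every categorical variable, or one integer per categorical variable in column order. Under that assumption, the three calls above can be imitated in base R. `top_categories` is hypothetical. It breaks frequency ties alphabetically, which the package may not do. In `traindata`, `xvar` has a tie between "a" and "c", so the package's output for `p = 2` may differ.

```r
# Hypothetical helper. ASSUMPTION: p keeps the p most frequent values per
# categorical column; ties are broken alphabetically (package may differ).
top_categories <- function(x, p = "all") {
  is_cat <- vapply(x, function(col) is.factor(col) || is.character(col),
                   logical(1))
  cat_cols <- x[is_cat]
  if (identical(p, "all")) p <- Inf           # "all": no limit
  p <- rep_len(p, length(cat_cols))           # a single p applies to every column
  out <- vector("list", length(cat_cols))
  names(out) <- names(cat_cols)
  for (i in seq_along(cat_cols)) {
    freq <- table(as.character(cat_cols[[i]]))
    ord <- order(-as.vector(freq), names(freq))  # most frequent first, then A-Z
    out[[i]] <- names(freq)[ord][seq_len(min(p[i], length(freq)))]
  }
  out
}

traindata <- data.frame(xvar = as.factor(c("a", "b", "b", "c")),
                        yvar = as.factor(c(1, 1, 2, 3)),
                        var3 = c("val1", "val2", "val3", "val3"),
                        stringsAsFactors = FALSE)

all_cats <- top_categories(traindata, p = "all")
top2     <- top_categories(traindata, p = 2)
mixed    <- top_categories(traindata, p = c(2, 1, 3))
str(top2)
```

Under these assumptions, `p = 2` keeps "b" and "a" for `xvar` and "val3" and "val1" for `var3`. `p = c(2, 1, 3)` keeps only "1" for `yvar`.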