Often one has an R factor in which one or more levels are rare in the
data. This could cause problems, say in performing cross-validation; a
level in the test set might be "new," not having appeared in the
training set. To deal with this, factorToTopLevels removes rare levels
from a factor; dataToTopLevels applies this to an entire data frame.
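
The idea can be illustrated with a minimal base-R sketch. The helper collapseRareLevels and its minCount argument are hypothetical, written only for this example; factorToTopLevels may take different arguments and may treat the rare cases differently. Lumping them into a level named "other" is an assumption made here.

```r
# Hypothetical helper, not part of the package: lump levels that occur
# fewer than minCount times into a single level named "other".
collapseRareLevels <- function(f, minCount = 2) {
   f <- as.factor(f)
   counts <- table(f)
   rare <- names(counts)[counts < minCount]
   chr <- as.character(f)
   chr[chr %in% rare] <- "other"   # assumption: rare cases are relabeled
   factor(chr)
}

f <- factor(c("a", "a", "a", "b", "b", "c"))
table(f)                       # level "c" occurs only once
g <- collapseRareLevels(f)
levels(g)                      # "a" "b" "other"
```

After collapsing, a test-set case with the rare level "c" would be coded as "other" rather than appearing as a level unseen in training, provided "other" itself occurs in the training set.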
Also for this purpose, the function levelCounts simply applies table()
to each column of the data, returning the result as an R list. (If
there are more than 10 levels, it returns NA.)
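
A rough base-R equivalent of this per-column tabulation might look as follows. The name levelCountsSketch and the maxLevels argument are hypothetical, and the sketch assumes that NA is returned for each individual column exceeding the limit; the actual function may behave differently.

```r
# Hypothetical stand-in for illustration; the real levelCounts may differ.
levelCountsSketch <- function(data, maxLevels = 10) {
   lapply(data, function(col) {
      tbl <- table(col)
      # assumption: NA replaces the table for a column with too many levels
      if (length(tbl) > maxLevels) NA else tbl
   })
}

d <- data.frame(
   g = factor(rep(c("x", "y", "z"), times = c(7, 4, 1))),
   id = factor(1:12)
)
res <- levelCountsSketch(d)
res$g      # counts for x, y, z; "z" is visibly rare
res$id     # NA, since id has 12 distinct levels
```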
The function cartesianFactor generates a "superfactor" from
individual ones; e.g., if factors f1 and f2 have n1 and n2 levels, the
output is a new factor with n1 * n2 levels.
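
The n1 * n2 count can be checked with base R's interaction(), used here only as a stand-in for the concept; it is not claimed that cartesianFactor is implemented this way, or that its level names match.

```r
f1 <- factor(c("a", "b", "a", "b"))   # n1 = 2 levels
f2 <- factor(c("x", "y", "z", "x"))   # n2 = 3 levels

# interaction() keeps every combination of levels by default (drop = FALSE),
# including combinations that never occur in the data
f12 <- interaction(f1, f2)
nlevels(f12)                          # 6, i.e. n1 * n2
table(f12)                            # unobserved combinations have count 0
```

Note that a superfactor tends to make the rare-level problem worse, since many of the combined levels will have few or no cases.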
The function qeRareLevels checks all columns in a data frame for
R factors that have rare levels.
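
Such a check could be sketched in base R as below. The helper rareLevelsSketch and its minCount threshold are hypothetical; qeRareLevels may use different arguments, a different definition of "rare," and a different return value.

```r
# Hypothetical helper for illustration; not the package's implementation.
rareLevelsSketch <- function(data, minCount = 2) {
   isFac <- vapply(data, is.factor, logical(1))
   out <- lapply(data[isFac], function(col) {
      counts <- table(col)
      names(counts)[counts < minCount]
   })
   # keep only the factor columns that actually have rare levels
   out[lengths(out) > 0]
}

d <- data.frame(
   age = c(25, 31, 47, 52),
   sex = factor(c("F", "M", "F", "M")),
   city = factor(c("Davis", "Davis", "Davis", "Reno"))
)
rareLevelsSketch(d)   # only city is flagged, for level "Reno"
```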