These are parameter objects that can be used for tuning models, especially in conjunction with the textrecipes package.
weight
weight_scheme
token
max_times
min_times
max_tokens
Each object is generated by either new_quant_param or
new_qual_param.
Each is an object of class quant_param or qual_param (both inherit from param).
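As a minimal sketch of how such an object can be generated with dials (the name, range, and label below are illustrative assumptions, not the package defaults):

library(dials)

# Hypothetical re-creation of a max_tokens-style parameter; the
# name, range, and label are assumptions for illustration only.
max_tokens_demo <- new_quant_param(
  type = "integer",
  range = c(0L, 1000L),
  inclusive = c(TRUE, TRUE),
  label = c(max_tokens_demo = "# Retained Tokens")
)

# Draw evenly spaced candidate values from the parameter's range.
value_seq(max_tokens_demo, n = 5)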
These objects are pre-made parameter sets that are useful in a variety of models; the sketch after the list below shows them marked for tuning in a recipe.
min_times, max_times: bounds on how often a token must occur to be retained; tokens whose frequency falls outside this range are removed. See ?step_tokenfilter.
max_tokens: the maximum number of tokens that will be retained. See ?step_tokenfilter.
weight: a parameter for "double normalization" when creating token counts. See ?step_tf.
weight_scheme: the method for term frequency calculations. Possible
values are: "binary", "raw count", "term frequency", "log normalization",
or "double normalization". See ?step_tf.
token: the type of token with possible values: "characters",
"character_shingle", "lines", "ngrams", "paragraphs", "ptb", "regex",
"sentences", "skip_ngrams", "tweets", "words", "word_stems". See
?step_tokenize.
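A sketch of these parameters marked for tuning inside a textrecipes recipe; the data frame train_df and the columns class and text are hypothetical:

library(recipes)
library(textrecipes)
library(tune)

# Hypothetical data: a factor outcome `class` and a character
# column `text` holding the raw documents.
rec <- recipe(class ~ text, data = train_df) %>%
  step_tokenize(text, token = "words") %>%          # token
  step_tokenfilter(text, max_tokens = tune()) %>%   # max_tokens
  step_tf(text, weight_scheme = tune())             # weight_scheme

From there, tune's grid and search functions can generate candidate values over the marked parameters' ranges.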