These are objects that can be used for modeling, especially in conjunction with the textrecipes package.

weight
weight_scheme
token
max_times
min_times
max_tokens
Each object is generated by either new_quant_param() or new_qual_param().

An object of class quant_param (inherits from param) of length 7.
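As a quick sketch, the parameter objects can be inspected directly after loading the dials package (printed ranges and value sets are the package defaults and may differ across dials versions):

```r
library(dials)

# Quantitative parameters (created with new_quant_param)
max_tokens()
min_times()
max_times()

# Qualitative parameters (created with new_qual_param)
token()
weight_scheme()
```

Printing a parameter object shows its label, type, and allowable range or values.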
These objects are pre-made parameter sets that are useful in a variety of models.
min_times, max_times: the minimum and maximum frequency of word occurrences used to filter out tokens. See ?step_tokenfilter.
max_tokens: the number of tokens that will be retained. See ?step_tokenfilter.
weight: a parameter for "double normalization" when creating token counts. See ?step_tf.
weight_scheme: the method for term frequency calculations. Possible values are: "binary", "raw count", "term frequency", "log normalization", or "double normalization". See ?step_tf.
token: the type of token, with possible values: "characters", "character_shingle", "lines", "ngrams", "paragraphs", "ptb", "regex", "sentences", "skip_ngrams", "tweets", "words", "word_stems". See ?step_tokenize.
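A minimal sketch of how these parameters are typically marked for tuning inside a textrecipes preprocessing recipe; the data frame `text_df` and its columns `class` and `text` are hypothetical placeholders:

```r
library(recipes)
library(textrecipes)
library(tune)

# Hypothetical data: outcome `class`, one text predictor `text`
rec <- recipe(class ~ text, data = text_df) %>%
  step_tokenize(text, token = "words") %>%
  step_tokenfilter(text,
    min_times  = tune(),  # tuned over the min_times() parameter
    max_times  = tune(),  # tuned over the max_times() parameter
    max_tokens = tune()   # tuned over the max_tokens() parameter
  ) %>%
  step_tf(text, weight_scheme = tune())  # tuned over weight_scheme()
```

Each `tune()` placeholder is then matched to the corresponding dials parameter object when building a tuning grid.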