dials (version 0.0.2)

weight: Parameter objects related to text analysis.

Description

These are objects that can be used for modeling, especially in conjunction with the textrecipes package.

Usage

weight

weight_scheme

token

max_times

min_times

max_tokens

Value

Each object is generated by either new_quant_param or new_qual_param.

Format

An object of class quant_param (inherits from param) of length 7.

Details

These objects are pre-made parameter sets that are useful in a variety of models.

  • min_times, max_times: the minimum and maximum number of times a word may occur; words outside these bounds are removed. See ?step_tokenfilter.

  • max_tokens: the number of tokens that will be retained. See ?step_tokenfilter.

  • weight: the weight used by the "double normalization" scheme when calculating term frequencies. See ?step_tf.

  • weight_scheme: the method for term frequency calculations. Possible values are: "binary", "raw count", "term frequency", "log normalization", or "double normalization". See ?step_tf.

  • token: the type of token, with possible values: "characters", "character_shingle", "lines", "ngrams", "paragraphs", "ptb", "regex", "sentences", "skip_ngrams", "tweets", "words", "word_stems". See ?step_tokenize.
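
To make the weight_scheme and weight entries above more concrete, the following base R sketch shows what each scheme would compute for a single tokenized document. It does not use dials or textrecipes: term_weights is a hypothetical helper written for illustration, and the formulas are the commonly used term frequency definitions, which are assumed here to match those described in ?step_tf.

```r
# Hypothetical helper, not part of dials or textrecipes.
# Computes a weight per distinct token for one document under a given scheme.
# `weight` is only used by the "double normalization" scheme.
term_weights <- function(tokens, scheme, weight = 0.5) {
  tab <- table(tokens)
  # Named numeric vector of raw counts, one entry per distinct token.
  counts <- setNames(as.numeric(tab), names(tab))
  switch(
    scheme,
    "binary"               = (counts > 0) * 1,
    "raw count"            = counts,
    "term frequency"       = counts / sum(counts),
    "log normalization"    = log(1 + counts),
    "double normalization" = weight + (1 - weight) * counts / max(counts),
    stop("unknown scheme: ", scheme)
  )
}

doc <- c("a", "b", "a", "c", "a", "b")

term_weights(doc, "raw count")        # a = 3, b = 2, c = 1
term_weights(doc, "term frequency")   # counts divided by the 6 tokens in doc
term_weights(doc, "double normalization", weight = 0.5)
```

Under these assumptions, weight only changes the result when the scheme is "double normalization": it sets the floor that every observed token receives, and the most frequent token in the document always gets a value of 1.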