stylest (version 0.1.0)

stylest_select_vocab: Select vocabulary using cross-validated out-of-sample prediction

Description

Selects the optimal vocabulary quantile(s) for model fitting, based on how well each one predicts the speakers of out-of-sample texts.

Usage

stylest_select_vocab(x, speaker, filter = NULL, smooth = 0.5,
  nfold = 5, cutoff_pcts = c(50, 60, 70, 80, 90, 99))

Arguments

x

Corpus as text vector. May be a corpus_frame object

speaker

Vector of speaker labels. Should be the same length as x

filter

If not NULL, a corpus text_filter

smooth

Value for smoothing. Defaults to 0.5

nfold

Number of folds for cross-validation. Defaults to 5

cutoff_pcts

Vector of cutoff percentages to test. Defaults to c(50, 60, 70, 80, 90, 99)
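
To make nfold and cutoff_pcts more concrete, here is a minimal base-R sketch of what a percentile cutoff on a vocabulary and a balanced fold assignment might look like. The term counts are made up, and both the cutoff rule (keep terms at or above the given percentile of term counts) and the fold assignment are assumptions for illustration; the package's internal implementation may differ.

```r
# Toy term counts (hypothetical data, not taken from the package).
term_counts <- c(the = 120, of = 95, whale = 12, moor = 7, heath = 3, brig = 1)

# Assumption: a cutoff of 50 keeps terms whose count is at or above the
# 50th percentile of all term counts.
cutoff_pct <- 50
threshold <- quantile(term_counts, probs = cutoff_pct / 100)
kept_terms <- names(term_counts)[term_counts >= threshold]

# Balanced random assignment of 10 texts to nfold = 5 folds
# (each fold receives exactly 2 texts).
set.seed(42)
nfold <- 5
n_texts <- 10
fold_id <- sample(rep(seq_len(nfold), length.out = n_texts))
```

Under this reading, a higher cutoff percentage keeps a smaller vocabulary of more frequent terms.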

Value

A list containing: the cutoff percentage with the best speaker classification rate; the cutoff percentages that were tested; a matrix of the mean percentage of incorrectly identified speakers for each cutoff percentage and fold; and the number of folds used for cross-validation
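
As a rough illustration of how the best cutoff relates to the matrix of misclassification percentages, the sketch below averages a hypothetical matrix over folds and picks the cutoff with the lowest mean. The numbers are invented, and the orientation (rows as folds, columns as cutoffs) is an assumption; check the dimensions of the returned matrix before relying on it.

```r
cutoff_pcts <- c(50, 90)

# Hypothetical miss percentages: rows are folds, columns are cutoffs.
miss_pct <- matrix(c(40, 35, 45,   # cutoff 50, folds 1-3
                     30, 25, 35),  # cutoff 90, folds 1-3
                   nrow = 3,
                   dimnames = list(NULL, as.character(cutoff_pcts)))

# Average the miss percentage over folds for each cutoff.
mean_miss <- colMeans(miss_pct)

# The best cutoff is the one with the lowest mean miss percentage.
best_cutoff <- cutoff_pcts[which.min(mean_miss)]
```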

Examples

# NOT RUN {
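library(stylest)

# Load the example corpus shipped with the package
data(novels_excerpts)

# Compare two vocabulary cutoffs using the default 5-fold cross-validation
stylest_select_vocab(novels_excerpts$text, novels_excerpts$author, cutoff_pcts = c(50, 90))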
# }