Corpus as a text vector. May also be a corpus_frame object.
speaker
Vector of speaker labels. Should be the same length as x.
filter
If not NULL, a corpus text_filter object.
smooth
Smoothing value. Defaults to 0.5.
nfold
Number of folds for cross-validation. Defaults to 5.
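As an illustration of what nfold controls, here is a minimal sketch of k-fold assignment. It assumes texts are assigned to folds at random in roughly equal sizes. This page does not document the function's actual fold-assignment scheme, and assign_folds is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (not the package's code): randomly assign each of n
# texts to one of nfold folds of (nearly) equal size.
assign_folds <- function(n, nfold = 5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # rep_len cycles 1..nfold out to length n; sample() shuffles the labels
  sample(rep_len(seq_len(nfold), n))
}

folds <- assign_folds(12, nfold = 5, seed = 1)
table(folds)  # each fold receives 2 or 3 texts

# In iteration k, texts in fold k are held out for testing and
# the remaining texts are used for training (shown here for k = 1)
test_idx  <- which(folds == 1)
train_idx <- which(folds != 1)
```

Each text is held out exactly once across the nfold iterations, so every text contributes to the error estimate.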
cutoff_pcts
Vector of cutoff percentages to test. Defaults to c(50, 60, 70, 80, 90, 99).
cutoffs_term_weights
Named list of data frames of term weights, where the names correspond
to the values in cutoff_pcts. Each data frame should have a column word
containing the terms and a column named by weight_varname containing
the weight for each term.
See the vignette for details.
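A minimal sketch of a cutoffs_term_weights value with the required shape follows. The words and weights are made-up placeholders. It assumes the default weight_varname of "mean_distance" and cutoff_pcts of c(50, 90). The vignette remains the authoritative description of how these weights are produced.

```r
# Made-up term weights for two cutoff percentages. The list names must
# match the values passed in cutoff_pcts (here 50 and 90).
cutoffs_term_weights <- list(
  "50" = data.frame(word = c("the", "economy", "jobs"),
                    mean_distance = c(0.1, 0.8, 0.6),
                    stringsAsFactors = FALSE),
  "90" = data.frame(word = c("the", "economy", "jobs", "taxes"),
                    mean_distance = c(0.1, 0.8, 0.6, 0.7),
                    stringsAsFactors = FALSE)
)

# Structural checks corresponding to the requirements described above
cutoff_pcts <- c(50, 90)
weight_varname <- "mean_distance"
stopifnot(
  setequal(names(cutoffs_term_weights), as.character(cutoff_pcts)),
  all(vapply(cutoffs_term_weights,
             function(df) all(c("word", weight_varname) %in% names(df)),
             logical(1)))
)
```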
fill_method
If "value" (default), fill_weight is used as the weight
for any term whose weight is NA. If "mean", the mean of the
term weights is used as the fill value instead.
fill_weight
Numeric value used as the weight for any term that
has no weight in the term weights data frame (applies when
fill_method is "value"). Defaults to 1.0.
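To make the interaction between fill_method and fill_weight concrete, here is a minimal sketch of weight lookup with NA filling. lookup_weights is a hypothetical helper, not the package's implementation. It assumes that "mean" refers to the mean of the non-missing weights in the term weights data frame.

```r
# Hypothetical helper: look up a weight for each term; terms with no weight
# (NA after matching) are filled with fill_weight ("value") or with the
# mean of the known weights ("mean").
lookup_weights <- function(terms, term_weights, weight_varname = "mean_distance",
                           fill_method = c("value", "mean"), fill_weight = 1.0) {
  fill_method <- match.arg(fill_method)
  w <- term_weights[[weight_varname]][match(terms, term_weights$word)]
  fill <- if (fill_method == "mean") {
    mean(term_weights[[weight_varname]], na.rm = TRUE)
  } else {
    fill_weight
  }
  w[is.na(w)] <- fill
  w
}

tw <- data.frame(word = c("economy", "jobs"), mean_distance = c(0.8, 0.4),
                 stringsAsFactors = FALSE)
lookup_weights(c("economy", "jobs", "weather"), tw)                        # 0.8 0.4 1.0
lookup_weights(c("economy", "jobs", "weather"), tw, fill_method = "mean")  # 0.8 0.4 0.6
```

"weather" has no weight in tw. It receives fill_weight (1.0) under "value" and the mean of 0.8 and 0.4 under "mean".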
weight_varname
Name of the column in each data frame of cutoffs_term_weights that
contains the weights. Defaults to "mean_distance".
Value
A list containing: the cutoff percentage with the best speaker
classification rate; the cutoff percentages that were tested; a matrix of
the mean percentage of incorrectly identified speakers for each cutoff
percentage and fold; and the number of folds used for cross-validation.
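This page does not document how the returned error matrix relates to the best cutoff. The sketch below shows one plausible relationship: average each cutoff's error over the folds and pick the cutoff with the lowest mean. The error values are made up. The orientation of the matrix (rows as cutoffs, columns as folds) is also an assumption.

```r
# Made-up error matrix: rows are cutoff percentages, columns are folds,
# entries are percentages of incorrectly identified speakers.
cutoff_pcts <- c(50, 70, 90)
err <- matrix(c(40, 35, 38,
                30, 28, 33,
                36, 31, 34),
              nrow = 3, byrow = TRUE,
              dimnames = list(as.character(cutoff_pcts), paste0("fold", 1:3)))

# Average over folds; the lowest mean error gives the best classification rate
mean_err <- rowMeans(err)
best_cutoff <- cutoff_pcts[which.min(mean_err)]
best_cutoff  # 70
```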