The main function in stylest, stylest_fit, fits a model using a corpus of texts labeled by speaker.
stylest_fit(
x,
speaker,
terms = NULL,
filter = NULL,
smooth = 0.5,
term_weights = NULL,
fill_method = "value",
fill_weight = 0,
weight_varname = "mean_distance"
)
x
Text vector. May be a corpus_frame object.
speaker
Vector of speaker labels. Should be the same length as x.
terms
If not NULL, terms to be used in the model. If NULL, use all terms.
filter
If not NULL, a text filter to specify the tokenization. See corpus for more information about specifying filter.
smooth
Numeric value used to smooth term frequencies. Defaults to 0.5.
term_weights
Data frame of distances (or any weights) per word in the vocabulary. It should have one column, $word, and a second column, named by weight_varname, containing the weight for each word. See the vignette for details.
fill_method
If "value" (default), fill_weight is used to fill any terms with NA weight. If "mean", the mean term_weight is used as the fill value.
fill_weight
Numeric value to fill in as the weight for any term that does not have a weight specified in term_weights. Defaults to 0.0, which drops any words without weights.
weight_varname
Name of the column in term_weights containing the weights. Defaults to "mean_distance".
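The interaction between term_weights, fill_method, fill_weight and weight_varname can be sketched as follows. This is a minimal illustration, not the package's recommended workflow: the three words and their weights are made up for the example, and the call to stylest_fit is guarded so the sketch still runs when stylest is not installed.

```r
# Hypothetical weights, for illustration only. Real weights would come from
# an external source such as the distances described in the vignette.
term_weights <- data.frame(
  word = c("the", "and", "of"),
  mean_distance = c(0.10, 0.25, 0.40),
  stringsAsFactors = FALSE
)

# The weight column must carry the name passed as weight_varname.
stopifnot(all(c("word", "mean_distance") %in% names(term_weights)))

# Guarded so the sketch does not fail when stylest is unavailable.
if (requireNamespace("stylest", quietly = TRUE)) {
  data(novels_excerpts, package = "stylest")
  weighted_mod <- stylest::stylest_fit(
    novels_excerpts$text,
    novels_excerpts$author,
    term_weights = term_weights,
    fill_method = "value",           # fill unweighted terms with fill_weight
    fill_weight = 0,                 # documented default: drops unweighted words
    weight_varname = "mean_distance"
  )
}
```

Setting fill_method = "mean" instead would, per the argument description, fill missing weights with the mean of the supplied weights rather than with fill_weight.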
An S3 stylest_model object containing:
speakers
Vector of unique speakers,
filter
text_filter used,
terms
terms used in fitting the model,
ntoken
Vector of number of tokens per speaker,
smooth
Smoothing value,
weights
If not NULL, a named matrix of weights for each term in the vocab,
rate
Matrix of speaker rates for each term in vocabulary
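The components listed above can be read off the fitted object with the usual $ accessor. The sketch below assumes the model is list-like, as S3 objects typically are, and uses a hand-picked vocabulary (custom_terms, a hypothetical name and word list) to show the terms argument. The fit is guarded so the code still runs without stylest installed.

```r
# Hypothetical hand-picked vocabulary, for illustration only.
custom_terms <- c("the", "and", "of", "to")

# Guarded so the sketch does not fail when stylest is unavailable.
if (requireNamespace("stylest", quietly = TRUE)) {
  data(novels_excerpts, package = "stylest")
  mod <- stylest::stylest_fit(
    novels_excerpts$text,
    novels_excerpts$author,
    terms = custom_terms,  # restrict the model to these terms
    smooth = 0.5           # documented default smoothing value
  )
  print(mod$speakers)   # unique speaker labels
  print(mod$terms)      # terms used in fitting the model
  print(mod$ntoken)     # number of tokens per speaker
  print(dim(mod$rate))  # dimensions of the rate matrix
}
```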
The user may specify only one of terms or cutoff. If neither is specified, all terms will be used.
# NOT RUN {
data(novels_excerpts)
speaker_mod <- stylest_fit(novels_excerpts$text, novels_excerpts$author)
# }