Transcript apply descriptive word statistics.
word_stats(
  text.var,
  grouping.var = NULL,
  tot = NULL,
  parallel = FALSE,
  rm.incomplete = FALSE,
  digit.remove = FALSE,
  apostrophe.remove = FALSE,
  digits = 3,
  ...
)
text.var - The text variable or a "word_stats" object (i.e., the output of a word_stats function).
grouping.var - The grouping variables. Default NULL generates one word list for all text. Also takes a single grouping variable or a list of 1 or more grouping variables.
tot - Optional turns of talk variable that yields turn of talk measures.
parallel - logical. If TRUE attempts to run the function on multiple cores. Note that this may not mean a speed boost if you have one core or if the data set is smaller as the cluster takes time to create (parallel is slower until approximately 10,000 rows). To reduce run time pass a "word_stats" object to the word_stats function.
rm.incomplete - logical. If TRUE incomplete statements are removed from calculations in the output.
digit.remove - logical. If TRUE removes digits from calculating the output.
apostrophe.remove - logical. If TRUE removes apostrophes from calculating the output.
digits - Integer; number of decimal places to round when printing.
... - Any other arguments passed to end_inc.
Returns a list with the following elements:
ts - A data frame of descriptive word statistics by row
gts - A data frame of word/sentence statistics per grouping variable:
  n.tot - number of turns of talk
  n.sent - number of sentences
  n.words - number of words
  n.char - number of characters
  n.syl - number of syllables
  n.poly - number of polysyllables
  sptot - syllables per turn of talk
  wptot - words per turn of talk
  wps - words per sentence
  cps - characters per sentence
  sps - syllables per sentence
  psps - polysyllables per sentence
  cpw - characters per word
  spw - syllables per word
  n.state - number of statements
  n.quest - number of questions
  n.exclm - number of exclamations
  n.incom - number of incomplete statements
  p.state - proportion of statements
  p.quest - proportion of questions
  p.exclm - proportion of exclamations
  p.incom - proportion of incomplete statements
  n.hapax - number of hapax legomena
  n.dis - number of dis legomena
  grow.rate - proportion of hapax legomena to words
  prop.dis - proportion of dis legomena to words
mpun - An account of sentences with an improper/missing end mark
word.elem - A data frame with word element columns from gts
sent.elem - A data frame with sentence element columns from gts
omit - Counter of omitted sentences for internal use (only included if some rows contained missing values)
percent - The value of percent used for plotting purposes.
zero.replace - The value of zero.replace used for plotting purposes.
digits - Integer value of the number of digits to display; mostly internal use
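To make a few of the lexical measures above concrete, here is a minimal base R sketch of how n.hapax, n.dis, grow.rate, prop.dis, and wps could be computed by hand on a toy vector. It does not call qdap and is not its implementation; the tokenization (lowercasing, dropping punctuation, splitting on whitespace) and the choice of total word tokens as the denominator are assumptions made for illustration.

```r
# Toy data: each element is treated as one sentence (assumption for illustration)
txt <- c("The cat sat.", "The dog sat on the mat.", "A cat ran!")

# Assumed tokenization: lowercase, drop non-letter characters, split on whitespace
clean <- tolower(gsub("[^[:alpha:][:space:]']", "", txt))
words <- unlist(strsplit(clean, "\\s+"))
words <- words[nzchar(words)]

# Frequency of each distinct word
freq <- table(words)

n.words   <- length(words)      # total word tokens
n.hapax   <- sum(freq == 1)     # words occurring exactly once
n.dis     <- sum(freq == 2)     # words occurring exactly twice
grow.rate <- n.hapax / n.words  # hapax legomena relative to words
prop.dis  <- n.dis / n.words    # dis legomena relative to words
wps       <- n.words / length(txt)  # words per sentence

round(c(n.words = n.words, n.hapax = n.hapax, n.dis = n.dis,
        grow.rate = grow.rate, prop.dis = prop.dis, wps = wps), 3)
```

The values word_stats reports for real data may differ from such a hand calculation, since its own word and syllable handling is more involved than this sketch.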
It is assumed the user has run sentSplit on their data; otherwise some counts may not be accurate.
Note that a sentence is classified by only one end mark. An imperative sentence is classified only as imperative (not also as a state, quest, or exclm). If a sentence is both imperative and incomplete, it is counted as incomplete rather than imperative.
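The one-end-mark-per-sentence rule can be illustrated with a small base R sketch that assigns each sentence a single type from its final character and then derives counts and proportions in the spirit of n.state/p.state and friends. This is not qdap's code: the mapping below, including the use of "|" to mark an incomplete sentence, is an assumption for illustration, and imperative sentences are not handled.

```r
# Toy sentences, already split one per element (assumption for illustration)
sents <- c("I went home.", "Did you see it?", "Stop that!", "Well I was going to|")

# Final non-space character of each sentence
trimmed  <- trimws(sents)
end.mark <- substring(trimmed, nchar(trimmed))

# Assumed mapping from end mark to sentence type; "|" for incomplete is hypothetical here
mark.map <- c("." = "state", "?" = "quest", "!" = "exclm", "|" = "incom")
type <- mark.map[end.mark]
type[is.na(type)] <- "missing"  # improper or missing end mark

# Counts and proportions per type; each sentence contributes to exactly one type
lev <- c("state", "quest", "exclm", "incom")
n <- table(factor(type, levels = lev))
p <- prop.table(n)

n
p
```

Because every sentence falls into exactly one category, the proportions over the recognized types sum to 1 whenever no end marks are missing.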
# NOT RUN {
word_stats(mraja1spl$dialogue, mraja1spl$person)
(desc_wrds <- with(mraja1spl, word_stats(dialogue, person, tot = tot)))
## Recycle for speed boost
with(mraja1spl, word_stats(desc_wrds, person, tot = tot))
scores(desc_wrds)
counts(desc_wrds)
htruncdf(counts(desc_wrds), 15, 6)
plot(scores(desc_wrds))
plot(counts(desc_wrds))
names(desc_wrds)
htruncdf(desc_wrds$ts, 15, 5)
htruncdf(desc_wrds$gts, 15, 6)
desc_wrds$mpun
desc_wrds$word.elem
desc_wrds$sent.elem
plot(desc_wrds)
plot(desc_wrds, label=TRUE, lab.digits = 1)
## Correlation Visualization
qheat(cor(scores(desc_wrds)[, -1]), diag.na = TRUE, by.column = NULL,
    low = "yellow", high = "red", grid = FALSE)
## Parallel (possible speed boost)
with(mraja1spl, word_stats(dialogue, list(sex, died, fam.aff)))
with(mraja1spl, word_stats(dialogue, list(sex, died, fam.aff),
    parallel = TRUE))
## Recycle for speed boost
word_stats(desc_wrds, mraja1spl$sex)
# }