
Get the count of tokens (total features) or types (unique tokens).
ntoken(x, ...)
ntype(x, ...)
...: additional arguments passed to tokens()
named integer vector of the counts of the total tokens or types
The precise definition of "tokens" for objects not yet tokenized (e.g. character or corpus objects) can be controlled through optional arguments passed to tokens() via the ... argument.
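A minimal sketch of the same idea with the tokenization step made explicit: instead of passing options through ..., the text is tokenized with tokens() first and the resulting tokens object is counted. The object names (txt, toks_all, toks_nopunct) are illustrative only and not part of the package.

```r
library(quanteda)

txt <- c(text1 = "This is a sentence, this.", text2 = "A word. Repeated repeated.")

# Tokenize explicitly so the definition of a "token" is visible in the code
toks_all     <- tokens(txt)                       # punctuation kept
toks_nopunct <- tokens(txt, remove_punct = TRUE)  # punctuation dropped

# Counts change with the tokenization options that were used
ntoken(toks_all)
ntoken(toks_nopunct)

# Lowercasing before tokenizing merges "This"/"this" and "Repeated"/"repeated"
ntype(tokens(char_tolower(txt), remove_punct = TRUE))
```

Counting a tokens object built with a given option should agree with passing that option through ... on the untokenized text.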
For dfm objects, ntype will only return the count of features that occur more than zero times in the dfm.
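The following sketch illustrates this behaviour on a small, made-up two-document dfm. The object names (txt_small, dfmat) are hypothetical. Each document lacks some features present in the other, so its ntype is smaller than the total number of feature columns reported by nfeat().

```r
library(quanteda)

# Two toy documents that share only the feature "a"
txt_small <- c(d1 = "a b c", d2 = "a a d")
dfmat <- dfm(tokens(txt_small))

# Number of feature columns in the whole dfm
nfeat(dfmat)

# Per-document count of features with a non-zero count
ntype(dfmat)

# Per-document total of all feature counts
ntoken(dfmat)
```

Here d2 has zero counts for "b" and "c", so those columns do not contribute to its type count even though they exist in the dfm.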
# NOT RUN {
# simple example
txt <- c(text1 = "This is a sentence, this.", text2 = "A word. Repeated repeated.")
ntoken(txt)
ntype(txt)
ntoken(char_tolower(txt)) # same
ntype(char_tolower(txt)) # fewer types
ntoken(char_tolower(txt), remove_punct = TRUE)
ntype(char_tolower(txt), remove_punct = TRUE)
# with some real texts
ntoken(corpus_subset(data_corpus_inaugural, Year < 1806), remove_punct = TRUE)
ntype(corpus_subset(data_corpus_inaugural, Year < 1806), remove_punct = TRUE)
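# dfm() is built from a tokens object here; recent quanteda versions may not accept a corpus directly
ntoken(dfm(tokens(corpus_subset(data_corpus_inaugural, Year < 1800))))
ntype(dfm(tokens(corpus_subset(data_corpus_inaugural, Year < 1800))))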
# }