polmineR (version 0.7.11)

size: Get Number of Tokens.

Description

Get the number of tokens in a corpus or partition, or the dispersion of tokens across one or more s-attributes.

Usage

size(x, ...)

# S4 method for character
size(x, s_attribute = NULL, verbose = TRUE, ...)

# S4 method for partition
size(x, s_attribute = NULL, ...)

# S4 method for DocumentTermMatrix
size(x)

# S4 method for TermDocumentMatrix
size(x)

# S4 method for features
size(x)

Arguments

x

object to get size(s) for

...

further arguments

s_attribute

character vector with s-attributes (one or more)

verbose

logical, whether to print messages

Value

An integer vector if s_attribute is NULL, a data.table otherwise.

Details

One or more s-attributes can be provided to get the dispersion of tokens across one or more dimensions. Two or more s-attributes can lead to reasonable results only if the corpus XML is flat.
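To make the idea of dispersion concrete, here is a minimal base R sketch that does not call polmineR at all. It assumes a toy token table (the `tokens` data frame and all its values are invented for illustration) in which every token carries exactly one value per s-attribute, roughly what a flat annotation structure would provide. Counting rows per attribute value, or per combination of values, gives the kind of table the method is described as returning.

```r
# Toy token stream: one row per token, annotated with two s-attributes.
# All values are invented for illustration; this is not polmineR output.
tokens <- data.frame(
  date  = c("2009-11-10", "2009-11-10", "2009-11-10", "2009-11-11", "2009-11-11"),
  party = c("SPD", "SPD", "CDU", "CDU", "CDU"),
  stringsAsFactors = FALSE
)

# Total size: the number of tokens.
total_size <- nrow(tokens)

# Dispersion across one s-attribute: token counts per date.
size_by_date <- as.data.frame(table(date = tokens$date), responseName = "size")

# Dispersion across two s-attributes: counts per date/party combination.
size_by_date_party <- aggregate(
  data.frame(size = rep(1L, nrow(tokens))),
  by = list(date = tokens$date, party = tokens$party),
  FUN = sum
)
```

In this sketch the counts for any breakdown sum to the total size, which is the property that makes such a table usable as a denominator for relative frequencies.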

The size-method for features objects will return a named list with the size of the corpus of interest ("coi"), i.e. the number of tokens in the window, and the reference corpus ("ref"), i.e. the number of tokens that are not matched by the query and that are outside the window.
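The relationship between the two sizes can be illustrated with a small base R sketch. It assumes a toy corpus of ten token positions, a single query match, and a window of two tokens to either side that excludes the match itself. These numbers and variable names are invented, and the code does not reproduce polmineR internals.

```r
# Toy corpus of 10 token positions; values invented for illustration.
corpus_size <- 10L
positions <- seq_len(corpus_size)

# Assumption: the query matches position 5, and the window spans two
# tokens to the left and right of the match, excluding the match itself.
match_pos <- 5L
window <- 2L
is_match  <- positions == match_pos
in_window <- abs(positions - match_pos) <= window & !is_match

# Named list mirroring the "coi" / "ref" structure described above.
sizes <- list(
  coi = sum(in_window),              # tokens inside the window
  ref = sum(!in_window & !is_match)  # tokens neither matched nor in the window
)
```

Under these assumptions the window, the reference corpus, and the matched token together account for every position in the toy corpus.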

See Also

See dispersion-method for counts of hits. The hits method calls the size-method to get sizes of subcorpora.

Examples

# NOT RUN {
use("polmineR")
size("GERMAPARLMINI")
size("GERMAPARLMINI", s_attribute = "date")
size("GERMAPARLMINI", s_attribute = c("date", "party"))

P <- partition("GERMAPARLMINI", date = "2009-11-11")
size(P, s_attribute = "speaker")
size(P, s_attribute = "party")
size(P, s_attribute = c("speaker", "party"))
# }