RcmdrPlugin.temis (version 0.6.1)

vocabularyTable: Vocabulary summary table

Description

Build a table summarizing vocabulary, optionally over a variable.

Usage

vocabularyTable(termsDtm, wordsDtm, variable = NULL, unit = c("document", "global"))

Arguments

termsDtm
A document-term matrix containing terms (i.e. extracted from a possibly stemmed corpus).
wordsDtm
A document-term matrix containing words (i.e. extracted from a plain corpus).
variable
A vector with one element per document (i.e. per row of the document-term matrices), giving the indexes according to which categories should be defined. If NULL, per-document measures are returned.
unit
When variable is not NULL, defines the way measures are aggregated: either "document" or "global".

Details

This dialog allows creating tables providing several vocabulary measures for each document or each category of documents in the corpus:
  • total number of terms
  • number and percent of unique terms (i.e. appearing at least once)
  • number and percent of hapax legomena (i.e. terms appearing once and only once)
  • total number of words
  • number and percent of long words (long being defined as at least seven characters)
  • number and percent of very long words (very long being defined as at least ten characters)
  • average word length
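
The measures listed above can be illustrated with a minimal base R sketch on toy count matrices. This is not the package's implementation: vocabulary_measures is a hypothetical helper, the two matrices are made-up data standing in for termsDtm and wordsDtm, and the sketch assumes that percentages are taken relative to the total number of terms (or words) and that long words are counted by occurrences.

```r
# Toy stand-ins for the two document-term matrices (made-up data):
# rows are documents, columns are forms, cells are occurrence counts.
terms <- matrix(c(2, 1, 0,
                  0, 1, 3),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("doc1", "doc2"),
                                c("cat", "eleph", "hippopotamus")))
words <- matrix(c(2, 1, 0, 1,
                  0, 1, 3, 0),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("doc1", "doc2"),
                                c("cats", "elephant", "hippopotamus", "of")))

# Hypothetical helper, NOT part of RcmdrPlugin.temis.
# Assumption: percentages are relative to the total counts per document.
vocabulary_measures <- function(terms, words) {
  n_terms  <- rowSums(terms)          # total number of terms
  n_unique <- rowSums(terms > 0)      # terms appearing at least once
  n_hapax  <- rowSums(terms == 1)     # terms appearing exactly once
  n_words  <- rowSums(words)          # total number of words
  len      <- nchar(colnames(words))  # length of each word in characters
  # Assumption: long words are counted by occurrences, not distinct forms.
  n_long   <- rowSums(words[, len >= 7, drop = FALSE])
  n_vlong  <- rowSums(words[, len >= 10, drop = FALSE])
  data.frame(
    terms           = n_terms,
    unique_terms    = n_unique,
    pct_unique      = 100 * n_unique / n_terms,
    hapax           = n_hapax,
    pct_hapax       = 100 * n_hapax / n_terms,
    words           = n_words,
    long_words      = n_long,
    pct_long        = 100 * n_long / n_words,
    very_long_words = n_vlong,
    pct_very_long   = 100 * n_vlong / n_words,
    avg_word_length = drop(words %*% len) / n_words
  )
}

res <- vocabulary_measures(terms, words)
print(res)
```

Each row of the result corresponds to one document, which matches the case where variable is NULL. Aggregating over categories would additionally require summing or averaging rows by group, which this sketch leaves out.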

See Also

vocabularyDlg, DocumentTermMatrix, table