
RNewsflow (version 1.0.1)

term.day.dist: Calculate statistics for term occurrence across days

Description

Calculate statistics for term occurrence across days

Usage

term.day.dist(dtm, meta, id.var = "document_id", date.var = "date")

Arguments

dtm
A document-term matrix, either of the tm DocumentTermMatrix class or a TsparseMatrix from the Matrix package (e.g. as created with spMatrix)
meta
A data.frame where rows are documents and columns are document meta information. It should contain two columns: the document name/id and the date. The name/id column should match the rownames (i.e. the document names) of the DTM, and its label is specified in the `id.var` argument. The date column should be interpretable by as.POSIXct, and its label is specified in the `date.var` argument.
id.var
The label of the document name/id column in the `meta` data.frame. Default is "document_id".
date.var
The label of the document date column in the `meta` data.frame. Default is "date".
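A minimal sketch of the input shape the arguments above describe, using made-up documents. The column names `id` and `pubdate`, the document ids, terms and dates are all hypothetical. Because the columns are not named "document_id" and "date", the call (left as a comment so the sketch runs without RNewsflow installed) passes them through `id.var` and `date.var`.

```r
library(Matrix)

# Hypothetical toy DTM: 3 documents x 2 terms, stored as a TsparseMatrix
toy_dtm <- spMatrix(nrow = 3, ncol = 2,
                    i = c(1L, 2L, 3L, 3L),   # document (row) indices
                    j = c(1L, 1L, 1L, 2L),   # term (column) indices
                    x = c(2, 1, 1, 4))       # term counts
dimnames(toy_dtm) <- list(c("a1", "a2", "a3"), c("economy", "storm"))

# Hypothetical meta data: one row per document, non-default column names
toy_meta <- data.frame(id = c("a1", "a2", "a3"),
                       pubdate = c("2024-01-01 09:00:00",
                                   "2024-01-01 17:30:00",
                                   "2024-01-02 08:15:00"),
                       stringsAsFactors = FALSE)

# The id column should match the DTM rownames
stopifnot(identical(rownames(toy_dtm), toy_meta$id))
# The date column should be interpretable by as.POSIXct (UTC chosen here)
stopifnot(!anyNA(as.POSIXct(toy_meta$pubdate, tz = "UTC")))

# term.day.dist(toy_dtm, toy_meta, id.var = "id", date.var = "pubdate")
```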

Value

A data.frame with statistics for each term:
  • freq: The number of times a term occurred
  • doc.freq: The number of documents in which a term occurred
  • days.n: The number of days on which a term occurred
  • days.pct: The percentage of days on which a term occurred
  • days.entropy: The entropy of the term's frequency distribution across days (higher values mean the term is spread more evenly over days)
  • days.entropy.norm: The normalized days.entropy, where a value of 1 indicates a discrete uniform distribution
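The statistics listed above can be reproduced on a toy base-R matrix to make their definitions concrete. This is a sketch under stated assumptions, not RNewsflow's implementation: documents are assigned to calendar days with as.Date(), entropy uses the natural logarithm, H = -sum(p_d * log(p_d)) over the days d on which the term occurs, and days.pct and days.entropy.norm are assumed to be relative to all days present in the data (so the normalization divides by the log of the number of days). All object names are hypothetical.

```r
# Hypothetical toy data: 4 documents x 3 terms (a plain base-R matrix)
toy_dtm <- matrix(c(2, 0, 1,
                    1, 1, 0,
                    0, 3, 0,
                    1, 0, 0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("doc", 1:4),
                                  c("economy", "election", "storm")))
doc_dates <- as.POSIXct(c("2024-01-01 09:00", "2024-01-01 15:00",
                          "2024-01-02 10:00", "2024-01-03 08:00"), tz = "UTC")

# Assumption: a "day" is the calendar date of each document
day <- as.character(as.Date(doc_dates, tz = "UTC"))

# Sum term counts per day: rows = days, columns = terms
term_day <- rowsum(toy_dtm, group = day)
n_days <- nrow(term_day)  # assumed denominator: all days in the data

# Shannon entropy (natural log) of one term's counts across days
entropy <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

days_n <- colSums(term_day > 0)
days_entropy <- apply(term_day, 2, entropy)

term_stats <- data.frame(
  term = colnames(toy_dtm),
  freq = colSums(toy_dtm),
  doc.freq = colSums(toy_dtm > 0),
  days.n = days_n,
  days.pct = 100 * days_n / n_days,
  days.entropy = days_entropy,
  # log(n_days) is 0 when the data span a single day, so this is undefined then
  days.entropy.norm = days_entropy / log(n_days),
  row.names = NULL
)
print(term_stats)
```

With this toy data, "storm" occurs on a single day and gets an entropy of 0, while "economy" and "election" each occur on two of the three days with an uneven 3:1 split, so their normalized entropy lies between 0 and 1.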

Examples

library(RNewsflow)
data(dtm)
data(meta)

tdd = term.day.dist(dtm, meta)
head(tdd)
tail(tdd)
