tm (version 0.6-1)

weightSMART: SMART Weightings

Description

Weight a term-document matrix according to a combination of weights specified in SMART notation.

Usage

weightSMART(m, spec = "nnn", control = list())

Arguments

m
A TermDocumentMatrix in term frequency format.
spec
a character string consisting of three characters. The first letter specifies a term frequency schema, the second a document frequency schema, and the third a normalization schema. See Details for available built-in schemata.
control
a list of control parameters. See Details.
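
As a rough illustration of how a three-character spec breaks down, the following base R sketch splits a spec string into its three components. The helper parse_spec is hypothetical and not part of tm; it only mirrors the description of spec given above.

```r
# Hypothetical helper (not part of tm): split a SMART spec into its parts.
parse_spec <- function(spec) {
  # A valid spec is a single string of exactly three characters.
  stopifnot(is.character(spec), length(spec) == 1L, nchar(spec) == 3L)
  parts <- strsplit(spec, "")[[1]]
  names(parts) <- c("term_frequency", "document_frequency", "normalization")
  parts
}

parse_spec("ntc")
```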

Value

  • The weighted matrix.

Details

Formally, this function is of class WeightingFunction and carries the additional attributes Name and Acronym.

The first letter of spec specifies a weighting schema for term frequencies of m:

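"n"
(natural) tf_{i,j} counts the number of occurrences n_{i,j} of a term t_i in a document d_j. The input term-document matrix m is assumed to be in this standard term frequency format already.
"l"
(logarithm) is defined as 1 + log2(tf_{i,j}).
"a"
(augmented) is defined as 0.5 + (0.5 * tf_{i,j}) / max_i(tf_{i,j}).
"b"
(boolean) is defined as 1 if tf_{i,j} > 0 and 0 otherwise.
"L"
(log average) is defined as (1 + log2(tf_{i,j})) / (1 + log2(ave_{i in j}(tf_{i,j}))).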
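
The term frequency variants can be sketched in base R on the nonzero counts of a single document, using the definitions commonly given for SMART notation (Manning et al., 2008). The vector tf is made-up illustration data, and the base-2 logarithm is an assumption rather than something confirmed here for this version of tm.

```r
# Made-up counts of four terms in a single document (nonzero entries only).
tf <- c(oil = 4, price = 2, opec = 1, barrel = 1)

tf_n <- tf                                      # "n": natural counts
tf_l <- 1 + log2(tf)                            # "l": logarithm (base 2 assumed)
tf_a <- 0.5 + 0.5 * tf / max(tf)                # "a": augmented by the document maximum
tf_b <- as.numeric(tf > 0)                      # "b": boolean presence indicator
tf_L <- (1 + log2(tf)) / (1 + log2(mean(tf)))   # "L": log average over the document

# One row per variant, one column per term.
round(rbind(n = tf_n, l = tf_l, a = tf_a, b = tf_b, L = tf_L), 3)
```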

The second letter of spec specifies a weighting schema of document frequencies for m:

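"n"
(no) is defined as 1.
"t"
(idf) is defined as log2(N / df_t), where N is the number of documents and df_t is the number of documents in which term t occurs.
"p"
(prob idf) is defined as max(0, log2((N - df_t) / df_t)).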

The third letter of spec specifies a schema for normalization of m:

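"n"
(none) is defined as 1.
"c"
(cosine) is defined as sqrt(col_sums(m^2)).
"u"
(pivoted unique) is defined as slope * sqrt(col_sums(m^2)) + (1 - slope) * pivot, where both slope and pivot must be set via named tags in the control list.
"b"
(byte size) is defined as 1 / CharLength^alpha. The parameter alpha must be set via the named tag alpha in the control list.

The final result is defined by multiplication of the chosen term frequency component, the chosen document frequency component, and the chosen normalization component.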
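
To make the combination concrete, the sketch below computes an "ntc" weighting by hand on a tiny dense matrix in base R: natural term frequency, idf as log2(N / df), and cosine normalization applied by dividing each document column by its Euclidean length. The matrix m is made-up data. Both the base-2 logarithm and the division by the column norm are assumptions taken from the standard SMART definitions in the reference, not a transcription of tm's internals.

```r
# Made-up term-document counts: rows are terms, columns are documents.
m <- matrix(c(2, 0, 1,
              1, 1, 0,
              0, 3, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("oil", "price", "opec"), c("d1", "d2", "d3")))

tf   <- m                        # "n": natural term frequency
df   <- rowSums(m > 0)           # number of documents containing each term
idf  <- log2(ncol(m) / df)       # "t": idf (base-2 log assumed)
w    <- tf * idf                 # scale each term row by its idf
norm <- sqrt(colSums(w^2))       # "c": Euclidean length of each document column
ntc  <- sweep(w, 2, norm, "/")   # divide each column by its length

round(ntc, 3)
```

Each column of ntc then has unit Euclidean length, so the inner product of two columns equals the cosine similarity of the corresponding documents.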

References

Christopher D. Manning and Prabhakar Raghavan and Hinrich Schütze (2008). Introduction to Information Retrieval. Cambridge University Press, ISBN 0521865719.

Examples

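library(tm)

# Load the "crude" example corpus shipped with tm
data("crude")
# Build a term-document matrix weighted with the SMART spec "ntc":
# natural term frequency, idf, and cosine normalization
TermDocumentMatrix(crude,
                   control = list(removePunctuation = TRUE,
                                  stopwords = TRUE,
                                  weighting = function(x)
                                      weightSMART(x, spec = "ntc")))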