# weightSMART

##### SMART Weightings

Weight a term-document matrix according to a combination of weights specified in SMART notation.

##### Usage

`weightSMART(m, spec = "nnn", control = list())`

##### Arguments

- m
a `TermDocumentMatrix` in term frequency format.

- spec
a character string consisting of three characters. The first letter specifies a term frequency schema, the second a document frequency schema, and the third a normalization schema. See **Details** for the available built-in schemata.

- control
a list of control parameters. See **Details**.
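Because `spec` is read position by position, a spec such as `"ntc"` selects natural term frequency, idf document frequency, and cosine normalization. The following base-R sketch shows that positional reading. The helper `parse_smart_spec` is hypothetical and not part of tm. Its allowed letters are copied from the lists under **Details**.

```r
# Hypothetical helper (not part of tm): read a SMART spec position by
# position and check each letter against the built-in schemata.
parse_smart_spec <- function(spec) {
  stopifnot(is.character(spec), length(spec) == 1L, nchar(spec) == 3L)
  letters3 <- strsplit(spec, "")[[1]]
  allowed <- list(
    term_frequency     = c("n", "l", "a", "b", "L"),
    document_frequency = c("n", "t", "p"),
    normalization      = c("n", "c", "u", "b")
  )
  for (k in seq_along(allowed)) {
    if (!letters3[k] %in% allowed[[k]]) {
      stop(sprintf("invalid %s schema '%s'", names(allowed)[k], letters3[k]))
    }
  }
  # Named list: term_frequency, document_frequency, normalization
  setNames(as.list(letters3), names(allowed))
}

str(parse_smart_spec("ntc"))
```

Letters are matched case-sensitively, so `"l"` (logarithm) and `"L"` (log average) stay distinct.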

##### Details

Formally, this function is of class `WeightingFunction` with the additional attributes `name` and `acronym`.

The first letter of `spec` specifies a weighting schema for the term frequencies of `m`:

- "n"
(natural) \(\mathit{tf}_{i,j}\) counts the number of occurrences \(n_{i,j}\) of a term \(t_i\) in a document \(d_j\). The input term-document matrix `m` is assumed to be in this standard term frequency format already.

- "l"
(logarithm) is defined as \(1 + \log_2(\mathit{tf}_{i,j})\) if \(\mathit{tf}_{i,j} > 0\) and 0 otherwise.

- "a"
(augmented) is defined as \(0.5 + \frac{0.5 * \mathit{tf}_{i,j}}{\max_i(\mathit{tf}_{i,j})}\).

- "b"
(boolean) is defined as 1 if \(\mathit{tf}_{i,j} > 0\) and 0 otherwise.

- "L"
(log average) is defined as \(\frac{1 + \log_2(\mathit{tf}_{i,j})}{1+\log_2(\mathrm{ave}_{i\in j}(\mathit{tf}_{i,j}))}\).
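The five term frequency schemata can be sketched in base R on a dense count matrix with terms in rows and documents in columns. This is an illustration, not tm's implementation. The function `tf_weight` is hypothetical, and it assumes that only non-zero counts are transformed, so terms absent from a document keep weight 0. The average in "L" is likewise taken over the terms that occur in the document.

```r
# Hypothetical sketch of the term frequency schemata (terms x documents).
# Assumption: only non-zero counts are transformed; zeros stay 0.
tf_weight <- function(tf, schema = c("n", "l", "a", "b", "L")) {
  schema <- match.arg(schema)
  out <- matrix(0, nrow(tf), ncol(tf), dimnames = dimnames(tf))
  nz <- tf > 0
  for (j in seq_len(ncol(tf))) {
    x <- tf[nz[, j], j]                  # counts of terms occurring in d_j
    if (length(x) == 0) next
    out[nz[, j], j] <- switch(schema,
      n = x,                                      # natural
      l = 1 + log2(x),                            # logarithm
      a = 0.5 + 0.5 * x / max(x),                 # augmented
      b = rep(1, length(x)),                      # boolean
      L = (1 + log2(x)) / (1 + log2(mean(x))))    # log average
  }
  out
}

tf <- cbind(d1 = c(oil = 4, price = 1, opec = 0),
            d2 = c(oil = 2, price = 0, opec = 2))
tf_weight(tf, "l")   # d1 = (3, 1, 0), d2 = (2, 0, 2)
tf_weight(tf, "a")   # d1 = (1, 0.625, 0), d2 = (1, 0, 1)
```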

The second letter of `spec` specifies a weighting schema for the document frequencies of `m`:

- "n"
(no) is defined as 1.

- "t"
(idf) is defined as \(\log_2 \frac{N}{\mathit{df}_t}\), where \(N\) is the number of documents and \(\mathit{df}_t\) is the number of documents in which term \(t\) occurs.

- "p"
(prob idf) is defined as \(\max(0, \log_2(\frac{N - \mathit{df}_t}{\mathit{df}_t}))\).
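A similar sketch covers the document frequency schemata. The helper `df_weight` is hypothetical. It counts \(\mathit{df}_t\) as the number of columns with a non-zero entry in row \(t\), and it uses `pmax` so that the floor at 0 in "p" is applied term by term.

```r
# Hypothetical sketch of the document frequency schemata (terms x documents).
df_weight <- function(tf, schema = c("n", "t", "p")) {
  schema <- match.arg(schema)
  N  <- ncol(tf)                  # number of documents
  df <- rowSums(tf > 0)           # documents containing each term
  switch(schema,
    n = rep(1, nrow(tf)),                   # no
    t = log2(N / df),                       # idf
    p = pmax(0, log2((N - df) / df)))       # prob idf, floored at 0
}

tf <- cbind(d1 = c(oil = 3, opec = 1, tanker = 0),
            d2 = c(oil = 1, opec = 0, tanker = 0),
            d3 = c(oil = 2, opec = 1, tanker = 0),
            d4 = c(oil = 1, opec = 0, tanker = 2))
df_weight(tf, "t")   # oil 0, opec 1, tanker 2
df_weight(tf, "p")   # oil 0, opec 0, tanker log2(3)
```

Here `oil` occurs in all four documents and so gets idf 0, while `tanker`, which occurs in only one, gets the largest weight.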

The third letter of `spec` specifies a schema for the normalization of `m`:

- "n"
(none) is defined as 1.

- "c"
(cosine) is defined as \(\frac{1}{\sqrt{\mathrm{col\_sums}(m ^ 2)}}\), the reciprocal of the Euclidean length of each document column.

- "u"
(pivoted unique) is defined as \(\frac{1}{\mathit{slope} * \sqrt{\mathrm{col\_sums}(m ^ 2)} + (1 - \mathit{slope}) * \mathit{pivot}}\), where both `slope` and `pivot` must be set via named tags in the `control` list.

- "b"
(byte size) is defined as \(\frac{1}{\mathit{CharLength}^\alpha}\), where \(\mathit{CharLength}\) is the character length of document \(d_j\). The parameter \(\alpha\) must be set via the named tag `alpha` in the `control` list.
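The normalization schemata act per document. The hypothetical helper `norm_weight` below returns one multiplier per column of an already weighted matrix `w`. It reads `slope`, `pivot` and `alpha` from `control` as described above. The argument `char_length`, giving the characters per document, is an extra input assumed only for this sketch. How tm itself derives CharLength is not covered here.

```r
# Hypothetical sketch: per-document normalization multipliers for a
# weighted matrix w (terms x documents). `char_length` is an assumed input.
norm_weight <- function(w, schema = c("n", "c", "u", "b"),
                        control = list(), char_length = NULL) {
  schema <- match.arg(schema)
  cosine <- sqrt(colSums(w^2))            # Euclidean length of each column
  switch(schema,
    n = rep(1, ncol(w)),                  # none
    c = 1 / cosine,                       # cosine
    u = {                                 # pivoted unique
      if (is.null(control$slope) || is.null(control$pivot))
        stop("control must contain 'slope' and 'pivot'")
      1 / (control$slope * cosine + (1 - control$slope) * control$pivot)
    },
    b = {                                 # byte size
      if (is.null(control$alpha) || is.null(char_length))
        stop("need control$alpha and char_length")
      1 / char_length^control$alpha
    })
}

w <- cbind(d1 = c(3, 4), d2 = c(0, 2))
norm_weight(w, "c")                                  # d1 0.2, d2 0.5
norm_weight(w, "u", list(slope = 0.5, pivot = 3))    # d1 0.25, d2 0.4
sweep(w, 2, norm_weight(w, "c"), `*`)                # unit-length columns
```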

The final weight is the product of the chosen term frequency component, the chosen document frequency component, and the chosen normalization component.
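For `spec = "ntc"`, the three components combine into tf-idf weighting with cosine normalization. The sketch below uses a hypothetical function, `smart_ntc`, on a dense base-R matrix rather than a `TermDocumentMatrix`. It assumes the cosine length is computed from the tf * idf weights and that every term occurs in at least one document. Read it as an illustration of the formulas, not as a reimplementation of `weightSMART`.

```r
# Hypothetical sketch of "ntc" on a dense count matrix (terms x documents).
# Assumes every row has at least one non-zero count (so df > 0).
smart_ntc <- function(tf) {
  N   <- ncol(tf)
  df  <- rowSums(tf > 0)            # documents containing each term
  idf <- log2(N / df)               # "t": one value per term (row)
  w   <- tf * idf                   # recycling scales row i by idf[i]
  len <- sqrt(colSums(w^2))         # Euclidean length of each document
  # "c": multiply by 1 / length; all-zero columns stay zero (sketch choice)
  norm <- ifelse(len > 0, 1 / len, 0)
  sweep(w, 2, norm, `*`)
}

tf <- cbind(d1 = c(oil = 2, opec = 1, tanker = 0),
            d2 = c(oil = 1, opec = 0, tanker = 3))
smart_ntc(tf)   # d1 = (0, 1, 0), d2 = (0, 0, 1)
```

After this weighting, every non-zero document column has unit Euclidean length, so the cosine similarity of two documents reduces to the dot product of their columns.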

##### Value

The weighted matrix.

##### References

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze (2008).
*Introduction to Information Retrieval*.
Cambridge University Press, ISBN 0521865719.

##### Examples

```r
library(tm)
data("crude")
# Build a term-document matrix weighted with natural tf, idf and cosine normalization
TermDocumentMatrix(crude,
                   control = list(removePunctuation = TRUE,
                                  stopwords = TRUE,
                                  weighting = function(x)
                                    weightSMART(x, spec = "ntc")))
```

*Documentation reproduced from package tm, version 0.7-1, License: GPL-3*