Formally this function is of class `WeightingFunction` with the
additional attributes `name` and `acronym`.

The first letter of `spec` specifies a weighting schema for term
frequencies of `m`:

- "n"
(natural) \(\mathit{tf}_{i,j}\) counts the number of occurrences
\(n_{i,j}\) of a term \(t_i\) in a document \(d_j\). The
input term-document matrix `m` is assumed to be in this
standard term frequency format already.

- "l"
(logarithm) is defined as \(1 + \log_2(\mathit{tf}_{i,j})\).

- "a"
(augmented) is defined as \(0.5 +
\frac{0.5 * \mathit{tf}_{i,j}}{\max_i(\mathit{tf}_{i,j})}\).

- "b"
(boolean) is defined as 1 if \(\mathit{tf}_{i,j} > 0\) and 0 otherwise.

- "L"
(log average) is defined as \(\frac{1 +
\log_2(\mathit{tf}_{i,j})}{1+\log_2(\mathrm{ave}_{i\in j}(\mathit{tf}_{i,j}))}\).
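
The five term frequency schemas above can be illustrated on a small dense matrix. This is a minimal sketch in base R, not the package's own implementation: the toy matrix and all variable names are made up for illustration, and two choices are assumptions rather than statements from the text, namely that entries with a zero count stay zero and that the "L" average is taken over the terms that actually occur in a document.

```r
# Toy term-document matrix: rows are terms, columns are documents.
m <- matrix(c(2, 0, 1,
              1, 4, 0),
            nrow = 3,
            dimnames = list(c("apple", "pear", "plum"), c("d1", "d2")))

present <- m > 0                      # entries where the term occurs

# "n" (natural): the raw counts.
tf_n <- m

# "l" (logarithm): 1 + log2(tf) for occurring terms (zeros kept: assumption).
tf_l <- ifelse(present, 1 + log2(m), 0)

# "a" (augmented): 0.5 + 0.5 * tf / (largest tf in the same document).
col_max <- apply(m, 2, max)
tf_a <- ifelse(present, 0.5 + 0.5 * sweep(m, 2, col_max, "/"), 0)

# "b" (boolean): 1 if the term occurs, 0 otherwise.
tf_b <- present * 1

# "L" (log average): log weight divided by the log weight of the
# document's average count (average over occurring terms: assumption).
col_avg <- apply(m, 2, function(x) mean(x[x > 0]))
tf_L <- ifelse(present, sweep(1 + log2(m), 2, 1 + log2(col_avg), "/"), 0)
```

Each result has the same shape as `m`, so any of them can stand in for the term frequency component in the later steps.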

The second letter of `spec` specifies a weighting schema of
document frequencies for `m`:

- "n"
(no) is defined as 1.

- "t"
(idf) is defined as \(\log_2 \frac{N}{\mathit{df}_t}\) where
\(N\) is the total number of documents and \(\mathit{df}_t\)
denotes the number of documents in which term \(t\) occurs.

- "p"
(prob idf) is defined as \(\max(0, \log_2(\frac{N - \mathit{df}_t}{\mathit{df}_t}))\).
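
The document frequency components can be sketched the same way. The example below is illustrative base R under stated assumptions, not the package's code: the toy matrix and variable names are hypothetical, and \(\mathit{df}_t\) is computed as the number of documents in which a term has a nonzero count.

```r
# Toy term-document matrix: 3 terms (rows) in 4 documents (columns).
m <- matrix(c(2, 0, 1,
              1, 4, 0,
              3, 0, 0,
              1, 2, 0),
            nrow = 3,
            dimnames = list(c("apple", "pear", "plum"), paste0("d", 1:4)))

N  <- ncol(m)          # total number of documents
df <- rowSums(m > 0)   # documents containing each term: 4, 2, 1

# "n" (no): every term gets weight 1.
idf_n <- rep(1, nrow(m))

# "t" (idf): log2(N / df).
idf_t <- log2(N / df)

# "p" (prob idf): max(0, log2((N - df) / df)), taken term by term.
idf_p <- pmax(log2((N - df) / df), 0)
```

A term that occurs in every document ("apple" here) gets an idf of zero under both "t" and "p", while rarer terms receive larger weights.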

The third letter of `spec` specifies a schema for normalization
of `m`:

- "n"
(none) is defined as 1.

- "c"
(cosine) is defined as \(\sqrt{\mathrm{col\_sums}(m ^ 2)}\).

- "u"
(pivoted unique) is defined as \(\mathit{slope} *
\sqrt{\mathrm{col\_sums}(m ^ 2)} + (1 - \mathit{slope}) *
\mathit{pivot}\) where both `slope` and `pivot` must be set
via named tags in the `control` list.

- "b"
(byte size) is defined as
\(\frac{1}{\mathit{CharLength}^\alpha}\). The parameter
\(\alpha\) must be set via the named tag `alpha` in the `control`
list.
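
The normalization quantities can be computed per document as in the following sketch. It is plain base R with hypothetical inputs: the values of `slope`, `pivot` and `alpha`, the character lengths in `char_length`, and all variable names are invented for illustration. The last step assumes that the cosine value is used as a divisor for each column, as in the usual SMART notation; the text above does not spell this out.

```r
# Toy weighted matrix: rows are terms, columns are documents.
m <- matrix(c(3, 4, 0,
              0, 6, 8),
            nrow = 3,
            dimnames = list(c("apple", "pear", "plum"), c("d1", "d2")))

# Hypothetical control list with the named tags mentioned above.
control <- list(slope = 0.25, pivot = 8, alpha = 0.5)

# "c" (cosine): Euclidean length of each document column.
norm_c <- sqrt(colSums(m^2))

# "u" (pivoted unique): slope * length + (1 - slope) * pivot.
norm_u <- control$slope * norm_c + (1 - control$slope) * control$pivot

# "b" (byte size): 1 / CharLength^alpha, with made-up character lengths.
char_length <- c(d1 = 100, d2 = 400)
norm_b <- 1 / char_length^control$alpha

# Assumption: cosine normalization divides each column by its length,
# which leaves every document vector with unit length.
m_cos <- sweep(m, 2, norm_c, "/")
```

With these inputs the two columns have lengths 5 and 10, so after the division both documents are unit vectors and become comparable regardless of their original size.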

The final result is defined as the product of the chosen term
frequency component, the chosen document frequency component, and
the chosen normalization component.
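
As a worked illustration of how the three components combine, the sketch below evaluates the specification "ntn" (natural term frequency, idf, no normalization) by hand in base R. The matrix and variable names are hypothetical, and the code only mirrors the formulas in this section rather than the package's implementation.

```r
spec <- "ntn"
parts <- strsplit(spec, "")[[1]]   # one letter per component

# Toy term-document matrix: 3 terms (rows) in 4 documents (columns).
m <- matrix(c(2, 0, 1,
              1, 4, 0,
              3, 0, 0,
              1, 2, 0),
            nrow = 3,
            dimnames = list(c("apple", "pear", "plum"), paste0("d", 1:4)))

tf   <- m                                 # "n": raw counts
idf  <- log2(ncol(m) / rowSums(m > 0))    # "t": one value per term (row)
norm <- rep(1, ncol(m))                   # "n": one value per document

# tf * idf recycles idf down each column, i.e. scales row i by idf[i];
# sweep() then applies the per-document normalization to the columns.
weighted <- sweep(tf * idf, 2, norm, "*")
```

Here "apple" occurs in all four documents, so its row is weighted down to zero, while the single occurrence of "plum" in `d1` is doubled.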