In an LSA process, the diagonal matrix of the singular value decomposition is
usually reduced to a specific number of dimensions (also called `factors' or `singular values').
The functions dimcalc\_share(), dimcalc\_ndocs(), dimcalc\_kaiser(), dimcalc\_fraction(),
and the redundant function dimcalc\_raw() offer methods to calculate a suitable
number of singular values, based on the distribution and magnitude of the given sequence
of singular values.
All of them are tightly coupled to the core LSA functions: each of them generates
a function to be executed by the calling (higher-level)
function lsa(). The generated function takes only one parameter,
namely s, which is expected to be the sequence of singular values.
Inside lsa(), this function is executed, and the mandatory
singular values are passed to it as that parameter.
The dimensionality calculation methods can, however, also be called directly
by appending a second, separate argument list that supplies the singular values, e.g.
dimcalc\_share(share=0.2)(mysingularvalues).
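The generator pattern can be illustrated with a minimal sketch in R. The names make\_dimcalc() and toy\_caller() are hypothetical and not part of the lsa package; toy\_caller() only stands in for lsa() by computing singular values with base R's svd() and then running the function it was handed. The last two lines show the direct call with a second argument list and the same result obtained through the caller.

```r
# Hypothetical generator (not the lsa package source): calling it with its
# settings returns a function of a single argument, s.
make_dimcalc <- function(share = 0.5) {
  function(s) {
    # first position where the normalised cumulative sum reaches `share`
    which(cumsum(s) / sum(s) >= share)[1]
  }
}

# Hypothetical stand-in for a higher-level caller such as lsa(): it computes
# the singular values itself and only then runs the function it was handed.
toy_caller <- function(m, dims = make_dimcalc()) {
  s <- svd(m)$d   # base R svd(): singular values in decreasing order
  dims(s)
}

s <- c(4, 3, 2, 1)
make_dimcalc(share = 0.5)(s)      # direct call with a second argument list: 2
toy_caller(diag(c(4, 3, 2, 1)))   # same result, with s supplied by the caller
```

Returning a closure lets the caller fix the settings (here share) early and defer the computation until the singular values exist.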
The method dimcalc\_share() finds the first position in the descending sequence of
singular values s at which their cumulative sum (divided by the sum of all
values) meets or exceeds the specified share.
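A minimal R sketch of the share criterion follows. The name dimcalc\_share\_sketch() is hypothetical, and the default share of 0.5 is an assumption because the text above does not state a default. The function returns NA if the share can never be reached (e.g. a share above 1).

```r
# Hypothetical sketch of the share criterion; the default share = 0.5 is an
# assumption, since the text above does not state one.
dimcalc_share_sketch <- function(share = 0.5) {
  function(s) {
    s <- sort(s, decreasing = TRUE)   # criterion is defined on a descending sequence
    ratio <- cumsum(s) / sum(s)       # cumulative share of the total
    which(ratio >= share)[1]          # first position meeting the share (NA if none)
  }
}

s <- c(5, 3, 1, 1)                    # cumulative ratios: 0.5, 0.8, 0.9, 1.0
dimcalc_share_sketch(share = 0.6)(s)  # 2
```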
The method dimcalc\_ndocs() calculates the first position in the descending sequence
of singular values at which their cumulative sum meets or exceeds the number of documents.
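The document-count criterion can be sketched the same way. The function name dimcalc\_ndocs\_sketch(), the parameter name ndocs, and the fallback of keeping all values when the total never reaches the number of documents are assumptions made for this illustration.

```r
# Hypothetical sketch of the document-count criterion. The parameter name
# ndocs and the fallback when the total never reaches it are assumptions.
dimcalc_ndocs_sketch <- function(ndocs) {
  function(s) {
    s <- sort(s, decreasing = TRUE)
    pos <- which(cumsum(s) >= ndocs)[1]   # first position reaching ndocs
    if (is.na(pos)) length(s) else pos    # fallback: keep all values
  }
}

s <- c(5, 3, 1, 1)                  # cumulative sums: 5, 8, 9, 10
dimcalc_ndocs_sketch(ndocs = 8)(s)  # 2
```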
The method dimcalc\_kaiser() calculates the number of singular values according to the
Kaiser criterion, i.e. from the descending sequence of values, all values
with s[n] > 1 are kept. The number of these values is returned
as the number of dimensions.
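Because the sequence is descending, counting the values greater than 1 gives the same number as the position of the last such value. A minimal R sketch, with the hypothetical name dimcalc\_kaiser\_sketch(), follows the strict inequality s[n] > 1 stated above:

```r
# Hypothetical sketch of the Kaiser criterion as described above: keep the
# values strictly greater than 1 and return how many there are.
dimcalc_kaiser_sketch <- function() {
  function(s) {
    # in a descending sequence, this count equals the position of the last
    # value above 1; it is 0 when no value exceeds 1
    sum(s > 1)
  }
}

dimcalc_kaiser_sketch()(c(3, 1.5, 1, 0.2))   # 2 (the value 1 itself is not kept)
```

Note that this sketch can return 0; a caller would have to decide how to handle that case.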
The method dimcalc\_fraction() returns the specified share of the
number of singular values. By default, 1/50th of the available values
is used, and the resulting number of singular values is returned.
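A minimal R sketch of the fraction criterion with the stated default of 1/50 is shown below. The name dimcalc\_fraction\_sketch(), rounding down, and the lower bound of one dimension are assumptions, not details given in the text.

```r
# Hypothetical sketch of the fraction criterion. Rounding down and keeping at
# least one dimension are assumptions, not stated in the text above.
dimcalc_fraction_sketch <- function(frac = 1/50) {
  function(s) {
    k <- floor(length(s) * frac)   # requested share of the number of values
    as.integer(max(1, k))          # never return zero dimensions
  }
}

dimcalc_fraction_sketch()(rev(seq_len(100)))   # 2, i.e. 1/50th of 100 values
```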
The method dimcalc\_raw() returns the maximum number of singular values (= the length
of s). It is included only for completeness.
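For completeness, the raw criterion reduces to a one-line R sketch (the name dimcalc\_raw\_sketch() is hypothetical). It performs no reduction at all:

```r
# Hypothetical sketch: the raw criterion performs no reduction at all.
dimcalc_raw_sketch <- function() {
  function(s) length(s)   # keep every singular value
}

dimcalc_raw_sketch()(c(5, 3, 1, 1))   # 4
```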