In an LSA process, the diagonal matrix of the singular value decomposition is
usually reduced to a specific number of dimensions (also called `factors' or
`singular values').
The functions dimcalc\_share(), dimcalc\_ndocs(), dimcalc\_kaiser(),
dimcalc\_fraction(), and the redundant function dimcalc\_raw() offer methods
to calculate a useful number of singular values (based on the distribution
and values of the given sequence of singular values).
All of them are tightly coupled to the core LSA functions: they generate
a function to be executed by the calling (higher-level) function lsa().
The returned function takes only one parameter, namely s, which is expected
to be the sequence of singular values. In lsa(), the returned function is
executed, and the mandatory singular values are supplied as its argument.
The dimensionality calculation methods, however, can still be called directly
by appending a second, separate argument list, e.g.
dimcalc\_share(share=0.2)(mysingularvalues)
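The two-step calling convention described above can be sketched in base R. The generator make_dimcalc(), its threshold argument, and the singular values are hypothetical and exist only for illustration; this is not the package's implementation.

```r
# Hypothetical generator, written only to illustrate the calling convention;
# it is not the lsa package's implementation.
make_dimcalc <- function(threshold = 1) {
  # The returned closure takes a single argument s, the singular values.
  function(s) {
    sum(s > threshold)
  }
}

s <- c(4.2, 2.5, 1.3, 0.8, 0.2)  # made-up singular values, descending

# Step 1: configure the rule. Step 2: apply it to s, as a caller would.
rule <- make_dimcalc(threshold = 1)
k_two_step <- rule(s)

# Both steps in one expression, using a second argument list.
k_direct <- make_dimcalc(threshold = 1)(s)
```

Both calls give the same result, because the first argument list only configures the rule and the second one supplies the data.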
The method dimcalc\_share() finds the first position in the descending
sequence of singular values s where the cumulative sum of the values up to
that position (divided by the sum of all values) meets or exceeds the
specified share.
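A minimal sketch of the share rule as described, in base R. The helper share_position() and the example values are assumptions made for illustration; the package function may treat boundary cases differently.

```r
# Hypothetical helper following the description above; not the lsa source.
share_position <- function(s, share = 0.5) {
  cum_share <- cumsum(s) / sum(s)  # running share of the total
  which(cum_share >= share)[1]     # first position meeting the share
}

s <- c(5, 3, 1, 1)  # made-up values; running shares are 0.5, 0.8, 0.9, 1.0

k_half <- share_position(s, share = 0.5)  # 1
k_06   <- share_position(s, share = 0.6)  # 2
```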
The method dimcalc\_ndocs() calculates the first position in the descending
sequence of singular values where the cumulative sum of the values up to
that position meets or exceeds the number of documents.
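The document-count rule can be sketched the same way. The helper ndocs_position() is hypothetical, and the fallback to length(s) when the sum never reaches the number of documents is an assumption that the text above does not specify.

```r
# Hypothetical helper following the description above; not the lsa source.
ndocs_position <- function(s, ndocs) {
  hit <- which(cumsum(s) >= ndocs)
  # Assumption: keep all values if the running sum never reaches ndocs.
  if (length(hit) == 0) length(s) else hit[1]
}

s <- c(6, 4, 3, 2, 1)  # made-up values; running sums are 6, 10, 13, 15, 16

k      <- ndocs_position(s, ndocs = 12)   # 3
k_none <- ndocs_position(s, ndocs = 100)  # 5, via the assumed fallback
```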
The method dimcalc\_kaiser() calculates the number of singular values
according to the Kaiser criterion, i.e. all values with s[n] > 1 in the
descending sequence are kept. The number of such values is returned as the
number of dimensions.
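As an illustration of the Kaiser criterion, the count of values strictly greater than 1 can be computed directly. The helper kaiser_count() and the values are made up for this sketch.

```r
# Hypothetical helper: count the singular values strictly greater than 1.
kaiser_count <- function(s) {
  sum(s > 1)
}

s <- c(3.1, 1.7, 1.0, 0.6)  # made-up values, descending

k <- kaiser_count(s)  # 2, because 1.0 is not strictly greater than 1
```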
The method dimcalc\_fraction() returns the specified fraction of the
number of singular values. By default, 1/50th of the available values
is used, and the resulting number of singular values is returned.
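A rough sketch of the fraction rule follows. The helper fraction_count() is hypothetical; rounding down and returning at least one dimension are assumptions, since the text above does not say how non-integer results are handled.

```r
# Hypothetical helper following the description above; not the lsa source.
fraction_count <- function(s, frac = 1 / 50) {
  # Assumptions: round down, and never return fewer than one dimension.
  max(1, floor(length(s) * frac))
}

s_small <- c(8, 7, 6, 5, 4, 3, 2, 1)  # eight made-up singular values

k_quarter <- fraction_count(s_small, frac = 0.25)  # 2
k_default <- fraction_count(s_small)               # 1, via the assumed minimum
```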
The method dimcalc\_raw() returns the maximum number of singular values
(= the length of s). It is included only for completeness.