This function validates PARAFAC models with different numbers of components by splitting the data cube into halves, fitting a PARAFAC model to each half and comparing the results [1].
feemsplithalf(cube, nfac, splits, random, ...)
# S3 method for class 'feemsplithalf'
print(x, ...)
cube: A feemcube object.
nfac: An integer vector of numbers of factors to check.
splits: Number of parts to split the data cube into; must be even. After splitting, all ways to recombine the parts into two non-intersecting halves are enumerated [2], each half is decomposed with PARAFAC, and the resulting models are compared against each other.
The number of PARAFAC models fitted is \(\binom{\mathtt{splits}}{\mathtt{splits}/2}\).
Mutually exclusive with the random parameter.
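The enumeration of halves can be sketched in base R to see where the model count comes from. The helper split_halves() is hypothetical and not part of albatross; it assumes the parts are numbered 1 to splits and that a half together with its complement forms one comparison. For splits = 4 it yields the three comparisons and six fitted models of the S4C6T3 design mentioned in the examples.

```r
# Hypothetical helper (not part of albatross): list every way to recombine
# `splits` parts into two non-intersecting halves.
split_halves <- function(splits) {
  stopifnot(splits %% 2 == 0, splits >= 2)
  parts <- seq_len(splits)
  # every subset of splits/2 parts is a candidate half...
  halves <- combn(parts, splits / 2, simplify = FALSE)
  # ...but a half and its complement form the same comparison,
  # so keep only the halves that contain part 1
  halves <- Filter(function(h) 1 %in% h, halves)
  lapply(halves, function(h) list(h, setdiff(parts, h)))
}

comparisons <- split_halves(4)
length(comparisons)      # 3 comparisons
2 * length(comparisons)  # 6 fitted models, two per comparison
choose(4, 4 / 2)         # 6, the same count given by the formula above
```

Keeping only the halves that contain part 1 halves the \(\binom{\mathtt{splits}}{\mathtt{splits}/2}\) subsets, and fitting two models per comparison doubles the count back.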
random: Number of times to shuffle the data set, split it into two non-intersecting halves, fit a PARAFAC model to each half and compare the two models.
The number of PARAFAC models fitted is \(2 \cdot \mathtt{random}\).
Mutually exclusive with the splits parameter.
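One random grouping can be sketched as shuffling the sample indices and cutting the permutation in two. The helper random_halves() is hypothetical, and the assumption that the halves differ in size by at most one sample is ours, not a documented property of albatross. With random = 5 the sketch produces five groupings, hence 2 * 5 = 10 PARAFAC fits.

```r
# Hypothetical helper (not part of albatross): split n sample indices at
# random into two non-intersecting halves whose sizes differ by at most one.
random_halves <- function(n) {
  stopifnot(n >= 2)
  shuffled <- sample.int(n)   # random permutation of 1..n
  first <- seq_len(n %/% 2)   # positions that go to the first half
  list(sort(shuffled[first]), sort(shuffled[-first]))
}

set.seed(1)
random <- 5
groupings <- replicate(random, random_halves(10), simplify = FALSE)
length(groupings)       # 5 groupings
2 * length(groupings)   # 10 PARAFAC models, i.e. 2 * random
```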
x: An object returned by feemsplithalf.
...: in feemsplithalf, remaining options are passed to feemparafac and, eventually, to parafac from the multiway package; in print.feemsplithalf, no options are allowed.
An object of class feemsplithalf, containing named fields:
A list of feemparafac objects containing the factors of the halves. The list has three dimensions: the first corresponds to the halves (always 2), the second to the different numbers of factors (as many as in nfac) and the third to the different groupings of the samples (depending on splits or random).
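An R list can carry a dim attribute, which is what makes it possible to index this field by half, number of factors and grouping. The snippet below builds such a list with placeholder strings instead of feemparafac objects; the sizes assume nfac = 2:4 and splits = 4 (three groupings) and are illustrative only.

```r
# Placeholder strings stand in for feemparafac objects (illustration only).
# Assumed sizes: 2 halves x 3 numbers of factors (nfac = 2:4) x 3 groupings
nfac <- 2:4
fits <- vector("list", 2 * length(nfac) * 3)
dim(fits) <- c(2, length(nfac), 3)
for (h in 1:2) for (f in seq_along(nfac)) for (g in 1:3)
  fits[h, f, g] <- list(sprintf("half %d, %d factors, grouping %d", h, nfac[f], g))

fits[[2, 3, 1]]  # "half 2, 4 factors, grouping 1"
fits[, 1, 2]     # a list with both halves of grouping 2 for 2 factors
```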
A named list containing arrays of Tucker's congruence coefficients between the halves. Each entry in the list corresponds to an element of the nfac argument. The dimensions of each array correspond, in order, to the factors (1 to nfac), the modes (emission or excitation) and the groupings of the samples (depending on splits or random).
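Tucker's congruence coefficient of two loading vectors x and y is sum(x * y) / sqrt(sum(x^2) * sum(y^2)). The base R sketch below computes it and then reduces a placeholder array shaped like one entry of this list (factors by modes by groupings) to the smaller value over the two modes, the same reduction the Details section describes for matching. The tcc() helper and the random values are illustrative assumptions; within the package, congru performs this computation.

```r
# Tucker's congruence coefficient (TCC) of two loading vectors
tcc <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

x <- c(1, 2, 3)
tcc(x, 2 * x)        # 1: TCC ignores the scale of the loadings
tcc(x, c(3, 2, 1))   # 0.7142857

# Placeholder array shaped like one list entry for nfac = 3, splits = 4:
# 3 factors x 2 modes (emission, excitation) x 3 groupings
set.seed(42)
tcc_arr <- array(runif(3 * 2 * 3, min = 0.9, max = 1), dim = c(3, 2, 3))
# smaller of the two modes for every factor and grouping (a 3 x 3 matrix)
worst <- apply(tcc_arr, c(1, 3), min)
```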
A copy of the nfac argument.
Pass either the splits or the random parameter, but not both, as they are mutually exclusive.
As the models are fitted, each one is compared to the first model with the same number of factors: Tucker's congruence coefficient (TCC) is calculated using congru for the emission- and excitation-mode factors, and the smaller of the two values is used for matching. The components of each model are first reordered according to the best TCC match, then rescaled [3] so that:
$$\sum_r \left( \sum_i \left( S_{1,r} A_{i,r} - A^\mathrm{orig}_{i,r} \right)^2 + \sum_j \left( S_{2,r} B_{j,r} - B^\mathrm{orig}_{j,r} \right)^2 \right) \rightarrow \min_\mathbf{S}$$
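subject to \(S_{3,r} = \frac{1}{S_{1,r} S_{2,r}} \; \forall r\), where \(A\) and \(B\) are the emission- and excitation-mode loadings of the model being rescaled, \(A^\mathrm{orig}\) and \(B^\mathrm{orig}\) are those of the first model, and \(S_{1,r}\), \(S_{2,r}\), \(S_{3,r}\) scale component \(r\) in the emission, excitation and sample modes. Because the three scaling factors multiply to one, the rescaled model reproduces the same data.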
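Since \(S_{3,r}\) does not enter the objective, the minimisation separates by component and by mode, and ordinary least squares gives a closed form, with \(A\), \(B\) the loadings being aligned and \(A^\mathrm{orig}\), \(B^\mathrm{orig}\) those of the reference model:

$$S_{1,r} = \frac{\sum_i A_{i,r} A^\mathrm{orig}_{i,r}}{\sum_i A_{i,r}^2}, \qquad S_{2,r} = \frac{\sum_j B_{j,r} B^\mathrm{orig}_{j,r}}{\sum_j B_{j,r}^2}$$

The sketch below combines this closed form with a greedy TCC matching. The greedy strategy, the helper names and the column-per-component layout of A, B and C are assumptions made for illustration and may differ from what the package does internally.

```r
# Sketch only: greedy matching and least-squares rescaling of PARAFAC
# components; the helpers are hypothetical, not the albatross internals.
# Columns of A, B, C are components (emission, excitation, sample modes);
# A0 and B0 are the loadings of the reference (first) model.

# TCC between every column of X and every column of Y
tcc_cols <- function(X, Y) {
  crossprod(X, Y) / sqrt(outer(colSums(X^2), colSums(Y^2)))
}

# perm[k] is the column of the new model matched to reference component k
match_components <- function(A, B, A0, B0) {
  # for each pair of components, keep the smaller of the two modes' TCCs
  score <- pmin(tcc_cols(A, A0), tcc_cols(B, B0))
  perm <- integer(ncol(A0))
  for (k in seq_len(ncol(A0))) {
    best <- which(score == max(score), arr.ind = TRUE)[1, ]
    perm[best[2]] <- best[1]
    score[best[1], ] <- -Inf  # use every new component once...
    score[, best[2]] <- -Inf  # ...and every reference component once
  }
  perm
}

# Closed-form solution of the least-squares problem above
rescale_components <- function(A, B, C, A0, B0) {
  s1 <- colSums(A * A0) / colSums(A^2)
  s2 <- colSums(B * B0) / colSums(B^2)
  s3 <- 1 / (s1 * s2)  # S1 * S2 * S3 = 1 keeps each component's outer product
  list(A = sweep(A, 2, s1, `*`), B = sweep(B, 2, s2, `*`),
       C = sweep(C, 2, s3, `*`))
}

# Toy check: a "new" model whose components are swapped and rescaled
set.seed(1)
A0 <- matrix(runif(20), 10, 2)
B0 <- matrix(runif(16), 8, 2)
C0 <- matrix(runif(6), 3, 2)
A <- A0[, 2:1] %*% diag(c(2, 0.5))
B <- B0[, 2:1] %*% diag(c(3, 4))
C <- C0[, 2:1] %*% diag(1 / c(6, 2))  # scales multiply to 1 per component
perm <- match_components(A, B, A0, B0)
perm                                   # 2 1
fit <- rescale_components(A[, perm], B[, perm], C[, perm], A0, B0)
all.equal(fit$A, A0)                   # TRUE
```

A greedy pass is the simplest way to pair components one at a time; an exhaustive search over permutations would be an alternative when the number of factors is small.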
[1] W.S. DeSarbo, An Application of PARAFAC to a Small Sample Problem, Demonstrating Preprocessing, Orthogonality Constraints, and Split-Half Diagnostic Techniques (Appendix), Social Science Research Network, Rochester, NY, 1984. https://papers.ssrn.com/abstract=2783446
[2] K.R. Murphy, C.A. Stedmon, D. Graeber, R. Bro, Fluorescence spectroscopy and multi-way techniques. PARAFAC, Analytical Methods. 5 (2013) 6557. doi:10.1039/c3ay41160e
[3] J. Riu, R. Bro, Jack-knife technique for outlier detection and estimation of standard errors in PARAFAC models, Chemometrics and Intelligent Laboratory Systems. 65 (2003) 35-49. doi:10.1016/S0169-7439(02)00090-4
# Not run by default: fitting all the models takes a long time.
library(albatross)
data(feems)
cube <- feemscale(
  feemscatter(feemcube(feems, FALSE), rep(24, 4)),
  na.rm = TRUE
)
(sh <- feemsplithalf(
  cube, 2:4, splits = 4, # 4 splits => S4C6T3
  # the rest is passed to multiway::parafac
  const = rep('nonneg', 3) # setting ctol and maxit is recommended
))
plot(sh)