feemsplithalf: Split-half analysis of PARAFAC models

Description

This function validates PARAFAC models with different numbers of components by splitting the data cube into halves, fitting a PARAFAC model to each half and comparing the results [1].

Usage

feemsplithalf(cube, nfac, splits, random, …)
# S3 method for feemsplithalf
print(x, …)

Arguments

cube

A feemcube object.

nfac

An integer vector of numbers of factors to check.

splits

Number of parts to split the data cube into. Must be even. After splitting, all ways to recombine the parts into non-intersecting halves are enumerated [2], the halves are subjected to PARAFAC decomposition and compared against each other.

The number of PARAFAC models fitted is $\binom{\mathtt{splits}}{\mathtt{splits}/2}$.

Mutually exclusive with the random parameter.
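The enumeration described above can be illustrated in base R. This is a minimal sketch of the combinatorics only, not necessarily how the package builds the halves internally; the names parts, halves and pairs are made up for the illustration.

```r
splits <- 4
parts <- seq_len(splits)

# Each column of `halves` is one way to pick splits/2 parts for a half.
halves <- combn(parts, splits / 2)

# One PARAFAC model would be fitted per half.
ncol(halves) == choose(splits, splits / 2)

# Pair every half with its complement. Keeping only the halves that
# contain part 1 avoids listing the same pair twice.
pairs <- lapply(
  which(halves[1, ] == 1),
  function(k) list(first = halves[, k], second = setdiff(parts, halves[, k]))
)
length(pairs)
```

With splits = 4 this gives six halves arranged into three comparisons, which matches the S4C6T3 remark in the Examples section.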

random

Number of times to shuffle the dataset, split into non-intersecting halves, fit a PARAFAC model to each of the halves and compare halves against each other.

The number of PARAFAC models fitted is $2 \cdot \mathtt{random}$.

Mutually exclusive with the splits parameter.
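The random scheme can be pictured with a short base R sketch. The sample count n_samples is a hypothetical value, and the handling of an odd number of samples (the second half gets the extra sample) is an assumption of this sketch, not documented behaviour of feemsplithalf.

```r
set.seed(42)      # only to make the sketch reproducible
n_samples <- 10   # hypothetical number of samples in the cube
random <- 3       # value of the `random` argument

halves <- lapply(seq_len(random), function(i) {
  idx <- sample(n_samples)                # shuffle the sample indices
  half <- seq_len(floor(n_samples / 2))   # positions of the first half
  list(first = idx[half], second = idx[-half])
})

# Two halves per shuffle, hence 2 * random PARAFAC models.
sum(lengths(halves)) == 2 * random
```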

x

An object returned by feemsplithalf.

…

feemsplithalf: remaining options are passed to feemparafac and, eventually, to parafac.

print.feemsplithalf: no options are allowed.

Value

An object of class feemsplithalf, containing named fields:

factors

A list of feemparafac objects containing the factors of the halves. The list has three dimensions: the first corresponds to the halves (always 2), the second to the different numbers of factors (as many as in nfac) and the third to the different groupings of the samples (depends on splits or random).

tcc

A named list containing arrays of Tucker's congruence coefficients between the halves. Each entry in the list corresponds to an element in the nfac argument. The dimensions of each array in the list correspond to, in order: the factors (1 to nfac), the modes (emission or excitation) and the groupings of the samples (depending on splits or random).

nfac

A copy of the nfac argument.

Details

Pass either the splits or the random parameter, but not both, as they are mutually exclusive.

As the models are fitted, each one is compared to the first model with the same number of factors. Tucker's congruence coefficient (TCC) is calculated using congru for the emission and excitation mode factors, and the smaller of the two values is used for matching. The components are first reordered according to the best match by TCC value, then rescaled [3] so that:

$$\sum_r \left( \sum_i (S_{1,r} A_{i,r} - A^\mathrm{orig}_{i,r})^2 + \sum_j (S_{2,r} B_{j,r} - B^\mathrm{orig}_{j,r})^2 \right) \rightarrow \min_\mathbf{S}$$

subject to $S_{3,r} = \frac{1}{S_{1,r} S_{2,r}} \; \forall r$.
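The matching step described above can be sketched in base R. The loadings below are made-up numbers, tcc is a hand-written helper standing in for congru, and the greedy which.max matching is an assumption of this sketch; the package may resolve conflicting matches differently.

```r
# Tucker's congruence coefficient between the columns of two matrices
tcc <- function(x, y) {
  crossprod(x, y) / sqrt(outer(colSums(x^2), colSums(y^2)))
}

# Hypothetical reference loadings: 2 components,
# 5 emission and 4 excitation wavelengths.
A_ref <- cbind(c(1, 2, 3, 2, 1), c(3, 1, 0, 1, 3))
B_ref <- cbind(c(1, 2, 2, 1), c(2, 1, 1, 2))

# A second model with the same components, swapped and rescaled.
A_new <- A_ref[, 2:1] %*% diag(c(2, 0.5))
B_new <- B_ref[, 2:1] %*% diag(c(0.5, 2))

# For every pair of components, keep the smaller of the two modes' TCC.
sim <- pmin(tcc(A_ref, A_new), tcc(B_ref, B_new))

# Greedy matching: for each reference component, the most similar new one.
perm <- apply(sim, 1, which.max)
perm
```

Because TCC is insensitive to the scale of each column, the rescaled components still match perfectly; only the order has to be restored here, via A_new[, perm] and B_new[, perm].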

References

[1] W.S. DeSarbo, An Application of PARAFAC to a Small Sample Problem, Demonstrating Preprocessing, Orthogonality Constraints, and Split-Half Diagnostic Techniques (Appendix), Social Science Research Network, Rochester, NY, 1984. https://papers.ssrn.com/abstract=2783446
[2] K.R. Murphy, C.A. Stedmon, D. Graeber, R. Bro, Fluorescence spectroscopy and multi-way techniques. PARAFAC, Analytical Methods. 5 (2013) 6557. 10.1039/c3ay41160e
[3] J. Riu, R. Bro, Jack-knife technique for outlier detection and estimation of standard errors in PARAFAC models, Chemometrics and Intelligent Laboratory Systems. 65 (2003) 35-49. 10.1016/S0169-7439(02)00090-4

Examples

# NOT RUN {
  data(feems)
  cube <- feemscale(
    feemscatter(feemcube(feems, FALSE), rep(24, 4)), na.rm = TRUE
  )
  (sh <- feemsplithalf( # takes a long time
    cube, 2:4, splits = 4, # 4 splits => S4C6T3
    # the rest is passed to multiway::parafac
    const = rep('nonneg', 3) # setting ctol and maxit is recommended
  ))
  plot(sh)
# }