o2m: Perform O2PLS data integration with two-way orthogonal corrections

Description

NOTE THAT THIS FUNCTION DOES NOT CENTER OR SCALE THE MATRICES! You have to do any normalization yourself. It is best practice to at least center the variables.
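
Since o2m leaves the input untouched, the matrices can be preprocessed beforehand. The following is a minimal sketch using base R's scale(); the simulated matrices X_raw and Y_raw are placeholders for illustration only, and whether to scale to unit variance in addition to centering depends on the data.

```r
set.seed(42)
# Hypothetical raw data: 100 samples, 10 X-variables and 11 Y-variables
X_raw <- matrix(rnorm(100 * 10, mean = 5, sd = 2), nrow = 100, ncol = 10)
Y_raw <- matrix(rnorm(100 * 11, mean = -3, sd = 4), nrow = 100, ncol = 11)

# Center only: every column mean becomes zero
X <- scale(X_raw, center = TRUE, scale = FALSE)

# Center and scale: every column gets mean zero and unit standard deviation
Y <- scale(Y_raw, center = TRUE, scale = TRUE)

# Check the result before passing X and Y on to o2m
max_abs_mean_X <- max(abs(colMeans(X)))
sd_Y <- apply(Y, 2, sd)
```

The centered (and optionally scaled) X and Y are then the matrices to supply as the first two arguments of o2m.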

Usage

o2m(
  X,
  Y,
  n,
  nx,
  ny,
  stripped = FALSE,
  p_thresh = 3000,
  q_thresh = p_thresh,
  tol = 1e-10,
  max_iterations = 1000,
  sparse = F,
  groupx = NULL,
  groupy = NULL,
  keepx = NULL,
  keepy = NULL,
  max_iterations_sparsity = 1000
)

Value

A list containing

Tt: Joint \(X\) scores
W.: Joint \(X\) loadings
U: Joint \(Y\) scores
C.: Joint \(Y\) loadings
E: Residuals in \(X\)
Ff: Residuals in \(Y\)
T_Yosc: Orthogonal \(X\) scores
P_Yosc.: Orthogonal \(X\) loadings
W_Yosc: Orthogonal \(X\) weights
U_Xosc: Orthogonal \(Y\) scores
P_Xosc.: Orthogonal \(Y\) loadings
C_Xosc: Orthogonal \(Y\) weights
B_U: Regression coefficient in Tt ~ U
B_T.: Regression coefficient in U ~ Tt
H_TU: Residuals in Tt in Tt ~ U
H_UT: Residuals in U in U ~ Tt
X_hat: Prediction of \(X\) with \(Y\)
Y_hat: Prediction of \(Y\) with \(X\)
R2X: Variation (measured with ssq) of the modeled part in \(X\) (defined by joint + orthogonal variation) as proportion of variation in \(X\)
R2Y: Variation (measured with ssq) of the modeled part in \(Y\) (defined by joint + orthogonal variation) as proportion of variation in \(Y\)
R2Xcorr: Variation (measured with ssq) of the joint part in \(X\) as proportion of variation in \(X\)
R2Ycorr: Variation (measured with ssq) of the joint part in \(Y\) as proportion of variation in \(Y\)
R2X_YO: Variation (measured with ssq) of the orthogonal part in \(X\) as proportion of variation in \(X\)
R2Y_XO: Variation (measured with ssq) of the orthogonal part in \(Y\) as proportion of variation in \(Y\)
R2Xhat: Variation (measured with ssq) of the predicted \(X\) as proportion of variation in \(X\)
R2Yhat: Variation (measured with ssq) of the predicted \(Y\) as proportion of variation in \(Y\)
W_gr: Joint loadings of \(X\) at group level (only available when GO2PLS is used)
C_gr: Joint loadings of \(Y\) at group level (only available when GO2PLS is used)
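
The R2 entries above are all ratios of sums of squares. As a rough illustration of that idea only, the sketch below splits a centered matrix into a modeled part and a residual and reports the modeled sum of squares as a proportion of the total. It uses a truncated SVD as a stand-in for the modeled part, not the O2PLS decomposition, and the helper ssq_local is a hypothetical name defined here.

```r
# Hypothetical helper: sum of squared entries of a matrix
ssq_local <- function(M) sum(M^2)

set.seed(3)
X <- scale(matrix(rnorm(50 * 6), 50, 6), scale = FALSE)

# Stand-in "modeled part": rank-2 truncated SVD (NOT the O2PLS fit)
k <- 2
dec <- svd(X)
X_model <- dec$u[, 1:k] %*% diag(dec$d[1:k], k, k) %*% t(dec$v[, 1:k])
E <- X - X_model  # residual part

# Proportion of the variation in X captured by the modeled part
R2_model <- ssq_local(X_model) / ssq_local(X)
# Proportion left in the residuals
R2_resid <- ssq_local(E) / ssq_local(X)
```

In this stand-in the modeled part and the residual are orthogonal, so the two proportions add up to one; that additivity is a property of the truncated SVD and is not claimed here for the o2m output.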

Arguments

X: Numeric matrix. Vectors will be coerced to matrix with as.matrix (if this is possible)
Y: Numeric matrix. Vectors will be coerced to matrix with as.matrix (if this is possible)
n: Integer. Number of joint PLS components. Must be positive.
nx: Integer. Number of orthogonal components in \(X\). Negative values are interpreted as 0
ny: Integer. Number of orthogonal components in \(Y\). Negative values are interpreted as 0
stripped: Logical. Use the stripped version of o2m (usually when cross-validating)?
p_thresh: Integer. If X has more than p_thresh columns, a power method optimization is used, see o2m2
q_thresh: Integer. If Y has more than q_thresh columns, a power method optimization is used, see o2m2
tol: Double. Threshold below which the NIPALS method is deemed converged. Must be positive.
max_iterations: Integer. Maximum number of iterations for the NIPALS method.
sparse: Logical. Default value is FALSE, in which case O2PLS will be fitted. Set to TRUE for GO2PLS.
groupx: Vector. Used when sparse = TRUE. A vector of strings indicating the group name of each X-variable. Its length must be equal to the number of variables in \(X\). The order of the group names must correspond to the order of the variables.
groupy: Vector. Used when sparse = TRUE. A vector of strings indicating the group name of each Y-variable. Its length must be equal to the number of variables in \(Y\). The order of the group names must correspond to the order of the variables.
keepx: Vector. Used when sparse = TRUE. A vector of length n indicating how many variables (or groups if groupx is provided) to keep in each joint component of \(X\). If the input is a single integer, all components will have the same number of variables or groups retained.
keepy: Vector. Used when sparse = TRUE. A vector of length n indicating how many variables (or groups if groupy is provided) to keep in each joint component of \(Y\). If the input is a single integer, all components will have the same number of variables or groups retained.
max_iterations_sparsity: Integer. Used when sparse = TRUE. Maximum number of iterations for the NIPALS method for GO2PLS.
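
The sparse arguments can be put together as in the sketch below. It assumes that o2m is provided by the OmicsPLS package, so the call is guarded and only runs if that package is installed. The group labels ("gx1", "gy1", ...) and the simulated data are made up for illustration.

```r
set.seed(7)
X <- scale(matrix(rnorm(100 * 10), 100, 10))
Y <- scale(matrix(rnorm(100 * 12), 100, 12))

# One group label per variable, in the same order as the columns
groupx <- rep(paste0("gx", 1:5), each = 2)  # 5 hypothetical groups of 2 X-variables
groupy <- rep(paste0("gy", 1:4), each = 3)  # 4 hypothetical groups of 3 Y-variables
stopifnot(length(groupx) == ncol(X), length(groupy) == ncol(Y))

n <- 2
keepx <- c(3, 2)  # length n: keep 3 groups in joint component 1, 2 in component 2
keepy <- 2        # single integer: keep 2 groups in every joint component

# Assumption: o2m lives in the OmicsPLS package; skip the fit if it is not installed
if (requireNamespace("OmicsPLS", quietly = TRUE)) {
  fit_sparse <- OmicsPLS::o2m(X, Y, n = n, nx = 1, ny = 1, sparse = TRUE,
                              groupx = groupx, groupy = groupy,
                              keepx = keepx, keepy = keepy)
}
```

Because groupx and groupy are supplied, keepx and keepy count groups here; without them they would count individual variables.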

Details

If both nx and ny are zero, o2m is equivalent to PLS2 with orthonormal loadings. This is the 'slower' (more memory-intensive) implementation of O2PLS and uses svd; set stripped = TRUE for a stripped version with less output. If either ncol(X) > p_thresh or ncol(Y) > q_thresh, the NIPALS method is used, which does not store the entire covariance matrix. The convergence threshold on the squared error between successive iterates in the NIPALS approach is set with tol, and the maximum number of NIPALS iterations is set with max_iterations.
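
To make the roles of tol and max_iterations concrete, here is a minimal base R sketch of a power-type iteration for the leading singular vector pair of t(X) %*% Y that never forms that cross-product matrix. It illustrates the general idea only and is not the package's own implementation; the function name power_pair and the simulated data are hypothetical.

```r
# Power iteration for the leading singular vectors of t(X) %*% Y,
# using only matrix-vector products (the p x q matrix is never stored).
power_pair <- function(X, Y, tol = 1e-10, max_iterations = 1000) {
  cvec <- rep(1 / sqrt(ncol(Y)), ncol(Y))  # unit-length starting vector
  w <- rep(0, ncol(X))
  for (i in seq_len(max_iterations)) {
    w_new <- crossprod(X, Y %*% cvec)      # t(X) %*% (Y %*% c)
    w_new <- w_new / sqrt(sum(w_new^2))
    c_new <- crossprod(Y, X %*% w_new)     # t(Y) %*% (X %*% w)
    c_new <- c_new / sqrt(sum(c_new^2))
    # Squared error between successive iterates
    delta <- sum((w_new - w)^2) + sum((c_new - cvec)^2)
    w <- w_new
    cvec <- c_new
    if (delta < tol) break                 # converged
  }
  list(w = drop(w), c = drop(cvec), iterations = i)
}

# Simulated data with one strong shared latent component
set.seed(1)
latent <- rnorm(100)
X <- scale(latent %*% t(rnorm(10)) + matrix(rnorm(100 * 10, sd = 0.1), 100, 10),
           scale = FALSE)
Y <- scale(latent %*% t(rnorm(11)) + matrix(rnorm(100 * 11, sd = 0.1), 100, 11),
           scale = FALSE)

fit <- power_pair(X, Y)

# Reference: leading singular vectors from the full cross-product matrix
sv <- svd(crossprod(X, Y), nu = 1, nv = 1)
agreement_w <- abs(sum(fit$w * sv$u[, 1]))  # close to 1 when the directions agree
```

A smaller tol demands closer agreement between successive iterates before stopping, while max_iterations caps the work when convergence is slow, for example when the two largest singular values are nearly equal.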

Examples

test_X <- scale(matrix(rnorm(100*10),100,10))
test_Y <- scale(matrix(rnorm(100*11),100,11))
#  --------- Default run ------------ 
o2m(test_X, test_Y, 3, 2, 1)
#  ---------- Stripped version ------------- 
o2m(test_X, test_Y, 3, 2, 1, stripped = TRUE)
#  ---------- High dimensional version ---------- 
o2m(test_X, test_Y, 3, 2, 1, p_thresh = 1)
#  ------ High D and stripped version --------- 
o2m(test_X, test_Y, 3, 2, 1, stripped = TRUE, p_thresh = 1)
#  ------ Now with more iterations -------- 
o2m(test_X, test_Y, 3, 2, 1, stripped = TRUE, p_thresh = 1, max_iterations = 1e6)
#  ----------------------------------