
rrda (version 0.2.3)

rrda.cv: Cross-validation for Ridge Redundancy Analysis

Description

This function performs cross-validation to evaluate the performance of Ridge Redundancy Analysis (RDA) models. It calculates the mean squared error (MSE) for each combination of rank and ridge penalty value across the cross-validation folds. The function also supports centering and scaling of the input matrices.
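The idea of a rank-by-lambda MSE grid can be illustrated with a minimal base-R sketch. This is not the package's implementation: the helper `ridge_rr_coef` is hypothetical, the estimator is a textbook ridge reduced-rank regression, and centering with training-fold means is a choice made for this sketch only.

```r
set.seed(42)
n <- 60; p <- 10; q <- 6
X <- matrix(rnorm(n * p), n, p)
# Simulated rank-2 coefficient matrix plus noise (illustrative data only)
B_true <- matrix(rnorm(p * 2), p, 2) %*% matrix(rnorm(2 * q), 2, q)
Y <- X %*% B_true + matrix(rnorm(n * q, sd = 0.5), n, q)

# Hypothetical helper: ridge reduced-rank coefficient for one (lambda, rank) pair
ridge_rr_coef <- function(X, Y, lambda, rank) {
  G <- crossprod(X) + lambda * diag(ncol(X))      # X'X + lambda * I
  B_ridge <- solve(G, crossprod(X, Y))            # full-rank ridge solution
  # Eigenvectors of B' G B are the right singular vectors of the
  # ridge-augmented fitted values
  V <- eigen(crossprod(B_ridge, G %*% B_ridge), symmetric = TRUE)$vectors
  V_r <- V[, seq_len(rank), drop = FALSE]
  B_ridge %*% V_r %*% t(V_r)                      # project onto the top `rank` directions
}

lambdas <- c(0.1, 1, 10)
ranks <- 1:3
nfold <- 5
folds <- sample(rep(seq_len(nfold), length.out = n))

mse <- matrix(0, length(ranks), length(lambdas),
              dimnames = list(paste0("rank", ranks), paste0("lambda", lambdas)))
for (f in seq_len(nfold)) {
  test <- folds == f
  # Center with training-fold means only, so the test fold stays unseen
  mx <- colMeans(X[!test, , drop = FALSE])
  my <- colMeans(Y[!test, , drop = FALSE])
  Xtr <- sweep(X[!test, , drop = FALSE], 2, mx)
  Ytr <- sweep(Y[!test, , drop = FALSE], 2, my)
  Xte <- sweep(X[test, , drop = FALSE], 2, mx)
  Yte <- sweep(Y[test, , drop = FALSE], 2, my)
  for (i in seq_along(ranks)) {
    for (j in seq_along(lambdas)) {
      B <- ridge_rr_coef(Xtr, Ytr, lambdas[j], ranks[i])
      # Accumulate the fold-averaged test MSE
      mse[i, j] <- mse[i, j] + mean((Yte - Xte %*% B)^2) / nfold
    }
  }
}

# Row and column index of the (rank, lambda) pair with the smallest MSE
best <- which(mse == min(mse), arr.ind = TRUE)
```

The resulting `mse` matrix has one row per rank and one column per lambda, which is the kind of grid the function summarises when it reports an optimal lambda and rank.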

The range of lambda for the cross-validation is calculated automatically, following the method of "glmnet" (Friedman et al., 2010). Given a matrix of response variables (Y; an \(n \times q\) matrix) and a matrix of explanatory variables (X; an \(n \times p\) matrix), the largest lambda for the validation is obtained as follows:

$$ \lambda_{\text{max}} = \frac{\max_{j \in \{1, 2, \dots, p\}} \sqrt{\sum_{k=1}^{q} \left( \sum_{i=1}^{n} (x_{ij}\cdot y_{ik}) \right)^2}}{n \times 10^{-3}}$$

Then, we define \(\lambda_{\text{min}} = 10^{-4}\lambda_{\text{max}}\), and the sequence of \(\lambda\) values is generated over this range.
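As a rough illustration of this range calculation, the following base-R sketch computes the largest and smallest lambda for small simulated matrices, taking the denominator as the number of rows n times 10^-3. The log-spaced sequence is an assumption borrowed from the usual glmnet convention; the text above only states that the sequence is generated over the range, so the package's actual spacing may differ.

```r
set.seed(1)
n <- 50; p <- 8; q <- 5
# Centered toy matrices (illustrative data only)
X <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)
Y <- scale(matrix(rnorm(n * q), n, q), center = TRUE, scale = FALSE)

# crossprod(X, Y) is the p x q matrix whose (j, k) entry is sum_i x_ij * y_ik
XtY <- crossprod(X, Y)

# Euclidean norm of each row j over the q responses, then the maximum over j
row_norms <- sqrt(rowSums(XtY^2))
lambda_max <- max(row_norms) / (n * 1e-3)
lambda_min <- 1e-4 * lambda_max

# Assumption: 50 values spaced evenly on the log scale, from largest to smallest
num_lambda <- 50
lambda_seq <- exp(seq(log(lambda_max), log(lambda_min), length.out = num_lambda))
```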

To reduce computation, variables are subsampled for this estimate when X or Y is large (by default, when the number of variables exceeds 1000). Alternatively, the range of lambda can be specified manually.
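A minimal sketch of what such variable sampling might look like is given below. The helper `sample_columns` is hypothetical and is not part of the package; it simply keeps a random subset of columns when a matrix has more variables than a threshold.

```r
# Hypothetical helper: keep at most `max_vars` randomly chosen columns of M
sample_columns <- function(M, max_vars = 1000) {
  if (ncol(M) > max_vars) {
    keep <- sort(sample.int(ncol(M), max_vars))  # sampled column indices, in order
    M <- M[, keep, drop = FALSE]
  }
  M
}

set.seed(3)
X_big <- matrix(rnorm(20 * 1500), 20, 1500)    # more than 1000 variables
X_small <- matrix(rnorm(20 * 300), 20, 300)    # fewer than 1000 variables

X_big_sub <- sample_columns(X_big)      # reduced to 1000 columns
X_small_sub <- sample_columns(X_small)  # returned unchanged
```

Only the lambda range estimate would use the reduced matrices in this sketch; the rows (observations) are left untouched.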

Usage

rrda.cv(
  Y,
  X,
  maxrank = NULL,
  lambda = NULL,
  num.lambda = 50,
  nfold = 5,
  folds = NULL,
  sample.X = 1000,
  sample.Y = 1000,
  scale.X = FALSE,
  scale.Y = FALSE,
  center.X = TRUE,
  center.Y = TRUE,
  verbose = TRUE
)

Value

A list containing the cross-validated MSE matrix, lambda values, rank values, and the optimal lambda and rank.

Arguments

Y

A numeric matrix of response variables.

X

A numeric matrix of explanatory variables.

maxrank

A numeric vector specifying the maximum rank of the coefficient Bhat. Default is NULL, which sets it to min(15, min(dim(X), dim(Y))).

lambda

A numeric vector of ridge penalty values. Default is NULL, where the lambda values are automatically chosen.

num.lambda

The number of lambda values generated (only when lambda is not given by the user). Default is 50.

nfold

The number of folds for cross-validation. Default is 5.

folds

A vector specifying the folds. Default is NULL, which randomly assigns folds.

sample.X

The number of variables sampled from X for the lambda range estimate. Default is 1000.

sample.Y

The number of variables sampled from Y for the lambda range estimate. Default is 1000.

scale.X

Logical indicating if X should be scaled. If TRUE, scales X. Default is FALSE.

scale.Y

Logical indicating if Y should be scaled. If TRUE, scales Y. Default is FALSE.

center.X

Logical indicating if X should be centered. If TRUE, centers X. Default is TRUE.

center.Y

Logical indicating if Y should be centered. If TRUE, centers Y. Default is TRUE.

verbose

Logical indicating whether progress information should be shown. If TRUE, the function displays information about the function call. Default is TRUE.
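The argument descriptions above can be made concrete with a small base-R sketch. It assumes that `folds` is an integer vector of length n holding fold labels 1 to nfold; the documentation only says "a vector specifying the folds", so that format is an assumption. The default `maxrank` expression is taken from the description above, and `scale()` is used only to show what centering and scaling do to a matrix.

```r
set.seed(10)
n <- 100
X <- matrix(rnorm(n * 20), n, 20)   # toy explanatory variables
Y <- matrix(rnorm(n * 30), n, 30)   # toy response variables

# Assumed fold format: one label in 1..nfold per observation, randomly shuffled
nfold <- 5
folds <- sample(rep(seq_len(nfold), length.out = n))

# Default maximum rank as stated in the maxrank description
default_maxrank <- min(15, min(dim(X), dim(Y)))

# Centering subtracts column means; scaling also divides by column standard deviations
X_centered <- scale(X, center = TRUE, scale = FALSE)
X_scaled <- scale(X, center = TRUE, scale = TRUE)
```

A vector built this way could then be supplied as `folds = folds` to reuse the same split across runs, provided the function accepts fold labels in this form.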

Examples

if (FALSE) {
set.seed(10)
simdata <- rdasim1(n = 100, p = 200, q = 200, k = 3)
X <- simdata$X
Y <- simdata$Y
cv_result <- rrda.cv(Y = Y, X = X, maxrank = 5, nfold = 5)
rrda.summary(cv_result = cv_result)

## Complete example ##

# library(future) # <- if you want to compute in parallel

# plan(multisession) # <- if you want to compute in parallel
# cv_result <- rrda.cv(Y = Y, X = X, maxrank = 5, nfold = 5) # cv
# plan(sequential) # <- to return to sequential computing

# rrda.summary(cv_result = cv_result) # cv result

p <- rrda.plot(cv_result) # cv result plot
print(p)
h <- rrda.heatmap(cv_result) # cv result heatmap
print(h)

estimated_lambda <- cv_result$opt_min$lambda # selected parameter
estimated_rank <- cv_result$opt_min$rank # selected parameter

Bhat <- rrda.fit(Y = Y, X = X, nrank = estimated_rank, lambda = estimated_lambda) # fitting
Bhat_mat <- rrda.coef(Bhat)
Yhat_mat <- rrda.predict(Bhat = Bhat, X = X) # prediction
Yhat <- Yhat_mat[[1]][[1]][[1]] # predicted values

cor_Y_Yhat <- diag(cor(Y, Yhat)) # correlation
summary(cor_Y_Yhat)
}
