The catch function solves classification problems and selects variables by fitting a covariate-adjusted tensor classification in high dimensions (CATCH) model. The training predictors consist of two parts: tensor data and low-dimensional covariates. The tensor data can be matrices, as a special case of tensors. In catch, tensor data must be stored as a list. If the dataset contains no covariates, catch can also fit a classifier based on the tensor predictors alone. If covariates are provided, the method adjusts the tensor for the covariates and then fits a classifier based on the adjusted tensor along with the covariates. If the user also supplies testing data, predicted responses are returned as well.
catch(x, z = NULL, y, testx = NULL, testz = NULL, nlambda = 100,
  lambda.factor = ifelse((nobs - nclass) <= nvars, 0.2, 1e-04),
  lambda = NULL, dfmax = nobs, pmax = min(dfmax * 2 + 20, nobs),
  pf = rep(1, nvars), eps = 1e-04, maxit = 1e+05, sml = 1e-06,
  verbose = FALSE, perturb = NULL)
x: Input tensor (or matrix) list of length \(N\), where \(N\) is the number of observations. Each element of the list is a tensor or matrix. The order of the tensors can be any positive integer not less than 2.
z: Input covariate matrix of dimension \(N \times q\), where \(q<N\). z can be omitted if there are no covariates.
y: Class label vector. For K-class problems, y takes values in \(\{1,\cdots,\code{K}\}\).
testx: Input testing tensor or matrix list. Each element of the list is a test case. When testx is not provided, the function only fits the model and returns the classifier. When testx is provided, the function also predicts the response on testx.
testz: Input testing covariate matrix. Can be omitted if there are no covariates. However, the training covariates z and the testing covariates testz must either both be provided or both be omitted.
nlambda: The number of tuning values in the sequence lambda. If the user does not specify lambda values, the package generates a solution path containing nlambda tuning values of lambda. Default is 100.
lambda.factor: When lambda is not supplied, catch first finds the largest value in lambda which yields \(\boldsymbol{\beta}=0\). The minimum value in lambda is then obtained as (largest value * lambda.factor). The sequence of lambda is generated by evenly sampling nlambda values within this range. The default value of lambda.factor is 0.2 if \(N<p\) and 0.0001 if \(N>p\).
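As a rough sketch of how the tuning sequence is constructed per the description above (lambda.max here is a placeholder; the package derives the actual largest value from the data as the smallest lambda yielding an all-zero coefficient):

```r
# Sketch of the lambda sequence construction (lambda.max is hypothetical;
# the package computes it internally as the value that zeroes out beta)
lambda.max <- 1
lambda.factor <- 0.2          # default when N < p
nlambda <- 100
lambda.min <- lambda.max * lambda.factor
# evenly sample nlambda values between the maximum and minimum
lambda.seq <- seq(lambda.max, lambda.min, length.out = nlambda)
```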
lambda: A sequence of user-specified lambda values. lambda is the weight of the L1 penalty, and a smaller lambda allows more variables to be nonzero. If NULL, the algorithm generates a sequence of nlambda potential lambda values according to lambda.factor.
dfmax: The maximum number of selected variables in the model. Default is the number of observations N.
pmax: The maximum number of potentially selected variables during iteration. In an intermediate step, the algorithm can select at most pmax variables and then shrinks some of them so that the number of finally selected variables is no more than dfmax. Default is \(\min(dfmax\times 2+20, N)\).
pf: Weight of the lasso penalty. Default is a vector of ones of length \(p\), representing an equal L1 penalty on each variable. Can be modified to use an adaptive lasso penalty.
eps: Convergence threshold for coordinate descent, measured as the difference between iterations. Default value is 1e-04.
maxit: Maximum number of iterations over all lambda values. Default value is 1e+05.
sml: Threshold for the ratio of the loss-function change after each iteration to the old loss-function value. Default value is 1e-06.
verbose: Indicates whether to print out lambda during iteration. Default value is FALSE.
perturb: Perturbation scalar. If specified, the value is added to the diagonal of the estimated covariance matrix. A small value can be used to accelerate iteration. Default value is NULL.
beta: Output variable coefficients for each lambda, i.e., the estimate of \(\boldsymbol{\beta}\) in the Bayes rule. beta is a list whose length is the number of lambda values. Each element of beta is a matrix of dimension \(nvars\times (nclass-1)\).
df: The number of nonzero variables for each value in the sequence lambda.
dim: Dimension of the coefficient array.
lambda: The actual lambda sequence used. The user-specified or automatically generated sequence may be truncated by the constraints on dfmax and pmax.
obj: Objective function value for each value in the sequence lambda.
x: The tensor list after adjustment in the training data. If covariates are absent, this is the original input tensor list.
y: Class labels in the training data.
npasses: Total number of iterations.
jerr: Error flag.
sigma: Estimated covariance matrix on each mode. sigma is a list whose ith element is the covariance matrix on the ith mode.
delta: Estimated delta matrix \((vec(\widehat{\boldsymbol{\mu}}_2-\widehat{\boldsymbol{\mu}}_1),\cdots,vec(\widehat{\boldsymbol{\mu}}_K-\widehat{\boldsymbol{\mu}}_1))\).
mu: Estimated mean array of \(\mathbf{X}\).
prior: Proportion of samples in each class.
call: The call that produced this object.
pred: Predicted categorical response for each value in the sequence lambda when testx is provided.
The catch function fits a linear discriminant analysis model as follows:
$$\mathbf{Z}|(Y=k)\sim N(\boldsymbol{\phi_k},\boldsymbol{\psi}),$$
$$\mathbf{X}|(\mathbf{Z}=\mathbf{z}, Y=k)\sim TN(\boldsymbol{\mu}_k+\boldsymbol{\alpha}\overline{\times}_{M+1}\mathbf{z},\boldsymbol{\Sigma}_1,\cdots,\boldsymbol{\Sigma}_M).$$
The categorical response is predicted from the estimated Bayes rule:
$$\widehat{Y}=\arg\max_{k=1,\cdots,K}{a_k+\boldsymbol{\gamma}_k^T\mathbf{Z}+<\boldsymbol{\beta}_k,\mathbf{X}-\boldsymbol{\alpha}\overline{\times}_{M+1}\mathbf{Z}>},$$
where \(\mathbf{X}\) is the tensor, \(\mathbf{Z}\) is the covariate vector, and \(a_k\), \(\boldsymbol{\gamma}_k\) and \(\boldsymbol{\alpha}\) are parameters estimated by CATCH. A detailed explanation can be found in the reference. When Z is not NULL, the function first adjusts the tensor for the covariates by modeling
$$\mathbf{X}=\boldsymbol{\mu}_k+\boldsymbol{\alpha}\overline{\times}_{M+1}\mathbf{Z}+\mathbf{E},$$ where \(\mathbf{E}\) is an unobservable tensor normal error independent of \(\mathbf{Z}\).
Then catch fits the model on the adjusted training tensor \(\mathbf{X}-\boldsymbol{\alpha}\overline{\times}_{M+1}\mathbf{Z}\) and makes predictions on the testing data using the adjusted tensor list. If Z is NULL, the method reduces to a simple tensor discriminant analysis model.
The coefficient of the tensor, \(\boldsymbol{\beta}\), represented by beta in the package, is estimated by
$$\min_{\boldsymbol{\beta}_2,\ldots,\boldsymbol{\beta}_K}\left[\sum_{k=2}^K\left(\langle\boldsymbol{\beta}_k,[\![\boldsymbol{\beta}_k;\widehat{\boldsymbol{\Sigma}}_{1},\dots,\widehat{\boldsymbol{\Sigma}}_{M}]\!]\rangle-2\langle\boldsymbol{\beta}_k,\widehat{\boldsymbol{\mu}}_{k}-\widehat{\boldsymbol{\mu}}_{1}\rangle\right)+\lambda\sum_{j_{1}\dots j_{M}}\sqrt{\sum_{k=2}^{K}\beta_{k,j_{1}\cdots j_{M}}^2}\right].$$
When the response is multi-class, the group lasso penalty across categories is added to the objective function through the parameter lambda; it reduces to a lasso penalty in binary problems.
The function catch predicts the categorical response when testing data is provided.
If testing data are not provided, or if one wishes to perform prediction separately, catch can be used to fit the model only, returning a catch object. That object can then be combined with the adjusted tensor list from adjten to perform prediction via predict.catch.
Pan, Y., Mai, Q., and Zhang, X. (2018), "Covariate-Adjusted Tensor Classification in High-Dimensions." Journal of the American Statistical Association, accepted.
# without prediction
n <- 20
p <- 4
k <- 2
nvars <- p*p*p
x <- array(list(),n)
vec_x <- matrix(rnorm(n*nvars), nrow=n, ncol=nvars)
vec_x[1:10,] <- vec_x[1:10,]+2
z <- matrix(rnorm(n*2), nrow=n, ncol=2)
z[1:10,] <- z[1:10,]+0.5
y <- c(rep(1,10),rep(2,10))
for (i in 1:n){
  x[[i]] <- array(vec_x[i,], dim=c(p,p,p))
}
obj <- catch(x,z,y=y)
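The example above can be extended to return predictions directly by supplying testing data; the continuation below is a sketch that simulates test cases the same way as the training data (the test sample size m and the use of obj2$pred are illustrative):

```r
# with prediction: simulate testing data in the same way as the training data
m <- 10
vec_testx <- matrix(rnorm(m*nvars), nrow=m, ncol=nvars)
vec_testx[1:5,] <- vec_testx[1:5,] + 2    # first half from class 1
testx <- array(list(), m)
for (i in 1:m){
  testx[[i]] <- array(vec_testx[i,], dim=c(p,p,p))
}
testz <- matrix(rnorm(m*2), nrow=m, ncol=2)
testz[1:5,] <- testz[1:5,] + 0.5

# fitting with testx/testz also populates the pred component
obj2 <- catch(x, z, y = y, testx = testx, testz = testz)
# obj2$pred holds the predicted class labels, one column per lambda
```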