The function mrpls performs prediction using the MRPLS algorithm of Fort et al. (2005).
mrpls(Ytrain,Xtrain,Lambda,ncomp,Xtest=NULL,NbIterMax=50)
Xtrain: a (ntrain x p) data matrix of predictors. Xtrain must be a matrix. Each row corresponds to an observation and each column to a predictor variable.
Ytrain: a ntrain vector of responses. Ytrain must be a vector. Ytrain is a {0,...,c}-valued vector and contains the response variable for each observation. c+1 is the number of classes.
Xtest: a (ntest x p) matrix containing the predictors for the test data set. Xtest may also be a vector of length p (corresponding to only one test observation). If Xtest is not equal to NULL, then the prediction step is performed for these new predictor variables.
Lambda: a positive real value. Lambda is the ridge regularization parameter.
ncomp: a positive integer. ncomp is the number of PLS components. If ncomp = 0, then ridge regression is performed without dimension reduction.
NbIterMax: a positive integer. NbIterMax is the maximal number of iterations in the Newton-Raphson parts.
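For orientation, here is a minimal sketch (not taken from the package documentation) of a call on small simulated data; the simulated predictors and labels, and the values Lambda = 0.01 and ncomp = 2, are arbitrary and only illustrate the arguments above.

# minimal sketch with simulated data (illustrative values only)
library(plsgenomics)
set.seed(1)
ntrain <- 30
p <- 20
Xtrain <- matrix(rnorm(ntrain*p), nrow=ntrain, ncol=p)   # (ntrain x p) predictor matrix
Ytrain <- sample(0:2, ntrain, replace=TRUE)              # {0,...,c}-valued labels, here c = 2
Xtest <- matrix(rnorm(5*p), nrow=5, ncol=p)              # 5 new observations
fit <- mrpls(Ytrain=Ytrain, Xtrain=Xtrain, Lambda=0.01, ncomp=2, Xtest=Xtest, NbIterMax=50)
str(fit)   # inspect the returned components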
A list with the following components:
the (p+1) x c matrix containing the coefficients weighting the block design matrix.
the ntrain vector containing the estimated {0,...,c}-valued labels for the observations from Xtrain.
the ntest vector containing the predicted {0,...,c}-valued labels for the observations from Xtest.
the ntrain vector containing the estimated probabilities for the observations from Xtrain.
the ntest vector containing the predicted probabilities for the observations from Xtest.
the vector containing the column numbers of Xtrain for which the variance of the corresponding predictor variable is null. Otherwise DeletedCol = NULL.
If ncomp is greater than 1, hatYtest_k is a matrix of size ntest x ncomp such that the kth column contains the predicted labels obtained with k PLS components.
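As a sketch of how the per-component predictions might be inspected (assuming the fitted object fit from the simulated-data sketch above, with Xtest supplied and ncomp = 2):

fit$hatYtest_k          # ntest x ncomp matrix of predicted labels
fit$hatYtest_k[, 2]     # predicted {0,...,c}-valued labels using 2 PLS components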
The columns of the data matrices Xtrain and Xtest need not be standardized, since standardization is performed by the function mrpls as a preliminary step before the algorithm is run.
The procedure described in Fort et al. (2005) is used to determine the latent components to be used for classification; when Xtest is not equal to NULL, the procedure also predicts the labels for these new predictor variables.
G. Fort, S. Lambert-Lacroix and J. Peyre (2005). Réduction de dimension dans les modèles linéaires généralisés : application à la classification supervisée de données issues des biopuces. Journal de la SFDS, tome 146, n° 1-2, 117-152.
# load plsgenomics library
library(plsgenomics)
# load SRBCT data
data(SRBCT)
IndexLearn <- c(sample(which(SRBCT$Y==1),10),sample(which(SRBCT$Y==2),4),
                sample(which(SRBCT$Y==3),7),sample(which(SRBCT$Y==4),9))
# perform prediction by MRPLS
res <- mrpls(Ytrain=SRBCT$Y[IndexLearn]-1,Xtrain=SRBCT$X[IndexLearn,],Lambda=0.001,ncomp=2,
             Xtest=SRBCT$X[-IndexLearn,])
sum(res$Ytest!=SRBCT$Y[-IndexLearn]-1)   # number of misclassified test observations
# prediction for another sample
Xnew <- SRBCT$X[83,]
# Compute the linear predictor for each class except the reference class (class 0 after recoding)
eta <- diag(t(cbind(c(1,Xnew),c(1,Xnew),c(1,Xnew))) %*% res$Coefficients)
Ypred <- which.max(c(0,eta))-1   # predicted label on the {0,...,c} scale (the reference class has eta = 0)
Ypred+1                          # back to the original 1,...,c+1 coding used in SRBCT$Y
SRBCT$Y[83]                      # observed label
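# Alternative sketch (not part of the original example): since Xtest may also be
# a vector of length p, the same new-sample prediction can be obtained by passing
# Xnew directly as Xtest; the predicted-label component is assumed to be Ytest,
# as used above for the test set.
res2 <- mrpls(Ytrain=SRBCT$Y[IndexLearn]-1,Xtrain=SRBCT$X[IndexLearn,],Lambda=0.001,
              ncomp=2,Xtest=Xnew)
res2$Ytest+1   # back to the original 1,...,4 coding of SRBCT$Y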