
dual.spls (version 0.1.4)

d.spls.ridge: Dual Sparse Partial Least Squares (Dual-SPLS) regression for the ridge norm

Description

The function d.spls.ridge performs dimension reduction as in the PLS methodology, combined with variable selection via the Dual-SPLS algorithm with the norm $$\Omega(w)=\lambda_1 \|w\|_1 +\lambda_2 \|Xw\|_2 + \|w\|_2.$$ In the algorithm, the parameters \(\lambda\), \(\lambda_1\) and \(\lambda_2\) are transformed into more meaningful values, ppnu and \(\nu_2\).
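To make the norm concrete, the following base-R sketch evaluates \(\Omega(w)\) for a given weight vector. The helper name omega_ridge and its arguments lambda1 and lambda2 are hypothetical and not part of dual.spls; the package itself works with ppnu and nu2 rather than with these raw parameters.

```r
# Hypothetical helper (not part of dual.spls): evaluates
# Omega(w) = lambda1 * ||w||_1 + lambda2 * ||Xw||_2 + ||w||_2
omega_ridge <- function(w, X, lambda1, lambda2) {
  l1_w  <- sum(abs(w))              # ||w||_1
  l2_w  <- sqrt(sum(w^2))           # ||w||_2
  l2_xw <- sqrt(sum((X %*% w)^2))   # ||Xw||_2
  lambda1 * l1_w + lambda2 * l2_xw + l2_w
}

# Toy check with X = I_2 and w = (3, -4):
# ||w||_1 = 7, ||Xw||_2 = ||w||_2 = 5, so Omega = 1*7 + 2*5 + 5
X <- diag(2)
w <- c(3, -4)
omega_ridge(w, X, lambda1 = 1, lambda2 = 2)  # 22
```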

Usage

d.spls.ridge(X,y,ncp,ppnu,nu2,verbose=TRUE)

Value

A list with the following elements:

Xmean

the mean vector of the predictors matrix X.

scores

the matrix of dimension (n,ncp), where n is the number of observations. The scores represent the observations in the new component basis computed by the compression step of the Dual-SPLS.

loadings

the matrix of dimension (p,ncp) that represents the Dual-SPLS components.

Bhat

the matrix of dimension (p,ncp) that regroups the regression coefficients for each component.

intercept

the vector of intercept values for each component.

fitted.values

the matrix of dimension (n,ncp) that represents the predicted values of y for each component.

residuals

the matrix of dimension (n,ncp) that represents the residuals corresponding to the difference between the responses and the fitted values.

lambda1

the vector of length ncp collecting the sparsity parameters used to fit the model at each iteration.

zerovar

the vector of length ncp representing the number of variables shrunk to zero per component.

ind_diff0

the list of ncp elements giving, for each component, the indices of the non-zero regression coefficients.

type

a character string specifying the Dual-SPLS norm used. In this case it is ridge.
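As an illustration of how zerovar and ind_diff0 relate to Bhat as described above, the sketch below derives both from a small hand-made coefficient matrix. The matrix values are invented for the example, and the package may compute these quantities differently internally.

```r
# Toy coefficient matrix with p = 4 variables and ncp = 2 components.
# The values are made up; they do not come from a fitted model.
Bhat <- cbind(c(0, 1.5, 0, -2.0),
              c(0.3, 1.1, 0, -1.7))

# Number of coefficients equal to zero in each component
zerovar <- colSums(Bhat == 0)

# Indices of the non-zero coefficients, one list element per component
ind_diff0 <- lapply(seq_len(ncol(Bhat)), function(k) which(Bhat[, k] != 0))

zerovar    # 2 1
ind_diff0  # component 1: 2 4; component 2: 1 2 4
```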

Arguments

X

a numeric matrix of predictor values of dimension (n,p). Each row represents one observation and each column one predictor variable.

y

a numeric vector or a one-column matrix of responses. It represents the response variable for each observation.

ncp

a positive integer. ncp is the number of Dual-SPLS components.

ppnu

a positive real value, in \([0,1]\). ppnu is the desired proportion of variables to shrink to zero for each component (see Dual-SPLS methodology).

nu2

a positive real value. nu2 is a regularization parameter on \(X^TX\).

verbose

a Boolean value indicating whether or not to display the iteration steps. Default value is TRUE.

Author

Louna Alsouki, François Wahl

Details

The resulting solution for \(w\), and hence for the coefficients vector, in the case of d.spls.ridge has a simple closed-form expression (ref), deriving from the fact that \(w\) is collinear to a vector \(z_{\nu_1}\) with coordinates $$z_{\nu_1,j}=\textrm{sign}(z_{X,\nu_2,j})(|z_{X,\nu_2,j}|-\nu_1)_+.$$ Here \(\nu_1\) is the threshold that a proportion ppnu of the absolute values of the coordinates of \(z_{X,\nu_2}\) do not exceed, so that these coordinates are set to zero, and \(z_{X,\nu_2}=(\nu_2 X^TX + I_p)^{-1}X^Ty\). Because \(\nu_2 X^TX + I_p\) is invertible for any \(\nu_2 \ge 0\), the ridge norm is useful when \(X^TX\) is singular. If \(X^TX\) is invertible, one can choose to use the Dual-SPLS for the least squares norm instead.
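The closed form can be sketched in base R for a single component. This is an illustration only, not the package's implementation. It assumes \(\nu_1\) is taken as the order statistic of \(|z_{X,\nu_2}|\) that sets a proportion ppnu of the coordinates to zero, consistent with the description of ppnu. The simulated data, the centering step and the final unit-length normalisation are choices made for the example.

```r
# Single-component sketch of the soft-thresholding formula above.
set.seed(42)
n <- 30
p <- 8
X <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)
y <- as.numeric(X[, 1] - 2 * X[, 2] + rnorm(n, sd = 0.1))
y <- y - mean(y)

nu2  <- 10    # regularization parameter on X'X
ppnu <- 0.5   # proportion of coordinates to shrink to zero

# z_{X,nu2} = (nu2 * X'X + I_p)^{-1} X'y
z <- as.numeric(solve(nu2 * crossprod(X) + diag(p), crossprod(X, y)))

# Assumed threshold rule: the round(ppnu * p)-th smallest |z_j|
nu1 <- sort(abs(z))[round(ppnu * p)]

# Soft-thresholding: sign(z_j) * (|z_j| - nu1)_+
z_nu1 <- sign(z) * pmax(abs(z) - nu1, 0)

# w is collinear to z_nu1; unit Euclidean length is an illustrative choice
w <- z_nu1 / sqrt(sum(z_nu1^2))

sum(z_nu1 == 0)  # number of coordinates set to zero
```

With continuous data and no ties among the \(|z_{X,\nu_2,j}|\), exactly round(ppnu * p) coordinates are set to zero.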

See Also

d.spls.LS

Examples

### load dual.spls library
library(dual.spls)
### parameters
oldpar <- par(no.readonly = TRUE)
n <- 200
p <- 100
nondes <- 150
sigmaondes <- 0.01
data <- d.spls.simulate(n = n, p = p, nondes = nondes, sigmaondes = sigmaondes)

X <- data$X
y <- data$y


#fitting the model
mod.dspls <- d.spls.ridge(X=X,y=y,ncp=10,ppnu=0.9,nu2=100,verbose=TRUE)

str(mod.dspls)

### plotting the observed values VS predicted values
plot(y,mod.dspls$fitted.values[,6], xlab="Observed values", ylab="Predicted values",
main="Observed VS Predicted for 6 components")
abline(a = 0, b = 1)  # identity line y = x

### plotting the regression coefficients
par(mfrow=c(3,1))
i=6
nz=mod.dspls$zerovar[i]
plot(1:dim(X)[2],mod.dspls$Bhat[,i],type='l',
    main=paste(" Dual-SPLS (ridge), ncp =", i, " #0coef =", nz, "/", dim(X)[2]),
    ylab='',xlab='' )
inonz=which(mod.dspls$Bhat[,i]!=0)
points(inonz,mod.dspls$Bhat[inonz,i],col='red',pch=19,cex=0.5)
legend("topright", legend ="non null values", bty = "n", cex = 0.8, col = "red",pch=19)
par(oldpar)
