sprmda: Sparse and robust PLS for binary classification

Description

This method for dimension reduction and discriminant analysis yields a sparse classification model with partial least squares-like interpretability that is robust to both vertical outliers and leverage points.

Usage

sprmda(formula, data, a, eta, fun = "Hampel", probp1 = 0.95, hampelp2 = 0.975,
       hampelp3 = 0.999, probp4 = 0.01, yweights = TRUE,
       class = c("regfit", "lda"), prior = c(0.5, 0.5), center = "median", scale = "qn",
       print = FALSE, numit = 100, prec = 0.01)

Arguments

formula

a formula, e.g. group ~ X1 + X2, where group is a factor with two levels or a numeric vector coding class membership as 1 and -1, and X1, X2 are numeric variables.

data

a data frame or list which contains the variables given in formula. The response specified in the formula needs to be a numeric vector coding the class membership as 1 and -1, or a factor with two levels.

a

the number of SPRM components to be estimated in the model.

eta

a tuning parameter for the sparsity with 0 ≤ eta < 1.

fun

an internal weighting function for case weights. Choices are "Hampel" (preferred), "Huber" or "Fair".

probp1

the 1-alpha value at which to set the first outlier cutoff for the weighting function.

hampelp2

the 1-alpha value for the second cutoff. Only applies to fun="Hampel".

hampelp3

the 1-alpha value for the third cutoff. Only applies to fun="Hampel".

probp4

a quantile close to zero used as the cutoff for potentially wrong class labels (see References). Ignored if yweights=FALSE.

yweights

logical; if TRUE weights are calculated for observations with potentially wrong class labels.

class

type of classification; choices are "regfit" or "lda". If "regfit" an object of class prm is returned.

prior

vector of length 2 with prior probabilities of the groups; only used if class="lda".

center

type of centering of the data, given as a string that matches an R function, e.g. "mean" or "median".

scale

type of scaling of the data, given as a string that matches an R function, e.g. "sd" or "qn", or alternatively "no" for no scaling.

print

logical, default is FALSE. If TRUE the variables included in each component are reported.

numit

the maximum number of iterations for the convergence of the coefficient estimates.

prec

a value for the precision of estimation of the coefficients.
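The three cutoffs probp1, hampelp2 and hampelp3 described above can be illustrated with a Hampel-type redescending weight function. The following is a minimal sketch in base R, not the code used inside sprmda: the helper name hampel_weights is hypothetical, and the cutoffs are assumed to be standard normal quantiles, whereas the package may use a different reference distribution.

```r
# Hampel-type redescending weights for non-negative (scaled) distances d.
# Assumption: cutoffs are standard normal quantiles at the given
# probabilities; the reference distribution used by sprmda may differ.
hampel_weights <- function(d, probp1 = 0.95, hampelp2 = 0.975, hampelp3 = 0.999) {
  q1 <- qnorm(probp1)
  q2 <- qnorm(hampelp2)
  q3 <- qnorm(hampelp3)
  w <- rep(1, length(d))                 # d <= q1: full weight
  mid <- d > q1 & d <= q2
  w[mid] <- q1 / d[mid]                  # between first and second cutoff: weight q1/d
  far <- d > q2 & d <= q3
  w[far] <- q1 * (q3 - d[far]) / ((q3 - q2) * d[far])  # descends to zero at q3
  w[d > q3] <- 0                         # beyond third cutoff: weight zero
  w
}

d <- c(0.5, 1.8, 2.5, 4)                 # toy distances
w_hampel <- hampel_weights(d)
```

With these defaults an observation keeps its full weight up to the first cutoff, is downweighted progressively between the cutoffs, and is excluded entirely beyond the third.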

Value

sprmda returns an object of class sprmda.
Functions summary, predict and biplot are available. Also the generic functions coefficients, fitted.values and residuals can be used to extract the corresponding elements from the sprmda object.

scores

the matrix of scores.

R

direction vectors (or weighting vectors or rotation matrix) used to obtain the scores: scores = Xs %*% R.

loadings

the matrix of loadings.

w

the overall case weights used for robust dimension reduction and classification (depending on the weight function): w = sqrt(wy * wt).

wt

the group-wise case weights obtained in the score space.

wy

the case weights for potentially mislabeled observations.

used.vars

indices of variables included in the model.

Yvar

percentage contribution of each component to the explained variance of the response.

Xvar

percentage contribution of each component to the explained variance of the variables.

Results from LDA model:

ldamod

list with the robust pooled within-group covariance (cov) and the two robust group centers (m1, m2) in the score space.

ldafit

posterior probabilities from robust LDA in the score space.

ldaclass

predicted class labels from robust LDA in the score space.

Results from the regression model with binary response:

coefficients

vector of coefficients of the weighted regression model.

intercept

intercept of the weighted regression model.

residuals

vector of residuals, true response minus estimated response.

fitted.values

the vector of estimated response values.

coefficients.scaled

vector of coefficients of the weighted regression model with scaled data.

intercept.scaled

intercept of the weighted regression model with scaled data.

Data preprocessing:

YMeans

value used internally to center the response.

XMean

vector used internally to center the data.

Xscales

vector used internally to scale the data.

Yscales

value used internally to scale the response.

inputs

list of inputs: parameters, data and scaled data.
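The two relations stated for the returned elements, scores = Xs %*% R and w = sqrt(wy * wt), can be checked on toy inputs. This is only a sketch in base R: the matrices and weight vectors below are made-up values, not output of sprmda.

```r
# Toy stand-ins for the preprocessed data and the direction vectors
# (hypothetical values, not produced by sprmda).
set.seed(1)
Xs <- scale(matrix(rnorm(12), nrow = 4), center = TRUE, scale = FALSE)  # 4 obs x 3 vars
R  <- matrix(c(1, 0, 0,
               0, 1, 0), nrow = 3)     # 3 vars x 2 components, filled column-wise
scores <- Xs %*% R                     # 4 x 2 matrix of scores

# Overall case weights as the geometric mean of the two weight types
wt <- c(1, 1, 0.4, 1)                  # hypothetical score-space weights
wy <- c(1, 0.25, 1, 1)                 # hypothetical label weights
w  <- sqrt(wy * wt)
```

Because the overall weight is a geometric mean, an observation is downweighted if either its position in the score space or its class label is suspicious, and it receives weight zero as soon as one of the two weights is zero.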

Details

For class="lda" a robust LDA model is estimated in the SPRM score space. For class="regfit" the model is a robust sparse PLS regression model on the binary response.
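For the class="lda" case, the classification step can be sketched as a standard two-group LDA rule applied to the scores. The function below is an illustration in base R, not the package's implementation: the name lda_posterior is hypothetical, and the centers m1, m2 and the pooled covariance cov are passed in directly, whereas sprmda estimates them robustly.

```r
# Posterior probabilities from a two-group LDA rule in a score space.
# m1, m2: group centers; cov: pooled within-group covariance;
# prior: prior probabilities of the two groups.
lda_posterior <- function(scores, m1, m2, cov, prior = c(0.5, 0.5)) {
  cov_inv <- solve(cov)
  # Linear discriminant score of each row for a group with center m and prior p
  disc <- function(m, p) {
    drop(scores %*% cov_inv %*% m) - 0.5 * drop(t(m) %*% cov_inv %*% m) + log(p)
  }
  d1 <- disc(m1, prior[1])
  d2 <- disc(m2, prior[2])
  p1 <- 1 / (1 + exp(d2 - d1))         # posterior probability of group 1
  cbind(group1 = p1, group2 = 1 - p1)
}

# Toy inputs (hypothetical values)
scores <- rbind(c(-1, 0), c(1, 0), c(0, 0))
post <- lda_posterior(scores, m1 = c(-1, 0), m2 = c(1, 0), cov = diag(2))
```

In this toy setup the first two rows sit on the group centers and are assigned to their own group, while the third row lies midway between the centers and gets equal posterior probability for both groups under equal priors.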

References

Hoffmann, I., Filzmoser, P., Serneels, S., Varmuza, K., Sparse and robust PLS for binary classification.

Examples

library(sprm)
data(iris)
# keep the two non-setosa species so that the response has exactly two levels
data <- droplevels(subset(iris, Species != "setosa"))
smod <- sprmda(Species ~ ., data, a = 2, eta = 0.7, class = "lda")
