prmda: Robust PLS for binary classification

Description

Robust PLS and discriminant analysis for binary classification problems. This method for dimension reduction and discriminant analysis yields a classification model with PLS-like interpretability that is robust to both vertical outliers and leverage points.

Usage

prmda(formula, data, a, fun = "Hampel", probp1 = 0.95, hampelp2 = 0.975, 
hampelp3 = 0.999, probp4 = 0.01, yweights = TRUE, 
class = c("regfit", "lda"), prior = c(0.5, 0.5), 
center = "median", scale = "qn", 
numit = 100, prec = 0.01)

Arguments

formula

a formula, e.g. group ~ X1 + X2, where group is either a factor with two levels or a numeric vector coding class membership as 1 and -1, and X1, X2 are numeric variables.

data

a data frame or list which contains the variables given in formula. The response specified in the formula needs to be either a numeric vector coding class membership as 1 and -1 or a factor with two levels.

a

the number of PRM components to be estimated in the model.

fun

an internal weighting function for case weights. Choices are "Hampel" (preferred), "Huber" or "Fair".

probp1

a quantile close to 1 at which to set the first outlier cutoff for the weighting function.

hampelp2

a quantile close to 1, with probp1 < hampelp2 < hampelp3, at which to set the second outlier cutoff; only used if fun="Hampel".

hampelp3

a quantile close to 1, with probp1 < hampelp2 < hampelp3, at which to set the third outlier cutoff; only used if fun="Hampel".

probp4

a quantile close to zero for the cutoff for potentially wrong class labels (see References). Ignored if yweights=FALSE.

yweights

logical; if TRUE weights are calculated for observations with potentially wrong class labels.

class

type of classification; choices are "regfit" or "lda" (see Details). If "regfit", an object of class prm is returned.

prior

vector of length 2 with prior probabilities of the groups; only used if class="lda".

center

type of centering of the data in the form of a string that matches an R function, e.g. "mean" or "median".

scale

type of scaling of the data in the form of a string that matches an R function, e.g. "sd" or "qn", or alternatively "no" for no scaling.

numit

the maximum number of iterations for the convergence of the case weights.

prec

the precision (convergence tolerance) for the iterative estimation of the case weights.
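The arguments probp1, hampelp2 and hampelp3 set the three breakpoints of the Hampel weight function. A minimal base R sketch of such a redescending weight function follows. It assumes the breakpoints are standard normal quantiles of the given probabilities and that d is a robustly standardized distance; both are assumptions for illustration, not the documented internals of prmda, and the helper hampel_weights is hypothetical.

```r
# Hypothetical Hampel-type redescending weight function (not sprm code).
# Assumption: cutoffs are standard normal quantiles of the three probabilities.
hampel_weights <- function(d, probp1 = 0.95, hampelp2 = 0.975, hampelp3 = 0.999) {
  c1 <- qnorm(probp1)    # first cutoff: full weight up to here
  c2 <- qnorm(hampelp2)  # second cutoff: weight c1/|d| between c1 and c2
  c3 <- qnorm(hampelp3)  # third cutoff: weight falls linearly (in |d|) to 0
  d <- abs(d)
  w <- numeric(length(d))  # start at 0, which is kept for |d| > c3
  w[d <= c1] <- 1
  i2 <- d > c1 & d <= c2
  w[i2] <- c1 / d[i2]
  i3 <- d > c2 & d <= c3
  w[i3] <- c1 * (c3 - d[i3]) / ((c3 - c2) * d[i3])
  w
}

hampel_weights(c(0.5, 1.8, 2.5, 4))
```

The weights are 1 below the first cutoff, non-increasing in |d|, continuous at each breakpoint, and exactly 0 beyond the third cutoff, which is why probp1 < hampelp2 < hampelp3 is required. The "Huber" and "Fair" choices are not covered by this sketch.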

Value

prmda returns an object of class prmda.
The functions summary, predict and biplot are available. The generic functions coefficients, fitted.values and residuals can also be used to extract the corresponding elements from the prmda object.
scores: the matrix of scores.
R: direction vectors (also called weighting vectors or rotation matrix) used to obtain the scores, scores = Xs %*% R, where Xs denotes the centered and scaled data.
loadings: the matrix of loadings.
w: the overall case weights used for robust dimension reduction and classification (depending on the weight function), w = sqrt(wy * wt).
wt: the case weights obtained groupwise in the score space.
wy: the case weights for potentially mislabeled observations.
Results from the LDA model:
ldamod: list with the robust pooled within-group covariance (cov) and the two robust group centers (m1, m2) in the score space.
ldafit: posterior probabilities from robust LDA in the score space.
ldaclass: predicted class labels from robust LDA in the score space.
Results from the regression model with binary response:
coefficients: vector of coefficients of the weighted regression model.
intercept: intercept of the weighted regression model.
residuals: vector of residuals (true response minus estimated response).
fitted.values: the vector of estimated response values.
coefficients.scaled: vector of coefficients of the weighted regression model with scaled data.
intercept.scaled: intercept of the weighted regression model with scaled data.
Data preprocessing:
YMeans: value used internally to center the response.
XMean: vector used internally to center the data.
Xscales: vector used internally to scale the data.
Yscales: value used internally to scale the response.
inputs: list of inputs (parameters, data and scaled data).

Details

For class="lda", a robust LDA model is estimated in the PRM score space; for class="regfit", the model is a robust PLS regression model on the binary response.
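With class="lda", classification happens in the score space using a pooled covariance, two group centers and the prior. A minimal sketch of the standard LDA posterior computation follows, assuming a list shaped like the returned ldamod (elements cov, m1, m2); the helper lda_posterior and all numbers are hypothetical, and this is not necessarily how prmda computes ldafit and ldaclass.

```r
# Hypothetical LDA posterior in the score space (not sprm code).
lda_posterior <- function(scores, ldamod, prior = c(0.5, 0.5)) {
  # squared Mahalanobis distance of each score vector to each group center
  d1 <- mahalanobis(scores, ldamod$m1, ldamod$cov)
  d2 <- mahalanobis(scores, ldamod$m2, ldamod$cov)
  # with a shared covariance the normal-density constants cancel
  f1 <- prior[1] * exp(-d1 / 2)
  f2 <- prior[2] * exp(-d2 / 2)
  post <- cbind(f1, f2) / (f1 + f2)  # each row sums to 1
  # group index 1 or 2; ties go to group 1
  list(posterior = post, class = ifelse(post[, 1] >= 0.5, 1, 2))
}

ldamod <- list(cov = diag(2), m1 = c(-1, 0), m2 = c(1, 0))
S_new <- rbind(c(-1.2, 0.3), c(0.9, -0.1), c(0, 0))
res <- lda_posterior(S_new, ldamod)
res_skewed <- lda_posterior(S_new, ldamod, prior = c(0.3, 0.7))
```

The third point lies midway between the two centers, so with equal priors its posterior is 0.5 for each group; with prior c(0.3, 0.7) it is assigned to the second group, which shows how the prior argument shifts the decision boundary.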

References

Hoffmann, I., Filzmoser, P., Serneels, S., and Varmuza, K. Sparse and robust PLS for binary classification.

Examples

library(sprm)
data(iris)
# keep the two non-setosa species so that Species is a two-level factor
data <- droplevels(subset(iris, Species != "setosa"))
mod <- prmda(Species ~ ., data, a = 2, class = "lda")
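According to the formula and data arguments, the response may also be a numeric vector coding class membership as 1 and -1. The base R sketch below recodes the example's two-level Species factor that way; which level becomes 1 is an arbitrary choice made here, and the new column name y is hypothetical.

```r
# Recode the two-level factor as a numeric 1/-1 response.
data(iris)
data <- droplevels(subset(iris, Species != "setosa"))
# first factor level ("versicolor") -> 1, second ("virginica") -> -1
data$y <- ifelse(data$Species == levels(data$Species)[1], 1, -1)
data$Species <- NULL  # keep only y and the four numeric predictors
table(data$y)
# The numeric response can then be used in the formula, e.g.
# mod2 <- prmda(y ~ ., data, a = 2, class = "lda")
```

Dropping the factor column matters when using y ~ ., since otherwise Species would be picked up as a predictor.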