SosDiscRobust: Robust and sparse multigroup classification by the optimal scoring approach

Description

Robust and sparse multigroup classification by the optimal scoring approach. The method is robust against outliers, provides a low-dimensional and sparse representation of the predictors, and is also applicable if the number of variables exceeds the number of observations.

Usage

SosDiscRobust(x, ...)
# S3 method for default
SosDiscRobust(x, grouping, prior=proportions, 
    lambda, Q=length(unique(grouping))-1, alpha=0.5, maxit=100, 
    tol = 1.0e-4, trace=FALSE, …)
# S3 method for formula
SosDiscRobust(formula, data = NULL, …, subset, na.action)

Arguments

formula

A formula of the form y~x that describes the response and the predictors. The formula can be more complicated, such as y~log(x)+z (see formula for more details). The response should be a factor, or any vector that can be coerced to one (such as a logical variable).

data

An optional data frame (or similar: see model.frame) containing the variables in the formula formula.

subset

An optional vector used to select rows (observations) of the data matrix x.

na.action

A function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The default is na.omit.

x

A matrix or data frame containing the explanatory variables (training set); colnames of x have to be provided.

grouping

Grouping variable: a factor specifying the class for each observation.

prior

Prior probabilities, a vector of positive numbers that sum up to 1; default to the class proportions for the training set.

lambda

A non-negative tuning parameter for the L1-norm penalty, which induces sparsity in the optimal scoring coefficients $\boldsymbol{\beta}_h$ (see Details). If the number of variables exceeds the number of observations, lambda has to be positive.

Q

Number of optimal scoring coefficient vectors; Q has to be smaller than the number of groups. Defaults to number of groups - 1.

alpha

Robustness parameter used in sparseLTS (for initial estimation, see Details). Default alpha=0.5.

maxit

Number of iterations for the estimation of optimal scoring coefficients and case weights. Default maxit=100.

tol

Tolerance for convergence of the normed weighted change in the residual sum of squares for the estimation of optimal scoring coefficients. Default is tol=1.0e-4.

trace

Whether to print intermediate results. Default is trace = FALSE.

…

Arguments passed to or from other methods.

Value

An S4 object of class SosDiscRobust-class, which is a subclass of the virtual class SosDisc-class.

Details

The sparse optimal scoring problem (Clemmensen et al, 2011): for $h=1,\ldots,Q$ $$ \min_{\beta_h,\theta_h} \frac{1}{n} \|Y \theta_h - X \beta_h \|_2^2 + \lambda \|\beta_h\|_1 $$ subject to $$ \frac{1}{n} \theta_h^T Y^T Y\theta_h=1, \quad \theta_h^T Y^T Y \theta_l=0 \quad \forall l<h, $$

where $X$ denotes the robustly centered and scaled input matrix x (or alternatively the predictors from formula) and $Y$ is a dummy matrix coding the class memberships from grouping.
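As an illustration of the notation above, the following base R sketch builds a dummy matrix $Y$ from a grouping factor and rescales a score vector so that the normalization constraint holds. The toy data and the names `YtY`, `theta` and `constraint` are assumptions made for this example only; they are not objects created by SosDiscRobust.

```r
# Toy grouping vector with three classes (made-up data for illustration).
grouping <- factor(c("a", "a", "b", "b", "b", "c"))
n <- length(grouping)

# Dummy (indicator) matrix Y: one column per class,
# Y[i, k] = 1 if observation i belongs to class k and 0 otherwise.
Y <- model.matrix(~ grouping - 1)
colnames(Y) <- levels(grouping)

# Y^T Y is diagonal and holds the class counts.
YtY <- crossprod(Y)

# Rescale an arbitrary score vector so that (1/n) theta^T Y^T Y theta = 1.
theta <- c(1, -1, 0.5)
theta <- theta / sqrt(drop(t(theta) %*% YtY %*% theta) / n)

# Value of the left-hand side of the normalization constraint.
constraint <- drop(t(theta) %*% YtY %*% theta) / n
```

Because $Y^T Y$ is diagonal, the constraints amount to a weighted normalization and orthogonality of the score vectors, with weights given by the class sizes.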

For each $h$ this problem can be solved iteratively for $\beta_h$ and $\theta_h$. In order to obtain robust estimates, $\beta_h$ is estimated with reweighted sparse least trimmed squares regression (Alfons et al, 2013) and $\theta_h$ with least absolute deviation regression in the first two iterations. To speed up the subsequent iterations, an iterative down-weighting of observations with large residuals is combined with the iterative estimation of the optimal scoring coefficients with their classical estimates.
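The alternating scheme can be sketched in base R for the first score vector ($h=1$). This is a minimal, classical (non-robust) sketch under stated assumptions: it uses ordinary centering and scaling, a plain coordinate-descent lasso instead of sparse LTS, class means instead of least absolute deviation regression, and no case weights. The simulated data and the helper names `soft`, `lasso_cd` and `normalize` are hypothetical and not part of the package.

```r
set.seed(1)
# Simulated data: 3 classes, 10 predictors, only the first two informative.
grouping <- factor(rep(c("a", "b", "c"), each = 20))
n <- length(grouping)
p <- 10
X <- matrix(rnorm(n * p), n, p)
X[grouping == "b", 1] <- X[grouping == "b", 1] + 3
X[grouping == "c", 2] <- X[grouping == "c", 2] + 3
X <- scale(X)                      # classical, not robust, centering and scaling

Y <- model.matrix(~ grouping - 1)  # dummy matrix of class memberships
D <- crossprod(Y) / n              # (1/n) Y^T Y, diagonal matrix of class proportions

# Soft-thresholding operator used by the lasso update.
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Coordinate descent for (1/n) ||y - X beta||^2 + lambda ||beta||_1.
lasso_cd <- function(X, y, lambda, beta, sweeps = 50) {
  n <- nrow(X)
  xx <- colSums(X^2) / n
  r <- drop(y - X %*% beta)
  for (s in seq_len(sweeps)) {
    for (j in seq_len(ncol(X))) {
      r <- r + X[, j] * beta[j]    # partial residual without variable j
      beta[j] <- soft(sum(X[, j] * r) / n, lambda / 2) / xx[j]
      r <- r - X[, j] * beta[j]
    }
  }
  beta
}

# Rescale theta so that theta^T D theta = 1.
normalize <- function(theta, D) theta / sqrt(drop(t(theta) %*% D %*% theta))

# Start with scores that are D-orthogonal to the trivial constant vector.
ones <- rep(1, ncol(Y))
theta <- c(1, 2, 3)
theta <- normalize(theta - ones * drop(t(ones) %*% D %*% theta), D)
beta <- rep(0, p)
lambda <- 0.2

for (it in 1:20) {
  # Step 1: for fixed theta, beta solves a lasso problem with response Y theta.
  beta <- lasso_cd(X, drop(Y %*% theta), lambda, beta)
  if (all(beta == 0)) break
  # Step 2: for fixed beta, theta is the normalized vector of class means of X beta.
  theta_new <- drop(solve(D, crossprod(Y, X %*% beta) / n))
  theta_new <- theta_new - ones * drop(t(ones) %*% D %*% theta_new)
  theta <- normalize(theta_new, D)
}
```

For $h>1$ the same steps would be repeated with the additional requirement that $\theta_h$ is orthogonal to the previously found score vectors.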

The classification model is estimated on the low-dimensional sparse subspace $X[\beta_1,...,\beta_Q]$ with robust LDA (Linda).
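The projection step can be illustrated as follows. This sketch assumes a hypothetical sparse coefficient matrix `B` with made-up values, and it uses the classical `lda` from the MASS package as a non-robust stand-in for Linda; it only shows how a classifier is fitted on the scores $X[\beta_1,...,\beta_Q]$ and does not reproduce the fit returned by SosDiscRobust.

```r
library(MASS)  # lda() is classical LDA, used here only as a stand-in for robust LDA

# Classically centered and scaled predictors of the iris data.
X <- scale(as.matrix(iris[, 1:4]))
grouping <- iris$Species

# Hypothetical sparse coefficient matrix [beta_1, beta_2]; the values are made up.
B <- cbind(c(0, 0, 1.0, 0.5),
           c(0.8, -0.6, 0, 0))

# Project the predictors onto the two-dimensional sparse subspace.
Z <- X %*% B
colnames(Z) <- c("score1", "score2")

# Fit the classifier on the scores and classify the training data.
fit <- lda(Z, grouping = grouping)
pred <- predict(fit, Z)$class
conf_table <- table(truth = grouping, predicted = pred)
```

Variables with a zero row in `B` do not enter the scores, which is where the sparsity of the coefficient vectors carries over to the classification rule.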

References

Clemmensen L, Hastie T, Witten D & Ersboll B (2011), Sparse discriminant analysis. Technometrics, 53(4), 406--413.

Alfons A, Croux C & Gelper S (2013), Sparse least trimmed squares regression for analysing high-dimensional large data sets. The Annals of Applied Statistics, 7(1), 226--248.

Hoffmann I, Filzmoser P & Croux C (2016), Robust and sparse multigroup classification by the optimal scoring approach. Submitted for publication.

Examples

# NOT RUN {
## EXAMPLE 1 ######################################
data(olitos)
grind <- which(colnames(olitos)=="grp")

set.seed(5008642)
mod <- SosDiscRobust(grp~., data=olitos, lambda=0.3, maxit=30, Q=3, tol=1e-2)

pred <- predict(mod, newdata=olitos[,-grind])

summary(mod)
plot(mod, ind=c(1:3))


## EXAMPLE 2 ######################################
##

# }
# NOT RUN {
library(sparseLDA)
data(penicilliumYES)

## for demonstration only: take a random subsample of 100 variables
## to reduce computation time
set.seed(5008642)
X <- penicilliumYES$X[, sample(1:ncol(penicilliumYES$X), 100)]

colnames(X) <- paste0("V",1:ncol(X))
y <- as.factor(c(rep(1,12), rep(2,12), rep(3,12)))

set.seed(5008642)
mod <- SosDiscRobust(X, y, lambda=1, maxit=5, Q=2, tol=1e-2)

summary(mod)
plot(mod)
# }