sparseLTS: Sparse least trimmed squares regression

Description

Compute least trimmed squares regression with an $L_{1}$ penalty on the regression coefficients, which allows for sparse model estimates.

Usage

sparseLTS(x, ...)

  ## S3 method for class 'formula':
sparseLTS(formula, data, ...)

  ## S3 method for class 'default':
sparseLTS(x, y, lambda,
    mode = c("lambda", "fraction"), alpha = 0.75,
    intercept = TRUE, nsamp = c(500, 10),
    initial = c("sparse", "hyperplane", "random"),
    ncstep = 2, use.correction = TRUE,
    tol = .Machine$double.eps^0.5,
    eps = .Machine$double.eps, use.Gram,
    crit = c("BIC", "PE"), splits = foldControl(),
    cost = rtmspe, costArgs = list(),
    selectBest = c("hastie", "min"), seFactor = 1,
    ncores = 1, cl = NULL, seed = NULL, model = TRUE, ...)

Arguments

formula

a formula describing the model.

data

an optional data frame, list or environment (or object coercible to a data frame by as.data.frame) containing the variables in the model. If not found in data, the variables are taken fro

a numeric matrix containing the predictor variables.

a numeric vector containing the response variable.

lambda

a numeric vector of non-negative values to be used as penalty parameter.

mode

a character string specifying the type of penalty parameter. If "lambda", lambda gives the grid of values for the penalty parameter directly. If "fraction", the smallest value of the penalty parameter t

alpha

a numeric value giving the percentage of the residuals for which the $L_{1}$ penalized sum of squares should be minimized (the default is 0.75).

intercept

a logical indicating whether a constant term should be included in the model (the default is TRUE).

nsamp

a numeric vector giving the number of subsamples to be used in the two phases of the algorithm. The first element gives the number of initial subsamples to be used. The second element gives the number of subsamples to keep after the first pha

initial

a character string specifying the type of initial subsamples to be used. If "sparse", the lasso fit given by three randomly selected data points is first computed. The corresponding initial subsample is then formed by the fracti

ncstep

a positive integer giving the number of C-steps to perform on all subsamples in the first phase of the algorithm (the default is to perform two C-steps).

use.correction

currently ignored. Small sample correction factors may be added in the future.

tol

a small positive numeric value giving the tolerance for convergence.

eps

a small positive numeric value used to determine whether the variability within a variable is too small (an effective zero).

use.Gram

a logical indicating whether the Gram matrix of the explanatory variables should be precomputed in the lasso fits on the subsamples. If the number of variables is large, computation may be faster when this is set to FALSE. The d

crit

a character string specifying the optimality criterion to be used for selecting the final model. Possible values are "BIC" for the Bayes information criterion and "PE" for resampling-based prediction error estimation.

splits

an object giving data splits to be used for prediction error estimation (see perryTuning).

cost

a cost function measuring prediction loss (see perryTuning for some requirements). The default is to use the root trimmed mean squared prediction error (see

costArgs

a list of additional arguments to be passed to the prediction loss function cost.

selectBest,seFactor

arguments specifying a criterion for selecting the best model (see perryTuning). The default is to use a one-standard-error rule.

ncores

a positive integer giving the number of processor cores to be used for parallel computing (the default is 1 for no parallelization). If this is set to NA, all available processor cores are used. For prediction error estimation,

a parallel cluster for parallel computing as generated by makeCluster. This is preferred over ncores for prediction error estimation, in which case ncores<

seed

optional initial seed for the random number generator (see .Random.seed). On parallel Rworker processes for prediction error estimation, random number streams are used and the seed is set via

model

a logical indicating whether the data x and y should be added to the return object. If intercept is TRUE, a column of ones is added to x to account for the intercept.

...

additional arguments to be passed down.

Value

If crit is "PE", an object of class "perrySparseLTS" (inheriting from class "perryTuning", see perryTuning). It contains information on the prediction error criterion, and includes the final model with the optimal tuning paramter as component finalModel. Otherwise an object of class "sparseLTS" with the following components:
lambdaa numeric vector giving the values of the penalty parameter.
bestan integer vector or matrix containing the respective best subsets of $h$ observations found and used for computing the raw estimates.
objectivea numeric vector giving the respective values of the sparse LTS objective function, i.e., the $L_{1}$ penalized sums of the $h$ smallest squared residuals from the raw fits.
coefficientsa numeric vector or matrix containing the respective coefficient estimates from the reweighted fits.
fitted.valuesa numeric vector or matrix containing the respective fitted values of the response from the reweighted fits.
residualsa numeric vector or matrix containing the respective residuals from the reweighted fits.
centera numeric vector giving the robust center estimates of the corresponding reweighted residuals.
scalea numeric vector giving the robust scale estimates of the corresponding reweighted residuals.
cnp2a numeric vector giving the respective consistency factors applied to the scale estimates of the reweighted residuals.
wtan integer vector or matrix containing binary weights that indicate outliers from the respective reweighted fits, i.e., the weights are $1$ for observations with reasonably small reweighted residuals and $0$ for observations with large reweighted residuals.
dfan integer vector giving the respective degrees of freedom of the obtained reweighted model fits, i.e., the number of nonzero coefficient estimates.
intercepta logical indicating whether the model includes a constant term.
alphaa numeric value giving the percentage of the residuals for which the $L_{1}$ penalized sum of squares was minimized.
quanthe number $h$ of observations used to compute the raw estimates.
raw.coefficientsa numeric vector or matrix containing the respective coefficient estimates from the raw fits.
raw.fitted.valuesa numeric vector or matrix containing the respective fitted values of the response from the raw fits.
raw.residualsa numeric vector or matrix containing the respective residuals from the raw fits.
raw.centera numeric vector giving the robust center estimates of the corresponding raw residuals.
raw.scalea numeric vector giving the robust scale estimates of the corresponding raw residuals.
raw.cnp2a numeric value giving the consistency factor applied to the scale estimate of the raw residuals.
raw.wtan integer vector or matrix containing binary weights that indicate outliers from the respective raw fits, i.e., the weights used for the reweighted fits.
critan object of class "bicSelect" containing the BIC values and indicating the final model (only returned if argument crit is "BIC" and argument lambda contains more than one value for the penalty parameter).
xthe predictor matrix (if model is TRUE).
ythe response variable (if model is TRUE).
callthe matched function call. Package robustHD has a built-in back end for sparse least trimmed squares using the C++ library Armadillo. Another back end is available through package sparseLTSEigen, which uses the C++ library Eigen. The latter is faster, currently does not work on 32-bit Rfor Windows. For both C++ back ends, parallel computing is implemented via OpenMP (http://openmp.org/).

Examples

Run this code

## generate data
# example is not high-dimensional to keep computation time low
library("mvtnorm")
set.seed(1234)  # for reproducibility
n <- 100  # number of observations
p <- 25   # number of variables
beta <- rep.int(c(1, 0), c(5, p-5))  # coefficients
sigma <- 0.5      # controls signal-to-noise ratio
epsilon <- 0.1    # contamination level
Sigma <- 0.5^t(sapply(1:p, function(i, j) abs(i-j), 1:p))
x <- rmvnorm(n, sigma=Sigma)    # predictor matrix
e <- rnorm(n)                   # error terms
i <- 1:ceiling(epsilon*n)       # observations to be contaminated
e[i] <- e[i] + 5                # vertical outliers
y <- c(x %*% beta + sigma * e)  # response
x[i,] <- x[i,] + 5              # bad leverage points

## fit sparse LTS model for one value of lambda
sparseLTS(x, y, lambda = 0.05, mode = "fraction")

## fit sparse LTS models over a grid of values for lambda
frac <- seq(0.2, 0.05, by = -0.05)
sparseLTS(x, y, lambda = frac, mode = "fraction")

Run the code above in your browser using DataLab

Description

Usage

Arguments

Value

See Also

Examples