lqs
Fit a regression to the good points in the dataset, thereby achieving a regression estimator with a high breakdown point. lmsreg and ltsreg are compatibility wrappers.
Usage
lqs(x, …)

# S3 method for formula
lqs(formula, data, …,
    method = c("lts", "lqs", "lms", "S", "model.frame"),
    subset, na.action, model = TRUE,
    x.ret = FALSE, y.ret = FALSE, contrasts = NULL)

# S3 method for default
lqs(x, y, intercept = TRUE, method = c("lts", "lqs", "lms", "S"),
    quantile, control = lqs.control(…), k0 = 1.548, seed, …)

lmsreg(…)
ltsreg(…)
Arguments
 formula
 a formula of the form y ~ x1 + x2 + ….

 data
 data frame from which variables specified in formula are preferentially to be taken.

 subset
 an index vector specifying the cases to be used in fitting. (NOTE: If given, this argument must be named exactly.)

 na.action
 function to specify the action to be taken if NAs are found. The default action is for the procedure to fail. Alternatives include na.omit and na.exclude, which lead to omission of cases with missing values on any required variable. (NOTE: If given, this argument must be named exactly.)

 model, x.ret, y.ret
 logical. If TRUE the model frame, the model matrix and the response are returned, respectively.

 contrasts
 an optional list. See the contrasts.arg of model.matrix.default.

 x
 a matrix or data frame containing the explanatory variables.

 y
 the response: a vector of length the number of rows of x.

 intercept
 should the model include an intercept?

 method
 the method to be used. model.frame returns the model frame: for the others see the Details section. Using lmsreg or ltsreg forces "lms" and "lts" respectively.

 quantile
 the quantile to be used: see Details. This is overridden if method = "lms".

 control
 additional control items: see Details.

 k0
 the cutoff / tuning constant used for \(\chi()\) and \(\psi()\) functions when method = "S", currently corresponding to Tukey's ‘biweight’.

 seed
 the seed to be used for random sampling: see .Random.seed. The current value of .Random.seed will be preserved if it is set.

 …
 arguments to be passed to lqs.default or lqs.control, see control above and Details.
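
The two calling conventions described by these arguments can be sketched as follows. This is a minimal illustration that assumes the MASS package is installed; the simulated data and the object names (x, y, fit_lts, fit_lms) are made up for the example. Reproducibility is handled with set.seed(), because the seed argument expects a value in the format of .Random.seed rather than a single integer.

```r
library(MASS)

set.seed(1)                                   # simulated data, purely illustrative
x <- cbind(x1 = rnorm(30), x2 = rnorm(30))    # explanatory variables as a matrix
y <- drop(1 + x %*% c(2, -1)) + rnorm(30, sd = 0.1)
y[1:5] <- y[1:5] + 20                         # contaminate a few responses

fit_lts <- lqs(x, y, method = "lts")          # default method, matrix interface
fit_lms <- lmsreg(x, y)                       # wrapper forcing method = "lms"

coef(fit_lts)                                 # intercept is added automatically
coef(fit_lms)
```

With intercept = TRUE (the default) the coefficient vector has one more entry than x has columns.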
Details
Suppose there are n data points and p regressors, including any intercept.

The first three methods minimize some function of the sorted squared residuals. For methods "lqs" and "lms" it is the quantile-th smallest squared residual, and for "lts" it is the sum of the quantile smallest squared residuals. "lqs" and "lms" differ in the defaults for quantile, which are floor((n+p+1)/2) and floor((n+1)/2) respectively. For "lts" the default is floor(n/2) + floor((p+1)/2).

The "S" estimation method solves for the scale s such that the average of a function chi of the residuals divided by s is equal to a given constant.

The control argument is a list with components

 psamp
 the size of each sample. Defaults to p.

 nsamp
 the number of samples or "best" (the default) or "exact" or "sample". If "sample" the number chosen is min(5*p, 3000), taken from Rousseeuw and Hubert (1997). If "best" exhaustive enumeration is done up to 5000 samples; if "exact" exhaustive enumeration will be attempted however many samples are needed.

 adjust
 should the intercept be optimized for each sample? Defaults to TRUE.
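
The default quantiles and the criteria for the first three methods can be written out in a few lines of base R. The helpers default_quantile() and criterion() below are hypothetical names introduced only for illustration; they restate the formulas in this section and are not functions from MASS.

```r
# Hypothetical helpers mirroring the formulas in Details (not part of MASS).
default_quantile <- function(n, p, method = c("lts", "lqs", "lms")) {
  switch(match.arg(method),
         lqs = floor((n + p + 1) / 2),
         lms = floor((n + 1) / 2),
         lts = floor(n / 2) + floor((p + 1) / 2))
}

criterion <- function(res, q, method = c("lts", "lqs", "lms")) {
  r2 <- sort(res^2)                        # sorted squared residuals
  if (match.arg(method) == "lts") {
    sum(r2[seq_len(q)])                    # sum of the q smallest
  } else {
    r2[q]                                  # the q-th smallest
  }
}

# n = 21 observations, p = 4 regressors including the intercept
default_quantile(21, 4, "lqs")   # 13
default_quantile(21, 4, "lms")   # 11
default_quantile(21, 4, "lts")   # 12

criterion(c(3, -1, 2, -4), q = 2, method = "lqs")   # 4
criterion(c(3, -1, 2, -4), q = 2, method = "lts")   # 1 + 4 = 5
```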
Value
An object of class "lqs"
. This is a list with components
method == "S"
before IWLS refinement.method ==
"S"
) is based on the variance of those residuals whose absolute
value is less than 2.5 times the initial estimate.Note
There seems no reason other than historical to use the lms and lqs options. LMS estimation is of low efficiency (converging at rate \(n^{-1/3}\)) whereas LTS has the same asymptotic efficiency as an M estimator with trimming at the quartiles (Marazzi, 1993, p.201). LQS and LTS have the same maximal breakdown value of (floor((n-p)/2) + 1)/n attained if floor((n+p)/2) <= quantile <= floor((n+p+1)/2).
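
As a worked instance of the breakdown statement, reading the expression as (floor((n - p)/2) + 1)/n, the following sketch evaluates it and the matching quantile range for n = 21 and p = 4 (the dimensions of the stackloss fit with an intercept). The function names max_breakdown() and quantile_range() are hypothetical and exist only for this example.

```r
# Hypothetical helpers evaluating the formulas quoted in the Note.
max_breakdown <- function(n, p) (floor((n - p) / 2) + 1) / n

quantile_range <- function(n, p) {
  c(lower = floor((n + p) / 2), upper = floor((n + p + 1) / 2))
}

n <- 21; p <- 4               # 21 rows, 3 regressors plus an intercept
max_breakdown(n, p)           # 9/21, about 0.43
quantile_range(n, p)          # lower 12, upper 13
```

For these dimensions the default quantiles for "lts" (12) and "lqs" (13) both fall inside the range.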
The only drawback mentioned of LTS is greater computation, as a sort was thought to be required (Marazzi, 1993, p.201) but this is not true as a partial sort can be used (and is used in this implementation).

Adjusting the intercept for each trial fit does need the residuals to be sorted, and may be significant extra computation if n is large and p small.

Opinions differ over the choice of psamp. Rousseeuw and Hubert (1997) only consider p; Marazzi (1993) recommends p+1 and suggests that more samples are better than adjustment for a given computational limit.

The computations are exact for a model with just an intercept and adjustment, and for LQS for a model with an intercept plus one regressor and exhaustive search with adjustment. For all other cases the minimization is only known to be approximate.
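
To make the approximate nature of the search concrete, here is a simplified random-subsample search for the LTS criterion in base R. It is a sketch under stated assumptions, not the compiled routine used by lqs: lts_search() is a hypothetical helper, it draws subsamples at random rather than enumerating them, and it omits the intercept adjustment step. It does use a partial sort, as mentioned in the Note.

```r
# Simplified random-subsample search for LTS (illustrative only).
lts_search <- function(X, y, q, nsamp = 200, psamp = ncol(X)) {
  best <- list(crit = Inf, coefficients = NULL)
  for (i in seq_len(nsamp)) {
    idx <- sample(nrow(X), psamp)                    # draw one subsample
    beta <- tryCatch(qr.solve(X[idx, , drop = FALSE], y[idx]),
                     error = function(e) NULL)       # skip singular samples
    if (is.null(beta)) next
    r2 <- (y - drop(X %*% beta))^2                   # squared residuals, all points
    crit <- sum(sort(r2, partial = q)[seq_len(q)])   # partial sort is enough
    if (crit < best$crit) best <- list(crit = crit, coefficients = beta)
  }
  best
}

set.seed(1)
x <- 1:20
y <- 1 + 2 * x                              # points exactly on a line
y[16:20] <- y[16:20] + 50                   # five gross outliers
X <- cbind(1, x)                            # intercept column plus one regressor
q <- floor(20 / 2) + floor((2 + 1) / 2)     # default "lts" quantile, here 11

res <- lts_search(X, y, q)
res$coefficients                            # close to intercept 1, slope 2
```

Because only a finite number of subsamples is tried, the result is the best fit found, not a guaranteed global minimum.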
References
P. J. Rousseeuw and A. M. Leroy (1987) Robust Regression and Outlier Detection. Wiley.
A. Marazzi (1993) Algorithms, Routines and S Functions for Robust Statistics. Wadsworth and Brooks/Cole.
P. Rousseeuw and M. Hubert (1997) Recent developments in PROGRESS. In L1-Statistical Procedures and Related Topics, ed Y. Dodge, IMS Lecture Notes volume 31, pp. 201-214.
See Also
Examples
library(MASS)
set.seed(123) # make reproducible
lqs(stack.loss ~ ., data = stackloss)
lqs(stack.loss ~ ., data = stackloss, method = "S", nsamp = "exact")
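
A fitted object can also be inspected after the call. The sketch below assumes MASS is installed and uses only generic accessors plus the scale component; the 2.5 multiplier simply echoes the rule used for the second scale estimate and is an illustrative cutoff, not a recommendation from this page.

```r
library(MASS)

set.seed(123)                                   # sampling is random for this n and p
fit <- lqs(stack.loss ~ ., data = stackloss)    # LTS by default

coef(fit)                                       # robust coefficient estimates
fit$scale                                       # scale estimate(s) of the error

# Flag observations whose residual is large relative to the first scale estimate
big <- abs(residuals(fit)) > 2.5 * fit$scale[1]
which(big)

# Ordinary least squares on the same data, for comparison
coef(lm(stack.loss ~ ., data = stackloss))
```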