Produce diagnostic plots for a sequence of regression models, such as submodels along a robust least angle regression sequence, or sparse least trimmed squares regression models for a grid of values for the penalty parameter. Four plots are currently implemented.
diagnosticPlot(x, ...)
# S3 method for seqModel
diagnosticPlot(x, s = NA, covArgs = list(), ...)
# S3 method for perrySeqModel
diagnosticPlot(x, ...)
# S3 method for tslars
diagnosticPlot(x, p, ...)
# S3 method for sparseLTS
diagnosticPlot(x, s = NA, fit = c("reweighted", "raw", "both"),
  covArgs = list(), ...)
# S3 method for perrySparseLTS
diagnosticPlot(x, ...)
# S3 method for default
diagnosticPlot(x, which = c("all", "rqq", "rindex", "rfit", "rdiag"),
  ask = (which == "all"), facets = attr(x, "facets"),
  size = c(2, 4), id.n = NULL, ...)
x: the model fit for which to produce diagnostic plots, or a data frame containing all necessary information for plotting (as generated by the corresponding fortify method).
s: for the "seqModel" method, an integer vector giving the steps of the submodels for which to produce diagnostic plots (the default is to use the optimal submodel). For the "sparseLTS" method, an integer vector giving the indices of the models for which to produce diagnostic plots (the default is to use the optimal model for each of the requested fits).
covArgs: a list of arguments to be passed to covMcd for the regression diagnostic plot (see “Details”).
p: an integer giving the lag length for which to produce the plot (the default is to use the optimal lag length).
fit: a character string specifying for which fit to produce diagnostic plots. Possible values are "reweighted" (the default) for diagnostic plots for the reweighted fit, "raw" for diagnostic plots for the raw fit, or "both" for diagnostic plots for both fits.
which: a character string indicating which plot to show. Possible values are "all" (the default) for all of the following, "rqq" for a normal Q-Q plot of the standardized residuals, "rindex" for a plot of the standardized residuals versus their index, "rfit" for a plot of the standardized residuals versus the fitted values, or "rdiag" for a regression diagnostic plot (standardized residuals versus robust Mahalanobis distances of the predictor variables).
ask: a logical indicating whether the user should be asked before each plot (see devAskNewPage). The default is to ask if all plots are requested and not ask otherwise.
facets: a faceting formula to override the default behavior. If supplied, facet_wrap or facet_grid is called depending on whether the formula is one-sided or two-sided.
size: a numeric vector of length two giving the point and label size, respectively.
id.n: an integer giving the number of the most extreme observations to be identified by a label. The default is to use the number of identified outliers, which can be different for the different plots. See “Details” for more information.
...: for the generic function, additional arguments to be passed down to methods. For the "tslars" method, additional arguments to be passed down to the "seqModel" method. For the "perrySeqModel" and "perrySparseLTS" methods, additional arguments to be passed down to the "seqModel" and "sparseLTS" methods, respectively. For the "seqModel" and "sparseLTS" methods, additional arguments to be passed down to the default method. For the default method, additional arguments to be passed down to geom_point.
If only one plot is requested, an object of class "ggplot" (see ggplot), otherwise a list of such objects.
In the normal Q-Q plot of the standardized residuals, a reference line is drawn through the first and third quartile. The id.n observations with the largest distances from that line are identified by a label (the observation number). The default for id.n is the number of regression outliers, i.e., the number of observations whose residuals are too large (cf. wt).
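The Q-Q reference line and the labelling rule can be illustrated with base R alone. This is only a rough sketch, not the package's implementation: the residual vector `res` is simulated, the vertical distance to the line is an assumed reading of "distance from that line", and the 2.5 threshold is a stand-in for the outlier weights returned by wt.

```r
# Hypothetical standardized residuals: 95 regular values plus 5 shifted ones
set.seed(1)
res <- c(rnorm(95), rnorm(5, mean = 6))
n <- length(res)

# Theoretical normal quantiles, arranged in the original observation order
theo <- qnorm(ppoints(n))[rank(res, ties.method = "first")]

# Reference line through the first and third quartiles
qy <- quantile(res, c(0.25, 0.75), names = FALSE)  # sample quartiles
qx <- qnorm(c(0.25, 0.75))                         # theoretical quartiles
slope <- diff(qy) / diff(qx)
intercept <- qy[1] - slope * qx[1]

# Assumption: distance is measured vertically from the reference line
dist_line <- abs(res - (intercept + slope * theo))

# Assumption: |residual| > 2.5 stands in for a zero outlier weight
id.n <- sum(abs(res) > 2.5)

# Indices of the id.n observations that would receive a label
labelled <- order(dist_line, decreasing = TRUE)[seq_len(id.n)]
```

Here `labelled` holds observation numbers, which is what the plot prints next to the flagged points.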
In the plots of the standardized residuals versus their index or the fitted values, horizontal reference lines are drawn at 0 and +/-2.5. The id.n observations with the largest absolute values of the standardized residuals are identified by a label (the observation number). The default for id.n is the number of regression outliers, i.e., the number of observations whose absolute residuals are too large (cf. wt).
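For the residual plots, the labelling rule reduces to ranking absolute residuals. The following base R sketch uses a small made-up residual vector and a hand-picked `id.n`; neither comes from a fitted model.

```r
# Hypothetical standardized residuals for ten observations
res <- c(0.3, -1.2, 4.1, 0.8, -3.7, 1.9, -0.4, 5.2, 2.6, -0.9)

# Horizontal reference lines described in the text
ref_lines <- c(-2.5, 0, 2.5)

# Observations falling outside the +/-2.5 band
outside <- which(abs(res) > 2.5)

# Hypothetical choice of id.n; label the largest absolute residuals
id.n <- 3
labelled <- order(abs(res), decreasing = TRUE)[seq_len(id.n)]
labelled
# → 8 3 5
```

Observation 9 lies outside the band but is not labelled because `id.n` was set to 3. This shows how a user-supplied `id.n` can differ from the number of points beyond the reference lines.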
For the regression diagnostic plot, the robust Mahalanobis distances of the predictor variables are computed via the MCD based on only those predictors with non-zero coefficients (see covMcd). Horizontal reference lines are drawn at +/-2.5 and a vertical reference line is drawn at the upper 97.5% quantile of the chi-squared distribution with p degrees of freedom, where p denotes the number of predictors with non-zero coefficients. The id.n observations with the largest absolute values of the standardized residuals and/or largest robust Mahalanobis distances are identified by a label (the observation number). The default for id.n is the number of all outliers: regression outliers (i.e., observations whose absolute residuals are too large, cf. wt) and leverage points (i.e., observations with robust Mahalanobis distance larger than the 97.5% quantile of the chi-squared distribution with p degrees of freedom).
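As a rough illustration of the leverage cutoff, the sketch below flags observations whose Mahalanobis distance exceeds a chi-squared based threshold. It rests on several assumptions: the data are simulated, classical mean and covariance estimates are swapped in for the MCD so that only base R is needed, and the square root of the quantile is taken because the distances here are unsquared.

```r
set.seed(3)
p <- 3                                  # hypothetical number of non-zero coefficients
x <- matrix(rnorm(100 * p), ncol = p)   # hypothetical predictor matrix
x[1:5, ] <- x[1:5, ] + 6                # shift five rows to act as leverage points

# Assumption: classical estimates as a stand-in for the MCD; with robustbase
# installed, covMcd(x)$center and covMcd(x)$cov would take their place
center <- colMeans(x)
scatter <- cov(x)

# mahalanobis() returns squared distances, so take the square root
md <- sqrt(mahalanobis(x, center, scatter))

# Cutoff from the 97.5% chi-squared quantile, on the distance scale
cutoff <- sqrt(qchisq(0.975, df = p))

# Observations that would count as leverage points under this rule
leverage <- which(md > cutoff)
```

Classical estimates are themselves affected by the shifted rows, which is the reason the plot relies on the MCD instead.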
See also: ggplot, rlars, grplars, rgrplars, tslarsP, rtslarsP, tslars, rtslars, sparseLTS, plot.lts
# NOT RUN {
## generate data
# example is not high-dimensional to keep computation time low
library("robustHD") # provides rlars(), sparseLTS() and diagnosticPlot()
library("mvtnorm")
set.seed(1234) # for reproducibility
n <- 100 # number of observations
p <- 25 # number of variables
beta <- rep.int(c(1, 0), c(5, p-5)) # coefficients
sigma <- 0.5 # controls signal-to-noise ratio
epsilon <- 0.1 # contamination level
Sigma <- 0.5^t(sapply(1:p, function(i, j) abs(i-j), 1:p))
x <- rmvnorm(n, sigma=Sigma) # predictor matrix
e <- rnorm(n) # error terms
i <- 1:ceiling(epsilon*n) # observations to be contaminated
e[i] <- e[i] + 5 # vertical outliers
y <- c(x %*% beta + sigma * e) # response
x[i,] <- x[i,] + 5 # bad leverage points
## robust LARS
# fit model
fitRlars <- rlars(x, y, sMax = 10)
# create plot
diagnosticPlot(fitRlars)
## sparse LTS
# fit model
fitSparseLTS <- sparseLTS(x, y, lambda = 0.05, mode = "fraction")
# create plot
diagnosticPlot(fitSparseLTS)
diagnosticPlot(fitSparseLTS, fit = "both")
# }