Estimate the prediction error of a fitted model via (repeated) \(K\)-fold cross-validation, (repeated) random splitting (also known as random subsampling or Monte Carlo cross-validation), or the bootstrap. Methods are available for least squares fits computed with lm as well as for the following robust alternatives: MM-type models computed with lmrob and least trimmed squares fits computed with ltsReg.
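The three resampling schemes named above can be sketched in base R for an LS fit such as lm(mpg ~ wt + cyl, data = mtcars) from the examples. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names (cv_rmspe, split_rmspe, boot_rmspe) are hypothetical, m is assumed to be the number of held-out observations per random split, and the 0.632 estimator is assumed to follow Efron's rule (0.368 times the apparent error plus 0.632 times the out-of-bag error) applied on the RMSPE scale.

```r
## Minimal base-R sketch of the three resampling schemes for an LS fit,
## each estimating the root mean squared prediction error (RMSPE).
## Helper names and the meaning of `m` are assumptions for illustration.
rmspe_sketch <- function(y, yHat) sqrt(mean((y - yHat)^2))

# Refit `fml` on the rows in `train` and predict the rows in `test`.
refit_predict <- function(fml, data, train, test) {
  fit <- lm(fml, data = data[train, , drop = FALSE])
  predict(fit, newdata = data[test, , drop = FALSE])
}

# K-fold cross-validation: every observation is predicted exactly once.
cv_rmspe <- function(fml, data, K = 5) {
  n <- nrow(data)
  folds <- sample(rep_len(seq_len(K), n))   # random fold assignment
  yHat <- numeric(n)
  for (k in seq_len(K)) {
    test <- which(folds == k)
    yHat[test] <- refit_predict(fml, data, -test, test)
  }
  rmspe_sketch(data[[all.vars(fml)[1]]], yHat)
}

# Random splitting: hold out m observations at random, R times, and average.
split_rmspe <- function(fml, data, m, R = 10) {
  y <- data[[all.vars(fml)[1]]]
  mean(replicate(R, {
    test <- sample(nrow(data), m)
    rmspe_sketch(y[test], refit_predict(fml, data, -test, test))
  }))
}

# Bootstrap: fit on a bootstrap sample and predict the out-of-bag rows;
# the "0.632" variant blends in the apparent (training) error.
boot_rmspe <- function(fml, data, R = 10, type = c("0.632", "out-of-bag")) {
  type <- match.arg(type)
  n <- nrow(data)
  y <- data[[all.vars(fml)[1]]]
  oob <- mean(replicate(R, {
    idx <- sample(n, n, replace = TRUE)      # bootstrap sample
    out <- setdiff(seq_len(n), idx)          # out-of-bag observations
    rmspe_sketch(y[out], refit_predict(fml, data, idx, out))
  }))
  if (type == "out-of-bag") return(oob)
  apparent <- rmspe_sketch(y, fitted(lm(fml, data = data)))
  0.368 * apparent + 0.632 * oob
}

set.seed(1234)
cv_rmspe(mpg ~ wt + cyl, mtcars, K = 5)
```

With K = nrow(mtcars), cv_rmspe() reduces to leave-one-out CV, whose RMSPE for an LS fit can be checked against the closed form based on residuals(fit) / (1 - hatvalues(fit)).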
perry(object, ...)

# S3 method for lm
perry(object, splits = foldControl(),
      cost = rmspe, ncores = 1, cl = NULL, seed = NULL, ...)

# S3 method for lmrob
perry(object, splits = foldControl(),
      cost = rtmspe, ncores = 1, cl = NULL, seed = NULL, ...)

# S3 method for lts
perry(object, splits = foldControl(),
      fit = c("reweighted", "raw", "both"), cost = rtmspe,
      ncores = 1, cl = NULL, seed = NULL, ...)
object: the fitted model for which to estimate the prediction error.
splits: an object of class "cvFolds" (as returned by cvFolds) or a control object of class "foldControl" (see foldControl) defining the folds of the data for (repeated) \(K\)-fold cross-validation; an object of class "randomSplits" (as returned by randomSplits) or a control object of class "splitControl" (see splitControl) defining random data splits; or an object of class "bootSamples" (as returned by bootSamples) or a control object of class "bootControl" (see bootControl) defining bootstrap samples.
fit: a character string specifying for which fit to estimate the prediction error (only used by the "lts" method). Possible values are "reweighted" (the default) for the prediction error of the reweighted fit, "raw" for the prediction error of the raw fit, or "both" for the prediction errors of both fits.
cost: a cost function measuring prediction loss. It should expect the observed values of the response to be passed as the first argument and the predicted values as the second argument, and must return either a non-negative scalar value or a list whose first component contains the prediction error and whose second component contains the standard error. The default is the root mean squared prediction error for the "lm" method and the root trimmed mean squared prediction error for the "lmrob" and "lts" methods (see cost).
ncores: a positive integer giving the number of processor cores to be used for parallel computing (the default is 1 for no parallelization). If this is set to NA, all available processor cores are used.
cl: a parallel cluster for parallel computing as generated by makeCluster. If supplied, this is used instead of ncores.
seed: optional initial seed for the random number generator (see .Random.seed). Note that even with parallel computing, resampling is performed on the manager process rather than on the worker processes. On the worker processes, parallel random number streams are used, and their seed is set via clusterSetRNGStream.
...: for the generic function, additional arguments to be passed down to methods. For the methods, additional arguments to be passed to the prediction loss function cost.
An object of class "perry" with the following components:
a numeric vector containing the estimated prediction errors. For the "lm" and "lmrob" methods, this is a single numeric value. For the "lts" method, this contains one value for each of the requested fits. In case of more than one replication, these are averages over all replications.
a numeric vector containing the estimated standard errors of the prediction loss. For the "lm" and "lmrob" methods, this is a single numeric value. For the "lts" method, this contains one value for each of the requested fits.
a numeric matrix containing the estimated prediction errors from all replications. For the "lm" and "lmrob" methods, this is a matrix with one column. For the "lts" method, this contains one column for each of the requested fits. This component is only returned in case of more than one replication.
an object giving the data splits used to estimate the prediction error.
the response.
a list containing the predicted values from all replications.
the matched function call.
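The relation between the replicated estimates and the reported estimate can be sketched in base R: with R replications of K-fold CV for an LS fit, the per-replication estimates form a matrix with one column, and averaging over replications is a column mean. The helper one_cv_rmspe, the variable names and the column label "CV" are hypothetical choices for this illustration.

```r
## Sketch: R replications of K-fold CV for an LS fit, stored as a
## one-column matrix and averaged per column. Names are hypothetical.
one_cv_rmspe <- function(fml, data, K = 5) {
  n <- nrow(data)
  folds <- sample(rep_len(seq_len(K), n))   # new random folds each call
  yHat <- numeric(n)
  for (k in seq_len(K)) {
    test <- folds == k
    fit <- lm(fml, data = data[!test, , drop = FALSE])
    yHat[test] <- predict(fit, newdata = data[test, , drop = FALSE])
  }
  sqrt(mean((data[[all.vars(fml)[1]]] - yHat)^2))
}

set.seed(1234)
R <- 10
reps <- matrix(replicate(R, one_cv_rmspe(mpg ~ wt + cyl, mtcars, K = 5)),
               ncol = 1, dimnames = list(NULL, "CV"))
pe <- colMeans(reps)   # one averaged value per column (i.e. per fit)
pe
```

For a method with several fits, reps would simply gain one column per fit, and colMeans() would return one averaged value for each.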
# NOT RUN {
## load data and fit an LS regression model
data("mtcars")
fit <- lm(mpg ~ wt + cyl, data=mtcars)
## perform cross-validation
# K-fold CV
perry(fit, foldControl(K = 5, R = 10), seed = 1234)
# leave-one-out CV
perry(fit, foldControl(K = nrow(mtcars)))
## perform random splitting
perry(fit, splitControl(m = 6, R = 10), seed = 1234)
## perform bootstrap prediction error estimation
# 0.632 estimator
perry(fit, bootControl(R = 10, type = "0.632"), seed = 1234)
# out-of-bag estimator
perry(fit, bootControl(R = 10, type = "out-of-bag"), seed = 1234)
# }