These functions are primarily useful for writing methods for the
cv() generic function. They are used internally in the package
and can also be used for extensions (see the vignette "Extending the cv package",
vignette("cv-extend", package="cv")).
cvCompute(
model,
data = insight::get_data(model),
criterion = mse,
criterion.name,
k = 10L,
reps = 1L,
seed,
details = k <= 10L,
confint,
level = 0.95,
method = NULL,
ncores = 1L,
type = "response",
start = FALSE,
f,
fPara = f,
locals = list(),
model.function = NULL,
model.function.name = NULL,
...
)
cvMixed(
model,
package,
data = insight::get_data(model),
criterion = mse,
criterion.name,
k,
reps = 1L,
confint,
level = 0.95,
seed,
details,
ncores = 1L,
clusterVariables,
predict.clusters.args = list(object = model, newdata = data),
predict.cases.args = list(object = model, newdata = data),
fixed.effects,
...
)
cvSelect(
procedure,
data,
criterion = mse,
criterion.name,
model,
y.expression,
k = 10L,
confint = n >= 400,
level = 0.95,
reps = 1L,
save.coef,
details = k <= 10L,
save.model = FALSE,
seed,
ncores = 1L,
...
)
folds(n, k)
fold(folds, i_, ...)
# S3 method for folds
fold(folds, i_, ...)
# S3 method for folds
print(x, ...)
GetResponse(model, ...)
# S3 method for default
GetResponse(model, ...)
# S3 method for merMod
GetResponse(model, ...)
# S3 method for lme
GetResponse(model, ...)
# S3 method for glmmTMB
GetResponse(model, ...)
# S3 method for modList
GetResponse(model, ...)
checkFormula(model, data.names)
The utility functions return various kinds of objects:
cvCompute() returns an object of class "cv", with the CV criterion
("CV crit"), the bias-adjusted CV criterion ("adj CV crit"),
the criterion for the model applied to the full data ("full crit"),
the confidence interval and level for the bias-adjusted CV criterion ("confint"),
the number of folds ("k"), and the seed for R's random-number
generator ("seed"). If details=TRUE, then the returned object
will also include a "details" component, which is a list of two
elements: "criterion", containing the CV criterion computed for the
cases in each fold; and "coefficients", regression coefficients computed
for the model with each fold deleted. Some cv() methods calling cvCompute()
may return a subset of these components and may add additional information.
If reps > 1, then an object of class "cvList" is returned,
which is literally a list of "cv" objects.
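As a rough illustration of how the "CV crit", "adj CV crit", and "full crit" components relate, here is a minimal base-R sketch of k-fold CV for a linear model. It is not the cv package's code: kfold_sketch() and mse_sketch() are hypothetical names, and the bias adjustment shown is the Davison-Hinkley form used by boot::cv.glm(), which is assumed here rather than taken from this page.

```r
# Base-R sketch only; NOT the cv package's implementation.
# mse_sketch() and kfold_sketch() are hypothetical helper names.
mse_sketch <- function(y, yhat) mean((y - yhat)^2)

kfold_sketch <- function(formula, data, k = 5L, seed = 123L) {
  set.seed(seed)
  n <- nrow(data)
  y <- model.response(model.frame(formula, data))
  # Randomly assign each case to one of k folds of nearly equal size
  fold_id <- sample(rep_len(seq_len(k), n))

  # Criterion for the model fit to the full data ("full crit")
  full_fit <- lm(formula, data = data)
  full_crit <- mse_sketch(y, fitted(full_fit))

  cv_crit <- 0
  adjustment <- full_crit
  for (j in seq_len(k)) {
    held_out <- fold_id == j
    # Refit with fold j deleted
    fit_j <- lm(formula, data = data[!held_out, , drop = FALSE])
    w <- sum(held_out) / n  # weight fold j by its share of the cases
    # Criterion for the held-out cases, accumulated into "CV crit"
    cv_crit <- cv_crit +
      w * mse_sketch(y[held_out],
                     predict(fit_j, newdata = data[held_out, , drop = FALSE]))
    # Assumed bias adjustment: subtract the weighted criterion of the
    # fold-deleted fit applied to all of the data
    adjustment <- adjustment - w * mse_sketch(y, predict(fit_j, newdata = data))
  }

  list("CV crit" = cv_crit,
       "adj CV crit" = cv_crit + adjustment,
       "full crit" = full_crit,
       k = k,
       seed = seed)
}

res <- kfold_sketch(mpg ~ wt + hp, mtcars)
str(res)
```

For a least-squares fit, the "CV crit" value in this sketch can never fall below "full crit", because the held-out residuals are at least as large in total as the corresponding full-data residuals.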
cvMixed() also returns an object of class "cv" or
"cvList".
cvSelect() returns an object of class
"cvSelect" inheriting from "cv", or an object of
class "cvSelectList" inheriting from "cvList".
folds() returns an object of class "folds", for which
there are fold() and print() methods.
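Conceptually, a set of folds is a random partition of the case indices 1, ..., n into k groups of nearly equal size. The base-R sketch below illustrates that idea only; make_folds_sketch() is a hypothetical helper, and the real "folds" object may well be structured differently.

```r
# Base-R sketch of a k-fold partition; NOT the cv package's folds().
# make_folds_sketch() is a hypothetical name used for illustration.
make_folds_sketch <- function(n, k) {
  perm <- sample(n)                     # random permutation of 1..n
  fold_id <- rep_len(seq_len(k), n)     # fold labels 1..k, recycled to length n
  unname(split(perm, fold_id))          # list of k integer vectors of case indices
}

set.seed(123)
fs <- make_folds_sketch(n = 22, k = 5)
lengths(fs)   # fold sizes differ by at most one
fs[[2]]       # case indices in the second fold
```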
GetResponse() returns the (numeric) response variable
from the model.
The supplied default method returns the model$y component
of the model object, or, if model is an S4 object, the result
returned by the get_response() function in
the insight package. If this result is NULL, the result of
model.response(model.frame(model)) is returned, checking in any case whether
the result is a numeric vector.
There are also "lme", "merMod"
and "glmmTMB" methods that convert factor
responses to numeric 0/1 responses, as would be appropriate
for a generalized linear mixed model with a binary response.
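The fallback logic described for the default method can be sketched in base R as follows. This is only an approximation under stated assumptions: get_response_sketch() and to_numeric_01() are hypothetical names, the S4 branch that calls the insight package is omitted, and mapping the first factor level to 0 is an assumption about how a binary factor would be coded.

```r
# Base-R sketch of the lookup order described above; NOT the package code.
# The S4 / insight::get_response() branch is deliberately left out.
to_numeric_01 <- function(y) {
  # Assumed coding: first factor level -> 0, any other level -> 1
  if (is.factor(y)) as.numeric(y != levels(y)[1L]) else y
}

get_response_sketch <- function(model) {
  y <- model[["y"]]                     # exact match, avoids partial matching of `$`
  if (is.null(y)) {
    y <- model.response(model.frame(model))
  }
  y <- to_numeric_01(y)
  if (!is.numeric(y)) stop("the response is not numeric")
  y
}

fit_lm <- lm(mpg ~ gear, data = mtcars)                     # no $y component by default
fit_glm <- glm(am ~ wt, family = binomial, data = mtcars)   # stores $y
head(get_response_sketch(fit_lm))
head(get_response_sketch(fit_glm))
to_numeric_01(factor(c("no", "yes", "no")))
```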
checkFormula() returns TRUE if all variables in the
model formula are also in the data to which the model is fit; FALSE if this
is not the case (and a warning is printed); or NA if the function
couldn't extract a model formula.
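The three possible return values can be illustrated with a small base-R sketch. It assumes the check amounts to comparing all.vars() of the model formula with the supplied variable names; check_formula_sketch() is a hypothetical name, and the package's own checkFormula() may differ in detail.

```r
# Base-R sketch of the TRUE / FALSE / NA logic; NOT the package code.
check_formula_sketch <- function(model, data.names) {
  form <- tryCatch(formula(model), error = function(e) NULL)
  if (is.null(form)) return(NA)         # no formula could be extracted
  absent <- setdiff(all.vars(form), data.names)
  if (length(absent) > 0L) {
    warning("variables in the formula but not in the data: ",
            paste(absent, collapse = ", "))
    return(FALSE)
  }
  TRUE
}

fit_ok <- lm(mpg ~ log(hp) + gear, data = mtcars)
check_formula_sketch(fit_ok, names(mtcars))

# z lives in the calling environment, not in mtcars
z <- seq_len(nrow(mtcars))
fit_bad <- lm(mpg ~ z, data = mtcars)
# suppressWarnings() keeps the example quiet; without it a warning is issued
res_bad <- suppressWarnings(check_formula_sketch(fit_bad, names(mtcars)))
res_bad
```

A model like fit_bad is a problem for cross-validation because the variable z is not part of the data frame that gets subset for each fold.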
model: a regression model object.
data: data frame to which the model was fit (not usually necessary,
except for cvSelect()).
criterion: cross-validation criterion ("cost" or lack-of-fit) function of form f(y, yhat),
where y is the vector of observed values of the response and
yhat the vector of predicted values; the default is mse
(the mean-squared error).
criterion.name: a character string giving the name of the CV criterion function
in the returned "cv" object.
k: perform k-fold cross-validation (default is 10); k
may be a number or "loo" or "n" for n-fold (leave-one-out)
cross-validation; for folds(), k must be a number.
reps: number of times to replicate k-fold CV (default is 1).
seed: seed for R's random-number generator; optional: if not supplied, a random seed will be selected and saved; not needed for n-fold cross-validation.
details: if TRUE (the default if the number of
folds k <= 10), save detailed information about the value of the
CV criterion for the cases in each fold and the regression coefficients
with that fold deleted.
confint: if TRUE (the default if the number of cases is 400
or greater), compute a confidence interval for the bias-corrected CV
criterion, if the criterion is the average of casewise components.
level: confidence level (default 0.95).
method: computational method to apply; used by some cv()
methods.
ncores: number of cores to use for parallel computations
(default is 1, i.e., computations aren't done in parallel).
type: used by some cv() methods, such as the default method,
where type is passed to the type argument of predict();
the default is type="response", which is appropriate, e.g., for a "glm" model
and may be recognized or ignored by predict() methods for other model classes.
start: used by some cv() methods;
if TRUE (the default is FALSE), the start argument,
set to the vector of regression coefficients for the model fit to the full data, is passed
to update(), possibly making the CV updates faster, e.g., for a GLM.
f: function to be called by cvCompute() for each fold.
fPara: function to be called by cvCompute() for each fold
using parallel computation.
locals: a named list of objects that are required in the local environment
of cvCompute() for f() or fPara().
model.function: a regression function, typically for a new cv() method,
residing in a package that's not a declared dependency of the cv package,
e.g., nnet::multinom.
model.function.name: the quoted name of the regression function, e.g.,
"multinom".
...: to match generic; passed to predict() for the default method,
and to fPara() (for parallel computations) in cvCompute().
package: the name of the package in which the mixed-modeling function (or functions) employed resides; used to get the namespace of the package.
clusterVariables: a character vector of names of the variables defining clusters for a mixed model with nested or crossed random effects; if missing, cross-validation is performed for individual cases rather than for clusters.
predict.clusters.args: a list of arguments to be used to predict
the whole data set from a mixed model when performing CV on clusters;
the first two elements should be
model and newdata; see the "Extending the cv package" vignette
(vignette("cv-extend", package="cv")).
predict.cases.args: a list of arguments to be used to predict
the whole data set from a mixed model when performing CV on cases;
the first two elements should be
model and newdata; see the "Extending the cv package" vignette
(vignette("cv-extend", package="cv")).
fixed.effects: a function to be used to compute fixed-effect
coefficients for cluster-based CV when details = TRUE.
procedure: a model-selection procedure function (see Details).
y.expression: normally the response variable is found from the
model argument; but if, for a particular selection procedure, the
model argument is absent, or if the response can't be inferred from the
model, the response can be specified by an expression, such as expression(log(income)),
to be evaluated within the data set provided by the data argument.
save.coef: save the coefficients from the selected models? Deprecated
in favor of the details argument; if specified, details
is set to the value of save.coef.
save.model: save the model that's selected using the full data set.
n: number of cases, for constructed folds.
folds: an object of class "folds".
i_: a fold number for an object of class "folds".
x: a "cv", "cvList", or "folds" object to be printed.
data.names: names of variables in the data set to which the model was fit; if missing, an attempt will be made to extract the data from the model.
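To make the f(y, yhat) form of the criterion argument concrete, here is a short base-R sketch of two user-written criterion functions. The names mae_sketch() and rmse_sketch() are hypothetical and are not functions provided by this page; the point is only the calling convention.

```r
# Two illustrative criterion functions of the form f(y, yhat).
# mae_sketch() and rmse_sketch() are hypothetical names.

# Mean absolute error: an average of casewise components |y - yhat|
mae_sketch <- function(y, yhat) mean(abs(y - yhat))

# Root-mean-squared error: NOT an average of casewise components,
# because of the square root taken after averaging
rmse_sketch <- function(y, yhat) sqrt(mean((y - yhat)^2))

fit <- lm(mpg ~ wt, data = mtcars)
y <- mtcars$mpg
yhat <- fitted(fit)

mae_sketch(y, yhat)    # full-data criterion under MAE
rmse_sketch(y, yhat)   # full-data criterion under RMSE
```

The distinction in the comments matters for the confint argument, which applies only when the criterion is the average of casewise components.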
cvCompute(): used internally by cv() methods (not for direct use);
exported to support new cv() methods.
cvMixed(): used internally by cv() methods
for mixed-effect models (not for direct use);
exported to support new cv() methods.
cvSelect(): used internally by cv() methods for
cross-validating a model-selection procedure; may also be called
directly for this purpose, but use via cv() is preferred.
cvSelect() is exported primarily to support new model-selection procedures.
folds(): used internally by cv() methods (not for direct use).
fold(): to extract a fold from a "folds" object.
fold(folds): fold() method for "folds" objects.
print(folds): print() method for "folds" objects.
GetResponse(): function to return the response variable
from a regression model.
GetResponse(default): default method.
GetResponse(merMod): "merMod" method.
GetResponse(lme): "lme" method.
GetResponse(glmmTMB): "glmmTMB" method.
GetResponse(modList): "modList" method.
checkFormula(): check a model formula to determine whether it includes
variables not in the data to which the model was fit; prints a warning if
it does.
cv, cv.merMod,
cv.function.
fit <- lm(mpg ~ gear, mtcars)
GetResponse(fit)
set.seed(123)
(ffs <- folds(n=22, k=5))
fold(ffs, 2)