Multivariate Adaptive Regression Splines
Build a regression model using the techniques in Friedman's papers "Multivariate Adaptive Regression Splines" and "Fast MARS". See the package vignette Notes on the earth package.
"earth"(formula = stop("no 'formula' argument"), data = NULL, weights = NULL, wp = NULL, subset = NULL, na.action = na.fail, pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"), keepxy = FALSE, trace = 0, glm = NULL, degree = 1, nprune = NULL, ncross=1, nfold=0, stratify=TRUE, varmod.method = "none", varmod.exponent = 1, varmod.conv = 1, varmod.clamp = .1, varmod.minspan = -3, Scale.y = (NCOL(y)==1), ...)"earth"(x = stop("no 'x' argument"), y = stop("no 'y' argument"), weights = NULL, wp = NULL, subset = NULL, na.action = na.fail, pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"), keepxy = FALSE, trace = 0, glm = NULL, degree = 1, nprune = NULL, ncross=1, nfold=0, stratify=TRUE, varmod.method = "none", varmod.exponent = 1, varmod.conv = 1, varmod.clamp = .1, varmod.minspan = -3, Scale.y = (NCOL(y)==1), ...)"earth"(x = stop("no 'x' argument"), y = stop("no 'y' argument"), weights = NULL, wp = NULL, subset = NULL, na.action = na.fail, pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"), keepxy = FALSE, trace = 0, glm = NULL, degree = 1, penalty = if(degree > 1) 3 else 2, nk = min(200, max(20, 2 * ncol(x))) + 1, thresh = 0.001, minspan = 0, endspan = 0, newvar.penalty = 0, fast.k = 20, fast.beta = 1, linpreds = FALSE, allowed = NULL, nprune = NULL, Object = NULL, Scale.y = (NCOL(y)==1), Adjust.endspan = 2, Force.weights = FALSE, Use.beta.cache = TRUE, Force.xtx.prune = FALSE, Get.leverages = NROW(x) < 1e5, Exhaustive.tol = 1e-10, ...)
- Model formula.
Data frame for
- Matrix or dataframe containing the independent variables.
- Vector containing the response variable, or, in the case of multiple responses, a matrix or dataframe whose columns are the values for each response.
Index vector specifying which cases to use, i.e., which rows in
xto use. Default is NULL, meaning all.
Default is NULL, meaning no case weights.
weightsmust have length equal to
subset. Zero weights are converted to a very small nonzero value.
Default is NULL, meaning no response weights.
wpmust have an element for each column of
y, if any, have been expanded). Zero weights are converted to a very small nonzero value.
NA action. Default is
na.fail, and only
FALSE. Set to
TRUEto retain the following in the returned value:
weights. The function
update.earthand friends will use these if present instead of searching for them in the environment at the time
update.earthis invoked. When the
nfoldargument is used with
earthkeeps more data and calls
predict.earthmultiple times to generate
cv.arguments in the Value section below). It therefore makes cross-validation significantly slower.
earth's execution. Default is
.3variance model (the
.5cross validation (the
4model mats summary, pruning details
5full model mats, internal details of operation
NULL (default) or a list of arguments to pass on to
glm. See the documentation of
glmfor a description of these arguments See Generalized linear models in the vignette. Example:
earth(survived~., data=etitanic, degree=2, glm=list(family=binomial))The following arguments are for the forward pass.
Maximum degree of interaction (Friedman's $mi$).
1, meaning build an additive model (i.e., no interaction terms).
Generalized Cross Validation (GCV) penalty per knot.
if(degree>1) 3 else 2. Simulation studies suggest values in the range of about
4. The FAQ section in the vignette has some information on GCVs. Special values (for use by knowledgeable users): The value
0penalizes only terms, not knots. The value
-1means no penalty, so GCV = RSS/n.
Maximum number of model terms before pruning, i.e., the
maximum number of terms created by the forward pass.
Includes the intercept.
The actual number of terms created by the forward pass will often be
nkbecause of other stopping conditions. See Termination conditions for the forward pass in the vignette. The default is semi-automatically calculated from the number of predictors but may need adjusting.
Forward stepping threshold.
0.001. This is one of the arguments used to decide when forward stepping should terminate: the forward pass terminates if adding a term changes RSq by less than
thresh. See Termination conditions for the forward pass in the vignette.
Minimum number of observations between knots.
(This increases resistance to runs of correlated noise in the input data.)
minspan=0is treated specially and means calculate the
minspaninternally, as per Friedman's MARS paper section 3.8 with $alpha$ = 0.05. Set
trace>=2to see the calculated value. Use
endspan=1to consider all x values. Negative values of
minspanspecify the maximum number of knots per predictor. These will be equally spaced. For example,
minspan=-3allows three evenly spaced knots for each predictor. As always, knots that fall in the endzones specified by
endspanwill be ignored.
Minimum number of observations before the first and after the final knot.
endspan=0is treated specially and means calculate the
minspaninternally, as per the MARS paper equation 45 with $alpha$ = 0.05. Set
trace>=2to see the calculated value. Be wary of reducing
endspan, especially if you plan to make predictions beyond or near the limits of the training data. Overfitting near the edges of training data is much more likely with a small
endspan. The model's
GRSqwon't indicate when this overfitting is occurring. (A
plotmoplot can help: look for sharp hinges at the edges of the data). See also the
Penalty for adding a new variable in the forward pass
(Friedman's $gamma$, equation 74 in the MARS paper).
0, meaning no penalty for adding a new variable. Useful non-zero values typically range from about
0.2and sometimes higher --- you will need to experiment. A word of explanation. With the default
newvar.penalty=0, if two variables have nearly the same effect (e.g. they are collinear), at any step in the forward pass
earthwill arbitrarily select one or the other (depending on noise in the sample). Both variables can appear in the final model, complicating model interpretation. On the other hand with a non-zero
newvar.penalty, the forward pass will be reluctant to add a new variable --- it will rather try to use a variable already in the model, if that does not affect RSq too much. The resulting final model may be easier to interpret, if you are lucky. There will often be a small performance hit (a worse GCV).
Maximum number of parent terms considered at each step of the forward pass.
(This speeds up the forward pass. See the Fast MARS paper section 3.0.)
20. A value of
0is treated specially (as being equivalent to infinity), meaning no Fast MARS. Typical values, apart from
5. In general, with a lower
earthis faster; with a higher
fast.k, or with
fast.kdisabled (set to
earthbuilds a better model. However, because of random variation this general rule often doesn't apply.
Fast MARS ageing coefficient, as described in the
Fast MARS paper section 3.1.
1. A value of
0sometimes gives better results.
Index vector specifying which predictors should enter linearly, as in
lm. The default is
FALSE, meaning all predictors enter in the standard MARS fashion, i.e., in hinge functions. This does not say that a predictor must enter the model; only that if it enters, it enters linearly. See The
linpredsargument in the vignette. A predictor's index in
linpredsis the column number in the input matrix
x(after factors have been expanded).
linpreds=TRUEmakes all predictors enter linearly (the
linpredsmay also be a character vector e.g.
linpreds=c("wind", "vis"). Note:
grepis used for matching. Thus
"wind"will match all variables that have
"wind"in their names. Use
"^wind$"to match only the variable named
Function specifying which predictors can interact and how.
Default is NULL, meaning all standard MARS terms are allowed.
During the forward pass,
allowedfunction before considering a term for inclusion; the term can go into the model only if the
TRUE. See The allowed argument in the vignette. The following arguments are for the pruning pass.
backward none exhaustive forward seqrep cv. Default is
"backward". New in version 4.4.0: Specify
pmethod="cv"to use cross-validation to select the number of terms. This selects the number of terms that gives the maximum mean out-of-fold RSq on the fold models. Requires the
"none"to retain all the terms created by the forward pass. If
yhas multiple columns, then only
"none"is allowed. Pruning can take a while if
"exhaustive"is chosen and the model is big (more than about 30 terms). The current version of the
leapspackage used during pruning does not allow user interrupts (i.e., you have to kill your R session to interrupt; in Windows use the Task Manager or from the command line use
Maximum number of terms (including intercept) in the pruned model.
Default is NULL, meaning all terms created by the forward pass
(but typically not all terms will remain after pruning).
Use this to enforce an upper bound on the model size (that is less than
nk), or to reduce exhaustive search time with
pmethod="exhaustive". The following arguments are for cross validation.
Only applies if
nfold>1. Number of cross-validations. Each cross-validation has
Number of cross-validation folds.
0, no cross validation. If greater than
earthfirst builds a standard model as usual with all the data. It then builds
nfoldcross-validated models, measuring R-Squared on the out-of-fold (left out) data each time. The final cross validation R-Squared (
CVRSq) is the mean of these out-of-fold R-Squareds. The above process of building
nfoldmodels is repeated
ncrosstimes (by default, once). Use
trace=.5to trace cross-validation. Further statistics are calculated if
keepxy=TRUEor if a binomial or poisson model (specified with the
glmargument). See Cross validation in the vignette.
Only applies if
nfold>1. Default is
TRUE. Stratify the cross-validation samples so that an approximately equal number of cases with a non-zero response occur in each cross validation subset. So if the response
yis logical, the
TRUEs will be spread evenly across folds. And if the response is a multilevel factor, there will be an approximately equal number of each factor level in each fold (because a multilevel factor response gets expanded to columns of zeros and ones, see Factors in the vignette). We say approximately equal because the number of occurrences of a factor level may not be exactly divisible by the number of folds. The following arguments are for variance models (new in version 4.0.0).
Construct a variance model.
For details, see
varmodand the vignette Variance models in earth. Use
trace=.3to trace construction of the variance model. This argument requires
ncross. (We suggest at least
ncross=30here to properly calculate the variance of the errors --- although you can use a smaller value, say
3, for debugging.) The
varmod.methodargument should be one of
"none"Default. Don't build a variance model.
"const"Assume homoscedastic errors.
lmto estimate standard deviation as a function of the predicted response.
gam. This will use either
mgcvpackage, whichever is loaded.
"power"Estimate standard deviation as
intercept + coef * predicted.response^exponent, where
exponentwill be estimated by
nls. This is equivalent to
exponentis automatically estimated instead of being held at the value set by the
"power"but no intercept (offset) term.
"x.gam"Like the similarly named options above, but estimate standard deviation by regressing on the predictors
x(instead of the predicted response). A current implementation restriction is that
"x.gam"allows only models with one predictor (
xmust have only one column).
Power transform applied to the rhs before regressing the
absolute residuals with the specified
varmod.method. Default is
1. For example, with
varmod.method="lm", if you expect the standard deviance to increase linearly with the mean response, use
varmod.exponent=1. If you expect the standard deviance to increase with the square root of the mean response, use
varmod.exponent=.5(where negative response values will be treated as
0, and you will get an error message if more than 20% of them are negative).
Convergence criterion for the Iteratively Reweighted Least Squares used
when creating the variance model.
Iterations stop when the mean value of the coefficients of the
residual model change by less than
varmod.convpercent. Default is
1percent. Negative values force the specified number of iterations, e.g.
varmod.conv=-2means iterate twice. Positive values are ignored for
varmod="const"and also currently ignored for
varmod="earth"(these are iterated just once, the same as using
The estimated standard deviation of the main model errors
is forced to be at least a small positive value,
which we call
min.sd. This prevents negative or absurdly small estimated standard deviations. Clamping takes place in
predict.varmod, which is called by
predict.earthwhen estimating prediction intervals. The value of
min.sdis determined when building the variance model as
min.sd = varmod.clamp * mean(sd(training.residuals)). The default
Only applies when
"x.earth". This is the
minspanused in the internal call to
earthwhen creating the variance model (not the main
earthmodel). Default is
-3, i.e., three evenly spaced knots per predictor. Residuals tend to be very noisy, and allowing only this small number of knots helps prevent overfitting. The following arguments are for internal or advanced use.
Earth object to be updated, for use by
yinternally in the forward pass for better numeric stability. This is invisible to the user, up to numerical differences. Scaling here means subtract the mean and divide by the standard deviation. Default is
NCOL(y)==1, i.e., scale
yhas multiple columns.
New in version 4.2.0.
In interaction terms,
endspangets multiplied by this value. This reduces the possibility of an overfitted interaction term supported by just a few cases on the boundary of the predictor space (as sometimes seen in our simulation studies). The default is
Adjust.endspan=1for compatibility with old versions of
FALSE. For testing the
weightsargument. Force use of the code for handling weights in the
earthcode, even if
weights=NULLor all the weights are the same. This will not necessarily generate an identical model, primarily because the non-weighted code requires some tests for numerical stability that can sometimes affect knot selection.
TRUE. Using the beta cache takes a little more memory but is faster (by 20% and often much more for large models). The beta cache uses
nk * nk * ncol(x) * sizeof(double)bytes. (The beta cache is an innovation in this implementation of MARS and does not appear in Friedman's papers. It is not related to the
fast.betaargument. Certain regression coefficients in the forward pass can be saved and re-used, thus saving recalculation time.)
FALSE. This argument pertains to subset evaluation in the pruning pass. By default, if
yhas a single column then
yhas multiple columns then
leapsroutines are numerically more stable but do not support multiple responses (
leapsis based on the QR decomposition and
EvalSubsetsUsingXtxis based on the inverse of X'X). Setting
Force.xtx.prune=TRUEforces use of
EvalSubsetsUsingXtx, even if
yhas a single column.
New in version 4.4.0.
TRUEunless the model has more than 100 thousand cases. The leverages are the diagonal hat values for the linear regression of
bx. The leverages are needed only for certain model checks, for example when
plotresis called with
versus=4). Details: This argument was introduced to reduce peak memory usage. When
n >> p, memory use peaks when
earthis calculating the leverages.
1e-10. Applies only when
pmethod="exhaustive". If the reciprocal of the condition number of
bxis less than
pmethod="backward". See XHAUST returned error code -999 in the vignette.
Dots are passed on to
An S3 model of class
earth.objectfor a complete description.
The primary references are the Friedman papers.
Readers may find the MARS section in Hastie, Tibshirani,
and Friedman a more accessible introduction.
The Wikipedia article is recommended for an elementary introduction.
Faraway takes a hands-on approach,
ozone data to compare
mda::mars with other techniques.
(If you use Faraway's examples with
earth instead of
$x, and check out the book's errata.)
Friedman and Silverman is recommended background reading for the MARS paper.
Earth's pruning pass uses code from the
which is based on techniques in Miller.
Faraway (2005) Extending the Linear Model with R
Friedman (1991) Multivariate Adaptive Regression Splines (with discussion)
Annals of Statistics 19/1, 1--141
Friedman (1993) Fast MARS
Stanford University Department of Statistics, Technical Report 110
Friedman and Silverman (1989)
Flexible Parsimonious Smoothing and Additive Modeling
Technometrics, Vol. 31, No. 1.
Hastie, Tibshirani, and Friedman (2009) The Elements of Statistical Learning (2nd ed.)
Leathwick, J.R., Rowe, D., Richardson, J., Elith, J., & Hastie, T. (2005)
Using multivariate adaptive regression splines to predict the distributions
of New Zealand's freshwater diadromous fish Freshwater Biology, 50, 2034-2052
Miller, Alan (1990, 2nd ed. 2002) Subset Selection in Regression
Wikipedia article on MARS
Please see the main package vignette
Notes on the earth package.
The vignette can also be downloaded from
Variance models in earth
is also included with the package.
It describes how to build variance models and
generate prediction intervals for
earth.mod <- earth(Volume ~ ., data = trees) plotmo(earth.mod) summary(earth.mod, digits = 2, style = "pmax")