earth
Multivariate Adaptive Regression Splines
Build a regression model using the techniques in Friedman's papers "Multivariate Adaptive Regression Splines" and "Fast MARS".
See the package vignette “Notes on the earth package”.
 Keywords
 models, regression, smooth
Usage
"earth"(formula = stop("no 'formula' argument"), data = NULL, weights = NULL, wp = NULL, subset = NULL, na.action = na.fail, pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"), keepxy = FALSE, trace = 0, glm = NULL, degree = 1, nprune = NULL, ncross=1, nfold=0, stratify=TRUE, varmod.method = "none", varmod.exponent = 1, varmod.conv = 1, varmod.clamp = .1, varmod.minspan = 3, Scale.y = (NCOL(y)==1), ...)
"earth"(x = stop("no 'x' argument"), y = stop("no 'y' argument"), weights = NULL, wp = NULL, subset = NULL, na.action = na.fail, pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"), keepxy = FALSE, trace = 0, glm = NULL, degree = 1, nprune = NULL, ncross=1, nfold=0, stratify=TRUE, varmod.method = "none", varmod.exponent = 1, varmod.conv = 1, varmod.clamp = .1, varmod.minspan = 3, Scale.y = (NCOL(y)==1), ...)
"earth"(x = stop("no 'x' argument"), y = stop("no 'y' argument"), weights = NULL, wp = NULL, subset = NULL, na.action = na.fail, pmethod = c("backward", "none", "exhaustive", "forward", "seqrep", "cv"), keepxy = FALSE, trace = 0, glm = NULL, degree = 1, penalty = if(degree > 1) 3 else 2, nk = min(200, max(20, 2 * ncol(x))) + 1, thresh = 0.001, minspan = 0, endspan = 0, newvar.penalty = 0, fast.k = 20, fast.beta = 1, linpreds = FALSE, allowed = NULL, nprune = NULL, Object = NULL, Scale.y = (NCOL(y)==1), Adjust.endspan = 2, Force.weights = FALSE, Use.beta.cache = TRUE, Force.xtx.prune = FALSE, Get.leverages = NROW(x) < 1e5, Exhaustive.tol = 1e10, ...)
Arguments
 formula
 Model formula.
 data

Data frame for
formula
.  x
 Matrix or dataframe containing the independent variables.
 y
 Vector containing the response variable, or, in the case of multiple responses, a matrix or dataframe whose columns are the values for each response.
 subset

Index vector specifying which cases to use, i.e., which rows in
x
to use. Default is NULL, meaning all.  weights

Case weights.
Default is NULL, meaning no case weights.
If specified,
weights
must have length equal tonrow(x)
before applyingsubset
. Zero weights are converted to a very small nonzero value.  wp

Response weights.
Default is NULL, meaning no response weights.
If specified,
wp
must have an element for each column ofy
(afterfactors
iny
, if any, have been expanded). Zero weights are converted to a very small nonzero value.  na.action

NA action. Default is
na.fail
, and onlyna.fail
is supported.  keepxy

Default is
FALSE
. Set toTRUE
to retain the following in the returned value:x
andy
(ordata
),subset
, andweights
. The functionupdate.earth
and friends will use these if present instead of searching for them in the environment at the timeupdate.earth
is invoked. When thenfold
argument is used withkeepxy=TRUE
,earth
keeps more data and callspredict.earth
multiple times to generatecv.oof.rsq.tab
andcv.infold.rsq.tab
(see thecv.
arguments in the “Value” section below). It therefore makes crossvalidation significantly slower.  trace

Trace
earth
's execution. Default is0
. Values:0
no tracing.3
variance model (thevarmod.method
arg).5
cross validation (thenfold
arg)1
overview2
forward pass3
pruning4
model mats summary, pruning details5
full model mats, internal details of operation  glm

NULL (default) or a list of arguments to pass on to
glm
. See the documentation ofglm
for a description of these arguments See “Generalized linear models” in the vignette. Example:earth(survived~., data=etitanic, degree=2, glm=list(family=binomial))
The following arguments are for the forward pass.
 degree

Maximum degree of interaction (Friedman's $mi$).
Default is
1
, meaning build an additive model (i.e., no interaction terms).  penalty

Generalized Cross Validation (GCV) penalty per knot.
Default is
if(degree>1) 3 else 2
. Simulation studies suggest values in the range of about2
to4
. The FAQ section in the vignette has some information on GCVs. Special values (for use by knowledgeable users): The value0
penalizes only terms, not knots. The value1
means no penalty, so GCV = RSS/n.  nk

Maximum number of model terms before pruning, i.e., the
maximum number of terms created by the forward pass.
Includes the intercept.
The actual number of terms created by the forward pass will often be
less than
nk
because of other stopping conditions. See “Termination conditions for the forward pass” in the vignette. The default is semiautomatically calculated from the number of predictors but may need adjusting.  thresh

Forward stepping threshold.
Default is
0.001
. This is one of the arguments used to decide when forward stepping should terminate: the forward pass terminates if adding a term changes RSq by less thanthresh
. See “Termination conditions for the forward pass” in the vignette.  minspan

Minimum number of observations between knots.
(This increases resistance to runs of correlated noise in the input data.)
The default
minspan=0
is treated specially and means calculate theminspan
internally, as per Friedman's MARS paper section 3.8 with $alpha$ = 0.05. Settrace>=2
to see the calculated value. Useminspan=1
andendspan=1
to consider all x values. Negative values ofminspan
specify the maximum number of knots per predictor. These will be equally spaced. For example,minspan=3
allows three evenly spaced knots for each predictor. As always, knots that fall in the endzones specified byendspan
will be ignored.  endspan

Minimum number of observations before the first and after the final knot.
The default
endspan=0
is treated specially and means calculate theminspan
internally, as per the MARS paper equation 45 with $alpha$ = 0.05. Settrace>=2
to see the calculated value.Be wary of reducing
endspan
, especially if you plan to make predictions beyond or near the limits of the training data. Overfitting near the edges of training data is much more likely with a smallendspan
. The model'sRSq
andGRSq
won't indicate when this overfitting is occurring. (Aplotmo
plot can help: look for sharp hinges at the edges of the data). See also theAdjust.endspan
argumen.  newvar.penalty

Penalty for adding a new variable in the forward pass
(Friedman's $gamma$, equation 74 in the MARS paper).
Default is
0
, meaning no penalty for adding a new variable. Useful nonzero values typically range from about0.01
to0.2
and sometimes higher  you will need to experiment. A word of explanation. With the defaultnewvar.penalty=0
, if two variables have nearly the same effect (e.g. they are collinear), at any step in the forward passearth
will arbitrarily select one or the other (depending on noise in the sample). Both variables can appear in the final model, complicating model interpretation. On the other hand with a nonzeronewvar.penalty
, the forward pass will be reluctant to add a new variable  it will rather try to use a variable already in the model, if that does not affect RSq too much. The resulting final model may be easier to interpret, if you are lucky. There will often be a small performance hit (a worse GCV).  fast.k

Maximum number of parent terms considered at each step of the forward pass.
(This speeds up the forward pass. See the Fast MARS paper section 3.0.)
Default is
20
. A value of0
is treated specially (as being equivalent to infinity), meaning no Fast MARS. Typical values, apart from0
, are20
,10
, or5
. In general, with a lowerfast.k
(say5
),earth
is faster; with a higherfast.k
, or withfast.k
disabled (set to0
),earth
builds a better model. However, because of random variation this general rule often doesn't apply.  fast.beta

Fast MARS ageing coefficient, as described in the
Fast MARS paper section 3.1.
Default is
1
. A value of0
sometimes gives better results.  linpreds

Index vector specifying which predictors should enter linearly, as in
lm
. The default isFALSE
, meaning all predictors enter in the standard MARS fashion, i.e., in hinge functions.This does not say that a predictor must enter the model; only that if it enters, it enters linearly. See “The
linpreds
argument” in the vignette.A predictor's index in
linpreds
is the column number in the input matrixx
(after factors have been expanded).linpreds=TRUE
makes all predictors enter linearly (theTRUE
gets recycled).linpreds
may also be a character vector e.g.linpreds=c("wind", "vis")
. Note:grep
is used for matching. Thus"wind"
will match all variables that have"wind"
in their names. Use"^wind$"
to match only the variable named"wind"
.  allowed

Function specifying which predictors can interact and how.
Default is NULL, meaning all standard MARS terms are allowed.
During the forward pass,
earth
calls theallowed
function before considering a term for inclusion; the term can go into the model only if theallowed
function returnsTRUE
. See “The allowed argument” in the vignette.The following arguments are for the pruning pass.
 pmethod

Pruning method.
One of:
backward none exhaustive forward seqrep cv
. Default is"backward"
. New in version 4.4.0: Specifypmethod="cv"
to use crossvalidation to select the number of terms. This selects the number of terms that gives the maximum mean outoffold RSq on the fold models. Requires thenfold
argument. Use"none"
to retain all the terms created by the forward pass. Ify
has multiple columns, then only"backward"
or"none"
is allowed. Pruning can take a while if"exhaustive"
is chosen and the model is big (more than about 30 terms). The current version of theleaps
package used during pruning does not allow user interrupts (i.e., you have to kill your R session to interrupt; in Windows use the Task Manager or from the command line usetaskkill
).  nprune

Maximum number of terms (including intercept) in the pruned model.
Default is NULL, meaning all terms created by the forward pass
(but typically not all terms will remain after pruning).
Use this to enforce an upper bound on the model size (that is less than
nk
), or to reduce exhaustive search time withpmethod="exhaustive"
.The following arguments are for cross validation.
 ncross

Only applies if
nfold>1
. Number of crossvalidations. Each crossvalidation hasnfold
folds. Default1
.  nfold

Number of crossvalidation folds.
Default is
0
, no cross validation. If greater than1
,earth
first builds a standard model as usual with all the data. It then buildsnfold
crossvalidated models, measuring RSquared on the outoffold (left out) data each time. The final cross validation RSquared (CVRSq
) is the mean of these outoffold RSquareds. The above process of buildingnfold
models is repeatedncross
times (by default, once). Usetrace=.5
to trace crossvalidation. Further statistics are calculated ifkeepxy=TRUE
or if a binomial or poisson model (specified with theglm
argument). See “Cross validation” in the vignette.  stratify

Only applies if
nfold>1
. Default isTRUE
. Stratify the crossvalidation samples so that an approximately equal number of cases with a nonzero response occur in each cross validation subset. So if the responsey
is logical, theTRUE
s will be spread evenly across folds. And if the response is a multilevel factor, there will be an approximately equal number of each factor level in each fold (because a multilevel factor response gets expanded to columns of zeros and ones, see “Factors” in the vignette). We say “approximately equal” because the number of occurrences of a factor level may not be exactly divisible by the number of folds.The following arguments are for variance models (new in version 4.0.0).
 varmod.method

Construct a variance model.
For details, see
varmod
and the vignette “Variance models in earth”. Usetrace=.3
to trace construction of the variance model.This argument requires
nfold
andncross
. (We suggest at leastncross=30
here to properly calculate the variance of the errors  although you can use a smaller value, say3
, for debugging.) Thevarmod.method
argument should be one of"none"
Default. Don't build a variance model."const"
Assume homoscedastic errors."lm"
Uselm
to estimate standard deviation as a function of the predicted response."rlm"
Userlm
."earth"
Useearth
."gam"
Usegam
. This will use eithergam
or themgcv
package, whichever is loaded."power"
Estimate standard deviation asintercept + coef * predicted.response^exponent
, whereintercept
,coef
, andexponent
will be estimated bynls
. This is equivalent tovarmod.method="lm"
except thatexponent
is automatically estimated instead of being held at the value set by thevarmod.exponent
argument."power0"
Same as"power"
but no intercept (offset) term."x.lm"
,"x.rlm"
,"x.earth"
,"x.gam"
Like the similarly named options above, but estimate standard deviation by regressing on the predictorsx
(instead of the predicted response). A current implementation restriction is that"x.gam"
allows only models with one predictor (x
must have only one column).  varmod.exponent

Power transform applied to the rhs before regressing the
absolute residuals with the specified
varmod.method
. Default is1
. For example, withvarmod.method="lm"
, if you expect the standard deviance to increase linearly with the mean response, usevarmod.exponent=1
. If you expect the standard deviance to increase with the square root of the mean response, usevarmod.exponent=.5
(where negative response values will be treated as0
, and you will get an error message if more than 20% of them are negative).  varmod.conv

Convergence criterion for the Iteratively Reweighted Least Squares used
when creating the variance model.
Iterations stop when the mean value of the coefficients of the
residual model change by less than
varmod.conv
percent. Default is1
percent. Negative values force the specified number of iterations, e.g.varmod.conv=2
means iterate twice. Positive values are ignored forvarmod="const"
and also currently ignored forvarmod="earth"
(these are iterated just once, the same as usingvarmod.conv=1
).  varmod.clamp

The estimated standard deviation of the main model errors
is forced to be at least a small positive value,
which we call
min.sd
. This prevents negative or absurdly small estimated standard deviations. Clamping takes place inpredict.varmod
, which is called bypredict.earth
when estimating prediction intervals. The value ofmin.sd
is determined when building the variance model asmin.sd = varmod.clamp * mean(sd(training.residuals))
. The defaultvarmod.clamp
is0.1
.  varmod.minspan

Only applies when
varmod.method="earth"
or"x.earth"
. This is theminspan
used in the internal call toearth
when creating the variance model (not the mainearth
model). Default is3
, i.e., three evenly spaced knots per predictor. Residuals tend to be very noisy, and allowing only this small number of knots helps prevent overfitting.The following arguments are for internal or advanced use.
 Object

Earth object to be updated, for use by
update.earth
.  Scale.y

Scale
y
internally in the forward pass for better numeric stability. This is invisible to the user, up to numerical differences. Scaling here means subtract the mean and divide by the standard deviation. Default isNCOL(y)==1
, i.e., scaley
unlessy
has multiple columns.  Adjust.endspan

New in version 4.2.0.
In interaction terms,
endspan
gets multiplied by this value. This reduces the possibility of an overfitted interaction term supported by just a few cases on the boundary of the predictor space (as sometimes seen in our simulation studies). The default is2
. UseAdjust.endspan=1
for compatibility with old versions ofearth
.  Force.weights

Default is
FALSE
. For testing theweights
argument. Force use of the code for handling weights in theearth
code, even ifweights=NULL
or all the weights are the same. This will not necessarily generate an identical model, primarily because the nonweighted code requires some tests for numerical stability that can sometimes affect knot selection.  Use.beta.cache

Default is
TRUE
. Using the “beta cache” takes a little more memory but is faster (by 20% and often much more for large models). The beta cache usesnk * nk * ncol(x) * sizeof(double)
bytes. (The beta cache is an innovation in this implementation of MARS and does not appear in Friedman's papers. It is not related to thefast.beta
argument. Certain regression coefficients in the forward pass can be saved and reused, thus saving recalculation time.)  Force.xtx.prune

Default is
FALSE
. This argument pertains to subset evaluation in the pruning pass. By default, ify
has a single column thenearth
calls theleaps
routines; ify
has multiple columns thenearth
callsEvalSubsetsUsingXtx
. Theleaps
routines are numerically more stable but do not support multiple responses (leaps
is based on the QR decomposition andEvalSubsetsUsingXtx
is based on the inverse of X'X). SettingForce.xtx.prune=TRUE
forces use ofEvalSubsetsUsingXtx
, even ify
has a single column.  Get.leverages

New in version 4.4.0.
Default is
TRUE
unless the model has more than 100 thousand cases. The leverages are the diagonal hat values for the linear regression ofy
onbx
. The leverages are needed only for certain model checks, for example whenplotres
is called withversus=4
).Details: This argument was introduced to reduce peak memory usage. When
n >> p
, memory use peaks whenearth
is calculating the leverages.  Exhaustive.tol

Default
1e10
. Applies only whenpmethod="exhaustive"
. If the reciprocal of the condition number ofbx
is less thanExhaustive.tol
,earth
forcespmethod="backward"
. See “XHAUST returned error code 999” in the vignette.  ...

Dots are passed on to
earth.fit
.
Value

An S3 model of class
"earth"
.
See earth.object
for a complete description.
References
The primary references are the Friedman papers.
Readers may find the MARS section in Hastie, Tibshirani,
and Friedman a more accessible introduction.
The Wikipedia article is recommended for an elementary introduction.
Faraway takes a handson approach,
using the ozone
data to compare mda::mars
with other techniques.
(If you use Faraway's examples with earth
instead of mars
, use $bx
instead of $x
, and check out the book's errata.)
Friedman and Silverman is recommended background reading for the MARS paper.
Earth's pruning pass uses code from the leaps
package
which is based on techniques in Miller.
Faraway (2005) Extending the Linear Model with R http://www.maths.bath.ac.uk/~jjf23
Friedman (1991) Multivariate Adaptive Regression Splines (with discussion)
Annals of Statistics 19/1, 1141
https://statistics.stanford.edu/research/multivariateadaptiveregressionsplines
Friedman (1993) Fast MARS
Stanford University Department of Statistics, Technical Report 110
https://statistics.stanford.edu/research/fastmars
Friedman and Silverman (1989) Flexible Parsimonious Smoothing and Additive Modeling Technometrics, Vol. 31, No. 1. http://links.jstor.org/sici?sici=00401706%28198902%2931%3A1%3C3%3AFPSAAM%3E2.0.CO%3B2Z
Hastie, Tibshirani, and Friedman (2009) The Elements of Statistical Learning (2nd ed.) http://web.stanford.edu/~hastie/pub.htm
Leathwick, J.R., Rowe, D., Richardson, J., Elith, J., & Hastie, T. (2005) Using multivariate adaptive regression splines to predict the distributions of New Zealand's freshwater diadromous fish Freshwater Biology, 50, 20342052 http://web.stanford.edu/~hastie/pub.htm, http://www.botany.unimelb.edu.au/envisci/about/staff/elith.html
Miller, Alan (1990, 2nd ed. 2002) Subset Selection in Regression http://wp.csiro.au/alanmiller/index.html
Wikipedia article on MARS http://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines
See Also
Start with summary.earth
, plot.earth
,
evimp
, and plotmo
.
Please see the main package vignette “Notes on the earth package”. The vignette can also be downloaded from http://www.milbo.org/doc/earthnotes.pdf.
The vignette
“Variance models in earth”
is also included with the package.
It describes how to build variance models and
generate prediction intervals for earth
models.
Examples
earth.mod < earth(Volume ~ ., data = trees)
plotmo(earth.mod)
summary(earth.mod, digits = 2, style = "pmax")