## S3 method for class 'formula':
earth(formula, data, ...)

## S3 method for class 'default':
earth(x = stop("no 'x' arg"), y = stop("no 'y' arg"),
weights = NULL, subset = NULL, na.action = na.fail,
penalty = if(degree > 1) 3 else 2, trace = 0, keepxy = FALSE,
nk = max(21, 2 * NCOL(x) + 1), degree = 1,
linpreds = FALSE, allowed = NULL,
thresh = 0.001, minspan = 1, newvar.penalty = 0,
fast.k = 20, fast.beta = 1,
pmethod = "backward", ppenalty = penalty, nprune = NULL,
Object = NULL, Get.crit = get.gcv,
Eval.model.subsets = eval.model.subsets,
Print.pruning.pass = print.pruning.pass,
Force.xtx.prune = FALSE, Use.beta.cache = TRUE, ...)
Arguments

formula: Model formula (earth.formula only).

data: Data frame in which to evaluate the formula (earth.formula only).

x: Matrix or data frame containing the independent variables.

y: Vector or matrix of response values.
If the y values are very big or very small you may get
better results if you scale y first.

subset: Index vector specifying which rows of x to use.
Default is NULL, meaning all.

na.action: NA action. Default is na.fail, and only na.fail is supported.

penalty: Generalized Cross Validation (GCV) penalty per knot.
Default is if(degree>1) 3 else 2.
A value of 0 penalises only terms, not knots.
The value -1 is a special case, meaning no penalty, so GCV=RSS/n.
Theory suggests values in the range of about 2 to 3.

trace: Trace earth's execution:
0 none, 1 overview, 2 forward pass, 3 pruning, 4 more pruning, 5 ...
keepxy: Set to TRUE to retain x, y, and subset in the returned value.
Default is FALSE.

The following arguments are for the forward pass.

nk: Maximum number of model terms before pruning.
Default is max(21, 2*NCOL(x)+1).
The number of terms created by the forward pass will be
less than nk if there are linearly dependent terms.

linpreds: Index vector specifying which predictors should enter linearly, as in lm.
The default is FALSE, meaning all predictors enter
in the standard MARS fashion, i.e. in hinge functions.

allowed: Function specifying which predictors can interact and how
(a sketch of such a function is given after this argument list).
Earth calls the allowed function just before adding a term.
If allowed returns TRUE the term goes into the model as usual;
if it returns FALSE the term is discarded.

minspan: Minimum distance between knots
(earth can calculate minspan internally as per Friedman's MARS paper section 3.8).

pmethod: Pruning method. Default is "backward".
One of: backward none exhaustive forward seqrep.
If y has multiple columns, then only backward or none
is allowed.
ppenalty: Like penalty but for the pruning pass
(pruning can use its own penalty).
Default is penalty.

Object: An earth object to be updated, as used by update.earth.

Eval.model.subsets: Function used to evaluate model subsets.
By default, if y has a single column then earth calls the leaps routines.

Use.beta.cache: Using the beta cache takes
nk * nk * ncol(x) * sizeof(double) bytes.
Set Use.beta.cache=FALSE to save memory.

...: earth.formula: arguments passed to earth.default.
earth.default: unused, but provided for generic/method consistency.
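The allowed argument can be illustrated with a small sketch. This is not taken from the
package itself: it assumes allowed is called with the interaction degree, the index of the
candidate predictor (pred), and a vector whose nonzero entries mark the predictors already
in the parent term (parents). The exact prototype may differ between earth versions, so
check the installed help page before relying on it.

# Sketch only: reject any interaction term involving predictor number 2.
# The prototype function(degree, pred, parents) is an assumption here.
no.pred2.interactions <- function(degree, pred, parents)
{
    if (degree > 1 && (pred == 2 || parents[2] != 0))
        return(FALSE)   # discard the candidate term
    TRUE                # otherwise accept it
}

library(earth)
fit <- earth(Volume ~ ., data = trees, degree = 2,
             allowed = no.pred2.interactions)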
Value

An object of class "earth", which is a list with the components described below.
(See format.earth for how a term is displayed.)
Term number 1 is always the intercept.

rss: Residual sum of squares of the model
(summed over all responses if y has multiple columns).

rsq: 1-rss/rss.null.
R-Squared of the model
(calculated over all responses if y has multiple columns).
A measure of how well the model fits the training data.

gcv: Generalized Cross Validation (GCV) of the model
(summed over all responses if y has multiple columns).
The GCV is calculated using ppenalty (as are all returned GCVs).
For details of the GCV calculation, see
equation 30 in Friedman's MARS paper and earth:::get.gcv.
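As a rough sketch of how penalty (or ppenalty) enters the GCV: following equation 30 in
Friedman's MARS paper, the RSS is inflated by an effective number of parameters that
charges one parameter per term plus a penalty per knot. This is only an approximation of
what earth:::get.gcv does; the exact knot counting may differ.

# Sketch of Friedman's equation 30, not a copy of earth:::get.gcv.
gcv.sketch <- function(rss, n.cases, n.terms, penalty)
{
    nknots <- (n.terms - 1) / 2          # hinge terms come (roughly) in pairs
    enp <- n.terms + penalty * nknots    # effective number of parameters
    (rss / n.cases) / (1 - enp / n.cases)^2
}

gcv.sketch(rss = 196, n.cases = 31, n.terms = 4, penalty = 2)
# about 10.5, comparable to the GCV of 11 (two significant figures)
# reported for the trees model in the Examples section below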
grsq: 1-gcv/gcv.null.
An estimate of the predictive power of the model
(calculated over all responses if y has multiple columns).
Unlike rsq, grsq can be negative.
A negative grsq would indicate
a severely over-parameterised model --- a model that
would not generalise well
even though it may be a good fit to the training data.
Example of a negative grsq:
earth(mpg~., data=mtcars, pmethod="none", trace=4)
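Since gcv.per.subset[1] (described below) holds the GCV of the intercept-only model,
grsq can be recomputed from the returned components. A small sketch:

# Sketch: recompute GRSq from the components documented on this page.
library(earth)
fit <- earth(Volume ~ ., data = trees)
1 - fit$gcv / fit$gcv.per.subset[1]   # should essentially equal fit$grsq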
bx: Matrix of basis functions applied to x.
Each column corresponds to a selected term.
Each row corresponds to a row in the input matrix x,
after taking subset.
See model.matrix.earth for an example of bx handling.
For brevity, "h" is used instead of "pmax" in column names.
Example bx:

     (Intercept) h(Girth-12.9) h(12.9-Girth) h(Girth-12.9)*h(...
[1,]           1           0.0           4.6                  0
[2,]           1           0.0           4.3                  0
[3,]           1           0.0           4.1                  0
...
dirs: Matrix with one row per MARS term and one column per predictor.
The value in row i, column j is:
0 if predictor j is not in term i,
-1 if a factor of the form pmax(c - xj) is in term i,
1 if a factor of the form pmax(xj - c) is in term i,
2 if predictor j enters term i linearly.
This matrix includes all terms generated by the forward pass,
including those not in selected.terms.
Note that the terms may not be in pairs, because the forward pass
deletes linearly dependent terms before handing control to the pruning pass.
Example dirs:

                           Girth Height
(Intercept)                    0      0   # no factors in intercept
h(Girth-12.9)                  1      0   # 2nd term uses Girth
h(12.9-Girth)                 -1      0   # 3rd term uses Girth
h(Girth-12.9)*h(Height-76)     1      1   # 4th term uses Girth and Height
...
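Because a nonzero entry in dirs means the predictor appears in that term, the predictors
used by the selected model can be read off directly. A sketch using only the components
documented on this page:

# Sketch: list the predictors used by the selected model.
library(earth)
fit <- earth(Volume ~ ., data = trees)
sel.dirs <- fit$dirs[fit$selected.terms, , drop = FALSE]
colnames(sel.dirs)[colSums(sel.dirs != 0) > 0]   # e.g. "Girth" "Height"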
cuts: Matrix with one row per MARS term and one column per predictor,
giving the knot (cut) value for each hinge.
This matrix includes all terms generated by the forward pass,
including those not in selected.terms.
Note that the terms may not be in pairs, because the forward pass
deletes linearly dependent terms before handing control to the pruning pass.
Example cuts:

                           Girth Height
(Intercept)                  0.0      0   # intercept, no cuts
h(Girth-12.9)               12.9      0   # 2nd term has cut at 12.9
h(12.9-Girth)               12.9      0   # 3rd term has cut at 12.9
h(Girth-12.9)*h(Height-76)  12.9     76   # 4th term has two cuts
...
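Together, dirs and cuts encode each hinge: a direction of 1 with cut c means pmax(0, x - c),
and a direction of -1 means pmax(0, c - x). A sketch that rebuilds one basis column this
way (it assumes, as in the default additive fit, that the term uses a single predictor):

# Sketch: rebuild the 2nd selected term from dirs and cuts.
library(earth)
fit <- earth(Volume ~ ., data = trees)
k <- fit$selected.terms[2]              # term number of the 2nd selected term
j <- which(fit$dirs[k, ] != 0)          # the single predictor used by that term
hinge <- pmax(0, fit$dirs[k, j] * (trees[[names(j)]] - fit$cuts[k, j]))
all.equal(unname(hinge), unname(fit$bx[, 2]))   # expected to be TRUE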
selected.terms: Vector of term numbers in the selected model.
Can be used as a row index into cuts and dirs.
The first element selected.terms[1] is always 1, the intercept.

prune.terms: Matrix specifying which terms appear in which model subset.
The row index of prune.terms is the model size
(the model size is the number of terms in the model).
Each row is a vector of term numbers for the best model of that size.
An element is 0 if the term is not in the model, thus prune.terms is a
lower triangular matrix, with dimensions nprune x nprune.
The model selected by the pruning pass
is at row length(selected.terms).
Example prune.terms:

[1,] 1 0 0 0 0 0 0   # intercept-only model
[2,] 1 2 0 0 0 0 0   # best 2-term model uses terms 1,2
[3,] 1 2 4 0 0 0 0   # best 3-term model uses terms 1,2,4
[4,] 1 2 9 8 0 0 0   # and so on
...
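A short sketch of reading prune.terms back, using the fact noted above that the selected
model is at row length(selected.terms):

# Sketch: inspect the models considered by the pruning pass.
library(earth)
fit <- earth(Volume ~ ., data = trees)
fit$prune.terms[3, ]                            # best 3-term model (0-padded)
fit$prune.terms[length(fit$selected.terms), ]   # the model that was selected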
rss.per.response: Vector of RSS's, one element per response.
Length is ncol(y).
The rss component above is equal to sum(rss.per.response).

rsq.per.response: Vector of R-Squared's, one element per response.
Length is ncol(y).

gcv.per.response: Vector of GCV's, one element per response.
Length is ncol(y).
The gcv component above is equal to sum(gcv.per.response).

grsq.per.response: Vector of GRSq's, one element per response.
Length is ncol(y).

rss.per.subset: Vector of RSS's, one element for each model subset in prune.terms.
Length is nprune.
If y has multiple columns, the RSS is summed over all responses for each subset.
The null RSS (i.e. the RSS of an intercept-only model) is rss.per.subset[1].
The rss above is
rss.per.subset[length(selected.terms)].

gcv.per.subset: Vector of GCV's, one element for each model subset in prune.terms.
Length is nprune.
If y has multiple columns, the GCV is summed over all responses for each subset.
The null GCV (i.e. the GCV of an intercept-only model) is gcv.per.subset[1].
The gcv above is gcv.per.subset[length(selected.terms)].
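The per-subset vectors give a direct view of the pruning trade-off. A minimal sketch
plotting GCV against model size (plot.earth produces a fuller model-selection plot):

# Sketch: GCV of the best model of each size.
library(earth)
fit <- earth(Volume ~ ., data = trees)
plot(seq_along(fit$gcv.per.subset), fit$gcv.per.subset, type = "b",
     xlab = "number of terms", ylab = "GCV")
abline(v = length(fit$selected.terms), lty = 2)   # size chosen by the pruning pass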
fitted.values: Fitted values. A matrix with dimensions nrow(y) x ncol(y).

residuals: Residuals. A matrix with dimensions nrow(y) x ncol(y).

coefficients: Regression coefficients. A matrix with dimensions length(selected.terms) x ncol(y).
Each column holds the least squares coefficients from regressing that
column of y on bx.
The first row holds the intercept coefficients.
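Since each column of coefficients is obtained by regressing the corresponding column of y
on bx, the fitted values can be reproduced as bx %*% coefficients. A sketch:

# Sketch: fitted values are the basis matrix times the coefficients.
library(earth)
fit <- earth(Volume ~ ., data = trees)
all.equal(as.vector(fit$bx %*% fit$coefficients),
          as.vector(fit$fitted.values))   # expected to be TRUE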
penalty: A copy of earth's ppenalty argument.

call: The call used to invoke earth.

terms: Model terms (returned only by earth.formula).

x, y, subset: Copies of x, y, and subset.
These components exist only if keepxy=TRUE.

Faraway uses the ozone data to compare mda::mars with other techniques.
(If you use Faraway's examples with earth instead of mars, use $bx
instead of $x).
Earth's pruning pass uses the leaps package which is based on
techniques in Miller.

References

Faraway (2005) Extending the Linear Model with R
Friedman (1991) Multivariate Adaptive Regression Splines (with discussion) Annals of Statistics 19/1, 1--141
Friedman (1993) Fast MARS
Stanford University Department of Statistics, Technical Report 110
Hastie, Tibshirani, and Friedman (2001) The Elements of Statistical Learning
Miller, Alan (1990, 2nd ed. 2002) Subset Selection in Regression
See Also

format.earth,
get.nterms.per.degree,
get.nused.preds.per.subset,
mars.to.earth,
model.matrix.earth,
ozone1,
plot.earth.models,
plot.earth,
plotmo,
predict.earth,
reorder.earth,
summary.earth,
update.earth

Examples

a <- earth(Volume ~ ., data = trees)
summary(a, digits = 2)
# yields:
# Call:
# earth(formula = Volume ~ ., data = trees)
#
# Expression:
# 27
# + 6 * pmax(0, Girth - 14)
# - 3.2 * pmax(0, 14 - Girth)
# + 0.61 * pmax(0, Height - 75)
#
# Number of cases: 31
# Selected 4 of 5 terms, and 2 of 2 predictors
# Number of terms at each degree of interaction: 1 3 (additive model)
# GCV: 11    RSS: 196    GRSq: 0.96    RSq: 0.98