## S3 method for class 'formula':
earth(formula, data, ...)

## S3 method for class 'default':
earth(x = stop("no 'x' arg"), y = stop("no 'y' arg"),
      subset = NULL, weights = NULL, na.action = na.fail,
      penalty = if(degree > 1) 3 else 2, trace = 0,
      degree = 1, nk = max(21, 2 * NCOL(x) + 1),
      thresh = 0.001, minspan = 0, newvar.penalty = 0,
      fast.k = 20, fast.beta = 1, fast.h = NULL,
      pmethod = "backward", ppenalty = penalty, nprune = NULL,
      Object = NULL, Get.crit = get.gcv,
      Eval.model.subsets = eval.model.subsets,
      Print.pruning.pass = print.pruning.pass, ...)
y: If the y values are very big or very small, you may get
better results if you scale y first.

na.action: Default is na.fail, and only na.fail is supported.

penalty: Default is if(degree > 1) 3 else 2.
A value of 0 penalises only terms, not knots.
The value -1 is a special case, meaning no penalty, so GCV=RSS/n.
Theory suggests values in the range of about 2 to 3.
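The effect of penalty on GCV can be sketched in a few lines of base R. This is a minimal sketch, assuming (this page does not state it) that the effective number of parameters is nterms + penalty * (nterms - 1) / 2, i.e. roughly one knot charged per pair of hinge terms; the -1 special case follows the text above. gcv_sketch is a hypothetical helper, not earth:::get.gcv.

```r
# Sketch of GCV with a per-knot penalty.
# ASSUMPTION: effective parameters = nterms + penalty * (nterms - 1) / 2,
# where (nterms - 1) / 2 approximates the number of knots.
gcv_sketch <- function(rss, nterms, penalty, n) {
  if (penalty == -1) {
    return(rss / n)                  # special case: no penalty, GCV = RSS/n
  }
  nparams <- nterms + penalty * (nterms - 1) / 2   # penalty 0: terms only
  rss / (n * (1 - nparams / n)^2)
}

# Values printed in the trees example at the end of this page:
# RSS 213, 4 selected terms, 31 cases, degree 1 so penalty = 2.
gcv_sketch(rss = 213, nterms = 4, penalty = 2, n = 31)
```

With those inputs the sketch gives about 11.5, which is consistent with the "GCV: 11" printed at two significant digits in that example.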
trace: Default is 0. Values:
0 none
1 overview
2 forward
3 pruning
4 more pruning
5 ...

The following arguments are for the forward pass.
nk: Default is max(21, 2*NCOL(x)+1).
The number of terms created by the forward pass will be
less than nk if there are linearly dependent terms …
minspan: Use trace>=2 to see the calculated value. Values:
<0 add to the internally calculated min span (i.e. decrease span).
0 (default) use the internally calculated min span as per
Friedman's MARS paper.
pmethod: One of backward, none, exhaustive, forward, seqrep.
Default is "backward".
Model subset evaluation for pruning uses the leaps package.
Pruning can take …

ppenalty: Like penalty, but for the pruning pass.
Default is penalty.
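To make the idea of a backward pruning pass concrete, here is a naive base-R backward elimination. It is not the leaps algorithm that earth uses: it simply refits with lm.fit and, at each step, drops the non-intercept term whose removal raises the RSS least. backward_prune and the hand-built trees basis matrix are hypothetical illustrations.

```r
# Naive backward elimination (NOT leaps): starting from all columns of a
# basis matrix, repeatedly drop the non-intercept term whose removal
# increases the RSS least. Term 1 (the intercept) is always kept.
backward_prune <- function(bx, y) {
  rss_of <- function(cols) sum(lm.fit(bx[, cols, drop = FALSE], y)$residuals^2)
  keep <- seq_len(ncol(bx))
  path <- list(keep)              # term numbers kept, one entry per model size
  rss  <- rss_of(keep)
  while (length(keep) > 1) {
    candidates  <- keep[keep != 1]
    rss_without <- sapply(candidates, function(term) rss_of(setdiff(keep, term)))
    keep <- setdiff(keep, candidates[which.min(rss_without)])
    path <- c(list(keep), path)   # prepend so that index == model size
    rss  <- c(rss_of(keep), rss)
  }
  list(path = path, rss = rss)
}

# Hypothetical basis matrix built by hand from the trees data.
bx <- cbind(
  "(Intercept)"   = 1,
  "h(Girth-12.9)" = pmax(0, trees$Girth - 12.9),
  "h(12.9-Girth)" = pmax(0, 12.9 - trees$Girth),
  "h(Height-76)"  = pmax(0, trees$Height - 76)
)
res <- backward_prune(bx, trees$Volume)
res$path   # term numbers for the kept model of each size, smallest first
res$rss    # RSS for each model size
```

Because each smaller model is nested in the next larger one, the RSS values never increase with model size; an exhaustive search (pmethod = "exhaustive") need not produce nested subsets.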
… update.earth.

...: For earth.formula, arguments passed to earth.default.
For earth.default, unused, but provided for generic/method consistency.
… format.earth … is a term.
Term number 1 is always the intercept.

coefficients: … bx.
Each value corresponds to a selected term.
coefficients[1] is the intercept.

rss: Equal to rssVec[length(selected.terms)].
See also rssVec below.

rsq: 1-rss/rss.null.
R-Squared of the model.
A measure of how well the model fits the training data.

gcv: Equal to gcvVec[length(selected.terms)].
See also gcvVec below.
For details of the GCV calculation, see
equation 30 in Friedman's MARS paper and earth:::get.gcv.

grsq: 1-gcv/gcv.null.
An estimate of the predictive power of the model. Unlike rsq,
grsq can be negative.
A negative grsq would indicate
a severely over-parameterised model: a model that
would not generalise well
even though it may be a good fit to the training data.
Example of a negative grsq:
earth(mpg ~ ., data = mtcars, pmethod = "none", trace = 4)
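A small numeric sketch shows how grsq can be negative while rsq stays high: with many terms relative to the number of cases, the GCV denominator shrinks sharply. The numbers are made up, and the GCV formula assumes (not stated on this page) an effective parameter count of nterms + penalty * (nterms - 1) / 2; gcv_sketch and rsq_grsq are hypothetical helpers.

```r
# ASSUMPTION: effective parameters = nterms + penalty * (nterms - 1) / 2.
gcv_sketch <- function(rss, nterms, penalty, n) {
  if (penalty == -1) return(rss / n)          # no penalty: GCV = RSS/n
  nparams <- nterms + penalty * (nterms - 1) / 2
  rss / (n * (1 - nparams / n)^2)
}

# rsq and grsq exactly as defined above.
rsq_grsq <- function(rss, rss.null, nterms, penalty, n) {
  gcv      <- gcv_sketch(rss, nterms, penalty, n)
  gcv.null <- gcv_sketch(rss.null, 1, penalty, n)   # intercept-only model
  c(rsq = 1 - rss / rss.null, grsq = 1 - gcv / gcv.null)
}

# Made-up case: a 9-term model on only 20 cases explaining 90% of the RSS.
r <- rsq_grsq(rss = 10, rss.null = 100, nterms = 9, penalty = 2, n = 20)
r
```

Here rsq is 0.9 but grsq is strongly negative, because 9 terms plus their knot penalty use up most of the 20 degrees of freedom.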
bx: … x.
Each column corresponds to a selected term.
Each row corresponds to a row in the input matrix x,
after taking subset.
See model.matrix.earth for an example of bx handling.
Example:
     (Intercept) h(Girth-12.9) h(12.9-Girth) h(Girth-12.9)*h(...
[1,]           1           0.0           4.6                   0
[2,]           1           0.0           4.3                   0
[3,]           1           0.0           4.1                   0
...

dirs: … selected.terms.
Note that the terms may not be in pairs, because the forward pass
deletes linearly dependent terms before handing control to the pruning pass.
Example:
                           Girth Height
(Intercept)                    0      0   #no factors in intercept
h(Girth-12.9)                  1      0   #2nd term uses Girth
h(12.9-Girth)                 -1      0   #3rd term uses Girth
h(Girth-12.9)*h(Height-76)     1      1   #4th term uses Girth and Height
...
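The bx values in the example above are ordinary hinge functions, pmax(0, ...), evaluated on the trees data. The sketch below rebuilds those four columns by hand; the cuts 12.9 and 76 are read from the term names in the examples, and h is a hypothetical helper name.

```r
# Hinge function used by MARS terms.
h <- function(v) pmax(0, v)

# Rebuild the four example columns directly from the trees data.
bx <- cbind(
  "(Intercept)"                = 1,
  "h(Girth-12.9)"              = h(trees$Girth - 12.9),
  "h(12.9-Girth)"              = h(12.9 - trees$Girth),
  "h(Girth-12.9)*h(Height-76)" = h(trees$Girth - 12.9) * h(trees$Height - 76)
)
round(head(bx, 3), 1)   # compare with rows [1,] to [3,] of the bx example
```

The first three trees have Girth below 12.9, so only the h(12.9-Girth) column is non-zero in those rows.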
cuts: … selected.terms.
Note that the terms may not be in pairs, because the forward pass
deletes linearly dependent terms before handing control to the pruning pass.
Example:
                           Girth Height
(Intercept)                  0.0      0   #intercept, no cuts
h(Girth-12.9)               12.9      0   #2nd term has cut at 12.9
h(12.9-Girth)               12.9      0   #3rd term has cut at 12.9
h(Girth-12.9)*h(Height-76)  12.9     76   #4th term has two cuts
...
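Read together, one row of dirs and the matching row of cuts describe a term as a product of hinges. A minimal sketch, assuming dirs holds only -1, 0 and 1 as in the examples above; eval_term is a hypothetical helper, not part of earth.

```r
# Evaluate one term from its dirs and cuts rows.
# dirs[j] ==  1 -> pmax(0, x_j - cut_j)
# dirs[j] == -1 -> pmax(0, cut_j - x_j)
# dirs[j] ==  0 -> predictor j is not in the term
eval_term <- function(x, dirs_row, cuts_row) {
  out <- rep(1, nrow(x))                  # empty product gives the intercept
  for (j in which(dirs_row != 0)) {
    out <- out * pmax(0, dirs_row[j] * (x[, j] - cuts_row[j]))
  }
  out
}

# Rows copied from the dirs and cuts examples (columns Girth, Height).
x    <- as.matrix(trees[, c("Girth", "Height")])
dirs <- rbind(c(0, 0), c(1, 0), c(-1, 0), c(1, 1))
cuts <- rbind(c(0, 0), c(12.9, 0), c(12.9, 0), c(12.9, 76))
bx   <- sapply(1:4, function(i) eval_term(x, dirs[i, ], cuts[i, ]))
head(bx, 3)
```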
selected.terms: … cuts and dirs.
The first element selected.terms[1] is always 1, the intercept.

rssVec: … The length of rssVec is nprune.
The null RSS (i.e. the RSS of an intercept-only model) is rssVec[1].
The RSS of the selected model is rssVec[length(selected.terms)].

gcvVec: … prune.terms.
The length of gcvVec is nprune.
The null GCV (i.e. the GCV of an intercept-only model) is gcvVec[1].
The GCV of the selected model is gcvVec[length(selected.terms)].
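Given rssVec, a GCV can be computed for every model size and the size with the lowest GCV picked. This is a sketch under stated assumptions: the rssVec values are made up, the GCV formula assumes an effective parameter count of nterms + ppenalty * (nterms - 1) / 2, and selecting the minimum-GCV size follows Friedman's MARS and is assumed, not confirmed here, to match earth's pruning pass.

```r
# Made-up RSS for the best model of each size 1..5 (not earth output).
rssVec   <- c(1000, 300, 150, 120, 115)
n        <- 31   # number of cases
ppenalty <- 2    # GCV penalty used during pruning

# ASSUMPTION: effective parameters for each model size.
nterms  <- seq_along(rssVec)
nparams <- nterms + ppenalty * (nterms - 1) / 2
gcvVec  <- rssVec / (n * (1 - nparams / n)^2)

# Pick the size with the lowest GCV; the selected model's term numbers
# would then be the non-zero entries of that row of prune.terms.
best <- which.min(gcvVec)
rsq  <- 1 - rssVec[best] / rssVec[1]   # rssVec[1] is the null RSS
grsq <- 1 - gcvVec[best] / gcvVec[1]   # gcvVec[1] is the null GCV
c(size = best, rsq = rsq, grsq = grsq)
```

RSS keeps falling as terms are added, but the penalty makes GCV turn back up, so a mid-sized model wins.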
prune.terms: The row index of prune.terms is the model size
(the model size is the number of terms in the model).
Each row is a vector of term numbers for the best model of that size.
An element is 0 if the term is not in the model, thus prune.terms is a
lower triangular matrix, with dimensions nprune x nprune.
The model selected by the pruning pass
is at row length(selected.terms).
Example:
[1,] 1 0 0 0 0 0 0   #intercept-only model
[2,] 1 2 0 0 0 0 0   #best 2 term model uses terms 1,2
[3,] 1 2 4 0 0 0 0   #best 3 term model uses terms 1,2,4
[4,] 1 2 9 8 0 0 0
...

… earth's ppenalty argument.

… earth.

… earth.formula.

… ozone data to compare mda::mars with other techniques.
(If you use Faraway's examples with earth instead of mars, use $bx
instead of $x.)
Earth's pruning pass uses leaps, which is based on
techniques in Miller.

References:
Faraway, Extending the Linear Model with R
Friedman (1991) Multivariate Adaptive Regression Splines (with discussion), Annals of Statistics 19/1, 1-141
Friedman (1993) Fast MARS, Stanford University Department of Statistics, Technical Report 110
Hastie, Tibshirani, and Friedman (2001) The Elements of Statistical Learning
Miller, Alan (1990, 2nd ed. 2002) Subset Selection in Regression
See also: format.earth, get.nterms.per.degree, get.nused.preds.per.subset,
mars.to.earth, model.matrix.earth, ozone1, plot.earth.models, plot.earth,
plotmo, predict.earth, reorder.earth, summary.earth, update.earth
a <- earth(Volume ~ ., data = trees)
summary(a, digits = 2)
# yields:
# Call:
# earth(formula = Volume ~ ., data = trees)
#
# Expression:
# 23
# + 5.7 * pmax(0, Girth - 13)
# - 2.9 * pmax(0, 13 - Girth)
# + 0.72 * pmax(0, Height - 76)
#
# Number of cases: 31
# Selected 4 of 5 terms, and 2 of 2 predictors
# Number of terms at each degree of interaction: 1 3 (additive model)
# GCV: 11 RSS: 213 GRSq: 0.96 RSq: 0.97
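The Expression in the summary output is ordinary R arithmetic on hinge functions, so it can be evaluated directly. A sketch only: because the coefficients and the cut are printed to two significant digits, these predictions merely approximate predict(a), and the resulting RSS need not match the 213 reported above. rss_rounded is a hypothetical name.

```r
# Evaluate the rounded expression printed by summary(a, digits = 2).
pred <- with(trees,
  23 + 5.7 * pmax(0, Girth - 13) - 2.9 * pmax(0, 13 - Girth) +
    0.72 * pmax(0, Height - 76))

# Residual sum of squares of the rounded model.
rss_rounded <- sum((trees$Volume - pred)^2)
rss_rounded
```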