ppr
Projection Pursuit Regression
Fit a projection pursuit regression model.
- Keywords
- regression
Usage
ppr(x, ...)
## S3 method for class 'formula'
ppr(formula, data, weights, subset, na.action, contrasts = NULL, ..., model = FALSE)
## Default S3 method:
ppr(x, y, weights = rep(1, n), ww = rep(1, q), nterms, max.terms = nterms, optlevel = 2, sm.method = c("supsmu", "spline", "gcvspline"), bass = 0, span = 0, df = 5, gcvpen = 1, ...)
Arguments
- formula
- a formula specifying one or more numeric response variables and the explanatory variables.
- x
- numeric matrix of explanatory variables. Rows represent observations, and columns represent variables. Missing values are not accepted.
- y
- numeric matrix of response variables. Rows represent observations, and columns represent variables. Missing values are not accepted.
- nterms
- number of terms to include in the final model.
- data
- a data frame (or similar: see model.frame) from which variables specified in formula are preferentially to be taken.
- weights
- a vector of weights w_i for each case.
- ww
- a vector of weights for each response, so the fit criterion is the sum over cases i and responses j of w_i ww_j (y_ij - fit_ij)^2, divided by the sum of w_i.
- subset
- an index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)
- na.action
- a function to specify the action to be taken if NAs are found. The default action is given by getOption("na.action"). (NOTE: If given, this argument must be named.)
- contrasts
- the contrasts to be used when any factor explanatory variables are coded.
- max.terms
- maximum number of terms to choose from when building the model.
- optlevel
- integer from 0 to 3 which determines the thoroughness of an optimization routine in the SMART program. See the Details section.
- sm.method
- the method used for smoothing the ridge functions. The default is to use Friedman's super smoother supsmu. The alternatives are to use the smoothing spline code underlying smooth.spline, either with a specified (equivalent) degrees of freedom for each ridge function ("spline"), or with the smoothness chosen by GCV ("gcvspline"). Can be abbreviated.
- bass
- super smoother bass tone control used with automatic span selection (see supsmu); the range of values is 0 to 10, with larger values resulting in increased smoothing.
- span
- super smoother span control (see supsmu). The default, 0, results in automatic span selection by local cross validation. span can also take a value in (0, 1].
- df
- if sm.method is "spline", specifies the smoothness of each ridge term via the requested equivalent degrees of freedom.
- gcvpen
- if sm.method is "gcvspline", this is the penalty used in the GCV selection for each degree of freedom used.
- ...
- arguments to be passed to or from other methods.
- model
- logical. If true, the model frame is returned.
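The fit criterion described under ww can be written out directly. The following is a minimal sketch in base R, not code taken from ppr itself; the helper name weighted_criterion and its argument names are hypothetical and chosen only for illustration.

```r
# Hypothetical helper (not part of stats): the criterion described for `ww`,
#   sum_i sum_j w_i * ww_j * (y_ij - fit_ij)^2 / sum_i w_i
weighted_criterion <- function(y, fit, w = rep(1, nrow(y)), ww = rep(1, ncol(y))) {
  y   <- as.matrix(y)
  fit <- as.matrix(fit)
  sq  <- (y - fit)^2              # squared errors, one column per response
  sq  <- sweep(sq, 2, ww, `*`)    # scale each response column by its ww_j
  sum(w * sq) / sum(w)            # w recycles down the rows (cases)
}

# Toy data: two cases, two responses, fitted values all zero
y    <- matrix(c(1, 2, 3, 4), nrow = 2)
fit  <- matrix(0, nrow = 2, ncol = 2)
crit <- weighted_criterion(y, fit, w = c(1, 3), ww = c(1, 2))
crit
```

With the toy inputs, the weighted squared errors are 1, 12, 18 and 96, so the criterion is 127 / 4.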
Details
The basic method is given by Friedman (1984), and is essentially the same code used by S-PLUS's ppreg. This code is extremely sensitive to the compiler used.
The algorithm first adds up to max.terms ridge terms one at a time; it will use fewer if it is unable to find a term to add that makes sufficient difference. It then removes the least important term at each step until nterms terms are left.
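As a rough illustration of the forward and backward passes described above, the gofn component can be lined up against the number of terms. This sketch reuses the rock data from the Examples; the names rock1 and gof_by_terms are introduced here for illustration only, and numerical values may differ between platforms.

```r
# Rescaled copy of the rock data (rock1 is an illustrative name)
rock1 <- transform(rock, area1 = area / 10000, peri1 = peri / 10000)

fit <- ppr(log(perm) ~ area1 + peri1 + shape, data = rock1,
           nterms = 2, max.terms = 5)

# gofn has one entry per candidate model size, 1 .. max.terms
gof_by_terms <- setNames(fit$gofn, paste(seq_len(fit$ml), "terms"))

# Only the entries from nterms up to max.terms are meaningful
gof_by_terms[fit$mu:fit$ml]
```

Comparing these values is one way to judge whether a larger nterms is worth the extra ridge terms.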
The levels of optimization (argument optlevel) differ in how thoroughly the models are refitted during this process. At level 0 the existing ridge terms are not refitted. At level 1 the projection directions are not refitted, but the ridge functions and the regression coefficients are.
Levels 2 and 3 refit all the terms and are equivalent for one response; level 3 is more careful to re-balance the contributions from each regressor at each step and so is a little less likely to converge to a saddle point of the sum of squares criterion.
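One way to see the effect of optlevel is to refit the same model at each level and compare the gof component. This is only a sketch on the rock data; rock1 and gof_by_level are illustrative names, and the values obtained depend on the platform.

```r
# Rescaled copy of the rock data (rock1 is an illustrative name)
rock1 <- transform(rock, area1 = area / 10000, peri1 = peri / 10000)

# Fit the same two-term model at each optimization level, 0 to 3
gof_by_level <- sapply(0:3, function(lev) {
  ppr(log(perm) ~ area1 + peri1 + shape, data = rock1,
      nterms = 2, max.terms = 5, optlevel = lev)$gof
})
names(gof_by_level) <- paste0("optlevel", 0:3)
gof_by_level
```

A lower residual sum of squares at a higher level suggests the extra refitting helped on that data set; it is not guaranteed in general.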
Value
A list with the following components, many of which are for use by the method functions.
- call
- the matched call
- p
- the number of explanatory variables (after any coding)
- q
- the number of response variables
- mu
- the argument nterms
- ml
- the argument max.terms
- gof
- the overall residual (weighted) sum of squares for the selected model
- gofn
- the overall residual (weighted) sum of squares against the number of terms, up to max.terms. Will be invalid (and zero) for fewer than nterms terms.
- df
- the argument df
- edf
- if sm.method is "spline" or "gcvspline", the equivalent number of degrees of freedom for each ridge term used.
- xnames
- the names of the explanatory variables
- ynames
- the names of the response variables
- alpha
- a matrix of the projection directions, with a column for each ridge term
- beta
- a matrix of the coefficients applied for each response to the ridge terms: the rows are the responses and the columns the ridge terms
- yb
- the weighted means of each response
- ys
- the overall scale factor used: internally the responses are divided by ys to have unit total weighted sum of squares.
- fitted.values
- the fitted values, as a matrix if q > 1.
- residuals
- the residuals, as a matrix if q > 1.
- smod
- internal work array, which includes the ridge functions evaluated at the training set points.
- model
- (only if model = TRUE) the model frame.
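The components listed above can be inspected on a fitted object. The sketch below uses the rock data from the Examples; rock1 and res_check are illustrative names. It assumes, consistent with the descriptions above, that the residuals component is the response minus the fitted values.

```r
# Rescaled copy of the rock data (rock1 is an illustrative name)
rock1 <- transform(rock, area1 = area / 10000, peri1 = peri / 10000)

fit <- ppr(log(perm) ~ area1 + peri1 + shape, data = rock1,
           nterms = 2, max.terms = 5)

c(p = fit$p, q = fit$q, nterms = fit$mu, max.terms = fit$ml)
dim(fit$alpha)   # one row per explanatory variable, one column per ridge term
fit$beta         # coefficients of the ridge terms for each response

# Assumed relationship: residuals = response - fitted values
res_check <- all.equal(as.numeric(residuals(fit)),
                       as.numeric(log(rock1$perm) - fitted(fit)))
res_check
```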
Source
Friedman (1984): converted to double precision and added interface to smoothing splines by B. D. Ripley, originally for the MASS package.
References
Friedman, J. H. and Stuetzle, W. (1981) Projection pursuit regression. Journal of the American Statistical Association, 76, 817-823.
Friedman, J. H. (1984) SMART User's Guide. Laboratory for Computational Statistics, Stanford University Technical Report No. 1.
Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Springer.
See Also
plot.ppr, supsmu, smooth.spline
Examples
library(stats)
require(graphics)
# Note: your numerical values may differ
attach(rock)
area1 <- area/10000; peri1 <- peri/10000
rock.ppr <- ppr(log(perm) ~ area1 + peri1 + shape,
                data = rock, nterms = 2, max.terms = 5)
rock.ppr
# Call:
# ppr.formula(formula = log(perm) ~ area1 + peri1 + shape, data = rock,
#     nterms = 2, max.terms = 5)
#
# Goodness of fit:
#  2 terms  3 terms  4 terms  5 terms
# 8.737806 5.289517 4.745799 4.490378
summary(rock.ppr)
# ..... (same as above)
# .....
#
# Projection direction vectors:
#            term 1      term 2
# area1  0.34357179  0.37071027
# peri1 -0.93781471 -0.61923542
# shape  0.04961846  0.69218595
#
# Coefficients of ridge terms:
#    term 1    term 2
# 1.6079271 0.5460971
par(mfrow = c(3,2)) # maybe: , pty = "s")
plot(rock.ppr, main = "ppr(log(perm)~ ., nterms=2, max.terms=5)")
plot(update(rock.ppr, bass = 5), main = "update(..., bass = 5)")
plot(update(rock.ppr, sm.method = "gcv", gcvpen = 2),
     main = "update(..., sm.method=\"gcv\", gcvpen=2)")
cbind(perm = rock$perm, prediction = round(exp(predict(rock.ppr)), 1))
detach()