Projection Pursuit Regression
Fit a projection pursuit regression model.
# S3 method for formula ppr(formula, data, weights, subset, na.action, contrasts = NULL, …, model = FALSE)
# S3 method for default ppr(x, y, weights = rep(1, n), ww = rep(1, q), nterms, max.terms = nterms, optlevel = 2, sm.method = c("supsmu", "spline", "gcvspline"), bass = 0, span = 0, df = 5, gcvpen = 1, …)
- a formula specifying one or more numeric response variables and the explanatory variables.
- numeric matrix of explanatory variables. Rows represent observations, and columns represent variables. Missing values are not accepted.
- numeric matrix of response variables. Rows represent observations, and columns represent variables. Missing values are not accepted.
- number of terms to include in the final model.
a data frame (or similar: see
model.frame) from which variables specified in
formulaare preferentially to be taken.
- a vector of weights
w_ifor each case.
a vector of weights for each response, so the fit criterion is
the sum over case
w_i ww_j (y_ij - fit_ij)^2divided by the sum of
- an index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)
a function to specify the action to be taken if
NAs are found. The default action is given by
getOption("na.action"). (NOTE: If given, this argument must be named.)
- the contrasts to be used when any factor explanatory variables are coded.
- maximum number of terms to choose from when building the model.
- integer from 0 to 3 which determines the thoroughness of an optimization routine in the SMART program. See the ‘Details’ section.
the method used for smoothing the ridge functions. The default is to
use Friedman's super smoother
supsmu. The alternatives are to use the smoothing spline code underlying
smooth.spline, either with a specified (equivalent) degrees of freedom for each ridge functions, or to allow the smoothness to be chosen by GCV.
Can be abbreviated.
super smoother bass tone control used with automatic span selection
supsmu); the range of values is 0 to 10, with larger values resulting in increased smoothing.
super smoother span control (see
supsmu). The default,
0, results in automatic span selection by local cross validation.
spancan also take a value in
"spline"specifies the smoothness of each ridge term via the requested equivalent degrees of freedom.
"gcvspline"this is the penalty used in the GCV selection for each degree of freedom used.
- arguments to be passed to or from other methods.
- logical. If true, the model frame is returned.
The basic method is given by Friedman (1984), and is essentially the
same code used by S-PLUS's
ppreg. This code is extremely
sensitive to the compiler used. The algorithm first adds up to
max.terms ridge terms one at a
time; it will use less if it is unable to find a term to add that makes
sufficient difference. It then removes the least
important term at each step until
are left. The levels of optimization (argument
differ in how thoroughly the models are refitted during this process.
At level 0 the existing ridge terms are not refitted. At level 1
the projection directions are not refitted, but the ridge
functions and the regression coefficients are.
Levels 2 and 3 refit all the terms and are equivalent for one
response; level 3 is more careful to re-balance the contributions
from each regressor at each step and so is a little less likely to
converge to a saddle point of the sum of squares criterion.
A list with the following components, many of which are for use by the method functions.
max.terms. Will be invalid (and zero) for less than
"gcvspline"the equivalent number of degrees of freedom for each ridge term used.
ysto have unit total weighted sum of squares.
q > 1.
q > 1.
model = TRUE) the model frame.
Friedman, J. H. and Stuetzle, W. (1981) Projection pursuit regression. Journal of the American Statistical Association, 76, 817--823. Friedman, J. H. (1984) SMART User's Guide. Laboratory for Computational Statistics, Stanford University Technical Report No. 1. Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Springer.
require(graphics) # Note: your numerical values may differ attach(rock) area1 <- area/10000; peri1 <- peri/10000 rock.ppr <- ppr(log(perm) ~ area1 + peri1 + shape, data = rock, nterms = 2, max.terms = 5) rock.ppr # Call: # ppr.formula(formula = log(perm) ~ area1 + peri1 + shape, data = rock, # nterms = 2, max.terms = 5) # # Goodness of fit: # 2 terms 3 terms 4 terms 5 terms # 8.737806 5.289517 4.745799 4.490378 summary(rock.ppr) # ..... (same as above) # ..... # # Projection direction vectors: # term 1 term 2 # area1 0.34357179 0.37071027 # peri1 -0.93781471 -0.61923542 # shape 0.04961846 0.69218595 # # Coefficients of ridge terms: # term 1 term 2 # 1.6079271 0.5460971 par(mfrow = c(3,2)) # maybe: , pty = "s") plot(rock.ppr, main = "ppr(log(perm)~ ., nterms=2, max.terms=5)") plot(update(rock.ppr, bass = 5), main = "update(..., bass = 5)") plot(update(rock.ppr, sm.method = "gcv", gcvpen = 2), main = "update(..., sm.method=\"gcv\", gcvpen=2)") cbind(perm = rock$perm, prediction = round(exp(predict(rock.ppr)), 1)) detach()