# ppr

##### Projection Pursuit Regression

Fit a projection pursuit regression model.

- Keywords
- regression

##### Usage

`ppr(x, ...)`

## S3 method for class 'formula':
ppr(formula, data, weights, subset, na.action,
contrasts = NULL, ..., model = FALSE)

## S3 method for class 'default':
ppr(x, y, weights = rep(1, n),
ww = rep(1, q), nterms, max.terms = nterms, optlevel = 2,
sm.method = c("supsmu", "spline", "gcvspline"),
bass = 0, span = 0, df = 5, gcvpen = 1, ...)

##### Arguments

- formula
- a formula specifying one or more numeric response variables and the explanatory variables.
- x
- numeric matrix of explanatory variables. Rows represent observations, and columns represent variables. Missing values are not accepted.
- y
- numeric matrix of response variables. Rows represent observations, and columns represent variables. Missing values are not accepted.
- nterms
- number of terms to include in the final model.
- data
- a data frame (or similar: see `model.frame`) from which variables specified in `formula` are preferentially to be taken.
- weights
- a vector of weights `w_i` for each *case*.
- ww
- a vector of weights for each *response*, so the fit criterion is the sum over case `i` and responses `j` of `w_i ww_j (y_ij - fit_ij)^2` divided by the sum of `w_i`.
- subset
- an index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)
- na.action
- a function to specify the action to be taken if `NA`s are found. The default action is given by `getOption("na.action")`. (NOTE: If given, this argument must be named.)
- contrasts
- the contrasts to be used when any factor explanatory variables are coded.
- max.terms
- maximum number of terms to choose from when building the model.
- optlevel
- integer from 0 to 3 which determines the thoroughness of an optimization routine in the SMART program. See the Details section.
- sm.method
- the method used for smoothing the ridge functions. The default is to use Friedman's super smoother `supsmu`. The alternatives are to use the smoothing spline code underlying `smooth.spline`, either with a specified (equivalent) degrees of freedom for each ridge function, or to allow the smoothness to be chosen by GCV. Can be abbreviated.
- bass
- super smoother bass tone control used with automatic span selection (see `supsmu`); the range of values is 0 to 10, with larger values resulting in increased smoothing.
- span
- super smoother span control (see `supsmu`). The default, `0`, results in automatic span selection by local cross validation. `span` can also take a value in `(0, 1]`.
- df
- if `sm.method` is `"spline"`, specifies the smoothness of each ridge term via the requested equivalent degrees of freedom.
- gcvpen
- if `sm.method` is `"gcvspline"`, this is the penalty used in the GCV selection for each degree of freedom used.
- ...
- arguments to be passed to or from other methods.
- model
- logical. If true, the model frame is returned.
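
The fit criterion described under `ww` can be written out directly. The sketch below is only an illustration of that formula: `fit_criterion` is a hypothetical helper, not a function in the stats package, and it makes no claim about how `ppr` scales the value it stores in `gof`.

```r
# Hypothetical helper, not part of the stats package: evaluates
# sum_i sum_j w_i * ww_j * (y_ij - fit_ij)^2 / sum_i w_i
fit_criterion <- function(y, fit, w = rep(1, nrow(y)), ww = rep(1, ncol(y))) {
  y <- as.matrix(y)
  fit <- as.matrix(fit)
  sq <- (y - fit)^2                  # n x q matrix of squared residuals
  sq <- sweep(sq, 2, ww, `*`)        # scale column j by the response weight ww_j
  sum(w * sq) / sum(w)               # w recycles down rows, so row i is scaled by w_i
}

# Small hand-checkable case: 2 cases, 2 responses, all fitted values zero
y   <- matrix(c(1, 2, 3, 4), nrow = 2)
fit <- matrix(0, nrow = 2, ncol = 2)
crit <- fit_criterion(y, fit, w = c(1, 3), ww = c(1, 2))
crit  # (1*1*1 + 3*1*4 + 1*2*9 + 3*2*16) / 4 = 31.75
```

With the default unit weights, the helper reduces to the residual sum of squares divided by the number of cases.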

##### Details

The basic method is given by Friedman (1984), and is essentially the
same code used by S-PLUS's `ppreg`. This code is extremely
sensitive to the compiler used.

The algorithm first adds up to `max.terms` ridge terms one at a
time; it will use fewer if it is unable to find a term to add that makes
sufficient difference. It then removes the least
important term at each step until `nterms` terms
are left.
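
One way to look at the effect of this add-then-prune procedure is to inspect the `gofn` component for each model size between `nterms` and `max.terms`. The sketch below assumes the built-in `rock` data set and the derived variables used in the Examples section; `rock2` and `gof` are names introduced here for illustration.

```r
# Same derived variables as in the Examples section, stored in a copy of rock
rock2 <- transform(rock, area1 = area / 10000, peri1 = peri / 10000)

fit <- ppr(log(perm) ~ area1 + peri1 + shape, data = rock2,
           nterms = 2, max.terms = 5)

# gofn holds one entry per model size; label them by number of terms
gof <- setNames(fit$gofn, paste(seq_len(fit$ml), "terms"))

# Entries below nterms are documented as invalid, so show nterms..max.terms only
gof[fit$mu:fit$ml]
```

The exact values may vary between platforms, since the underlying code is sensitive to the compiler used.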

The levels of optimization (argument `optlevel`)
differ in how thoroughly the models are refitted during this process.
At level 0 the existing ridge terms are not refitted. At level 1
the projection directions are not refitted, but the ridge
functions and the regression coefficients are. Levels 2 and 3 refit all the terms and are equivalent for one
response; level 3 is more careful to re-balance the contributions
from each regressor at each step and so is a little less likely to
converge to a saddle point of the sum of squares criterion.
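
As a rough illustration of the optimization levels, the same model can be refitted at each `optlevel` and the resulting `gof` values compared. This is a sketch on the built-in `rock` data; `rock2`, `levels_to_try`, and `gof_by_level` are names chosen here, and no particular ordering of the results is guaranteed.

```r
rock2 <- transform(rock, area1 = area / 10000, peri1 = peri / 10000)

# Refit the same model at each optimization level (0 = least thorough)
levels_to_try <- 0:3
gof_by_level <- sapply(levels_to_try, function(lev) {
  fit <- ppr(log(perm) ~ area1 + peri1 + shape, data = rock2,
             nterms = 2, max.terms = 3, optlevel = lev)
  fit$gof   # residual (weighted) sum of squares of the selected model
})
names(gof_by_level) <- paste0("optlevel=", levels_to_try)
gof_by_level
```

Because this model has a single response, levels 2 and 3 are described above as equivalent.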

##### Value

A list with the following components, many of which are for use by the method functions.

- call
- the matched call
- p
- the number of explanatory variables (after any coding)
- q
- the number of response variables
- mu
- the argument `nterms`
- ml
- the argument `max.terms`
- gof
- the overall residual (weighted) sum of squares for the selected model
- gofn
- the overall residual (weighted) sum of squares against the number of terms, up to `max.terms`. Will be invalid (and zero) for less than `nterms`.
- df
- the argument `df`
- edf
- if `sm.method` is `"spline"` or `"gcvspline"`, the equivalent number of degrees of freedom for each ridge term used.
- xnames
- the names of the explanatory variables
- ynames
- the names of the response variables
- alpha
- a matrix of the projection directions, with a column for each ridge term
- beta
- a matrix of the coefficients applied for each response to the ridge terms: the rows are the responses and the columns the ridge terms
- yb
- the weighted means of each response
- ys
- the overall scale factor used: internally the responses are divided by `ys` to have unit total weighted sum of squares.
- fitted.values
- the fitted values, as a matrix if `q > 1`.
- residuals
- the residuals, as a matrix if `q > 1`.
- smod
- internal work array, which includes the ridge functions evaluated at the training set points.
- model
- (only if `model = TRUE`) the model frame.
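
A short sketch of how some of these components might be inspected, using the default (matrix) method on the built-in `rock` data. The names `x`, `y`, `fit`, and `z` are introduced here for illustration only.

```r
# Explanatory variables as a numeric matrix, response as a numeric vector
x <- with(rock, cbind(area1 = area / 10000, peri1 = peri / 10000, shape = shape))
y <- log(rock$perm)

fit <- ppr(x, y, nterms = 2, max.terms = 3)

dim(fit$alpha)   # p rows (explanatory variables), one column per ridge term
dim(fit$beta)    # q rows (responses), one column per ridge term

# Project each case onto the fitted directions: one column per ridge term
z <- x %*% fit$alpha
head(z)

# The stored residuals are the responses minus the fitted values
all.equal(as.vector(y - fitted(fit)), as.vector(residuals(fit)))
```

The columns of `z` are the inputs to the ridge functions that `plot` displays for a fitted `ppr` object.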

##### Source

Friedman (1984): converted to double precision and added interface to
smoothing splines by B. D. Ripley, originally for the

##### References

Friedman, J. H. and Stuetzle, W. (1981)
Projection pursuit regression.
*Journal of the American Statistical Association*,
**76**, 817–823.

Friedman, J. H. (1984)
SMART User's Guide.
Laboratory for Computational Statistics, Stanford University Technical
Report No.

Venables, W. N. and Ripley, B. D. (2002)
*Modern Applied Statistics with S.* Springer.

##### Examples

`library(stats)`

```
require(graphics)
# Note: your numerical values may differ
attach(rock)
area1 <- area/10000; peri1 <- peri/10000
rock.ppr <- ppr(log(perm) ~ area1 + peri1 + shape,
data = rock, nterms = 2, max.terms = 5)
rock.ppr
# Call:
# ppr.formula(formula = log(perm) ~ area1 + peri1 + shape, data = rock,
# nterms = 2, max.terms = 5)
#
# Goodness of fit:
# 2 terms 3 terms 4 terms 5 terms
# 8.737806 5.289517 4.745799 4.490378
summary(rock.ppr)
# ..... (same as above)
# .....
#
# Projection direction vectors:
# term 1 term 2
# area1 0.34357179 0.37071027
# peri1 -0.93781471 -0.61923542
# shape 0.04961846 0.69218595
#
# Coefficients of ridge terms:
# term 1 term 2
# 1.6079271 0.5460971
par(mfrow = c(3,2)) # maybe: , pty = "s")
plot(rock.ppr, main = "ppr(log(perm)~ ., nterms=2, max.terms=5)")
plot(update(rock.ppr, bass = 5), main = "update(..., bass = 5)")
plot(update(rock.ppr, sm.method = "gcv", gcvpen = 2),
main = "update(..., sm.method=\"gcv\", gcvpen=2)")
cbind(perm = rock$perm, prediction = round(exp(predict(rock.ppr)), 1))
detach()
```

*Documentation reproduced from package stats, version 3.3, License: Part of R 3.3*