"LM"
Linear Modelling,
"GLM"
Generalized Linear Modelling,
"GAM"
Generalized Additive Modelling,
"PPR"
Projection Pursuit Regression,
"MARS"
Multivariate Adaptive Regression Splines,
"POLYMARS"
Polychotomous MARS, and
"NNET"
Feedforward Neural Network Modelling.
Available methods are:
predict
Predict method for objects of class 'fREG',
print
Print method for objects of class 'fREG',
plot
Plot method for objects of class 'fREG',
summary
Summary method for objects of class 'fREG',
fitted.values
Fitted values method for objects of class 'fREG',
residuals
Residuals method for objects of class 'fREG'.
The print method prints the returned object from a regression
fit, and the summary method performs a diagnostic analysis and
summarizes the results of the fit in a detailed form. The plot
method produces diagnostic plots. The predict method forecasts
from new data records. Two further methods return the fitted
values and the residuals.
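For orientation, a minimal sketch of this workflow on purely hypothetical
data (the data frame df and the variables x and y are made up for
illustration):

    df = data.frame(x = rnorm(100))
    df$y = 1 + 2 * df$x + rnorm(100)
    fit = regFit(y ~ x, data = df, use = "lm")
    print(fit)                    # overview of the fitted model
    summary(fit)                  # diagnostic analysis of the fit
    plot(fit)                     # diagnostic plots
    predict(fit, newdata = df)    # forecasts from (new) data records
    fitted(fit)                   # fitted values as a numeric vector
    residuals(fit)                # residuals as a numeric vector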
Furthermore, an S-Plus FinMetrics-like ordinary least squares 'OLS'
function has been added, including S3 print, plot and summary methods.
OLS
Fits an ordinary least squares regression and returns an object of class 'OLS',
print
Print method for objects of class 'OLS',
plot
Plot method for objects of class 'OLS',
summary
Summary method for objects of class 'OLS'.

regSim(model = c("LM3", "LOGIT3", "GAM3"), n = 100)

regFit(formula, data, use = c("lm", "rlm", "am", "ppr", "mars", "nnet",
    "polymars"), title = NULL, description = NULL, ...)
gregFit(formula, family, data, use = c("glm", "gam"),
title = NULL, description = NULL, ...)
## S3 method for class 'fREG':
predict(object, newdata, se.fit = FALSE, type = "response", ...)
show.fREG(object)
## S3 method for class 'fREG':
plot(x, ...)
## S3 method for class 'fREG':
summary(object, ...)
## S3 method for class 'fREG':
coef(object, ...)
## S3 method for class 'fREG':
fitted(object, ...)
## S3 method for class 'fREG':
residuals(object, ...)
## S3 method for class 'fREG':
vcov(object, ...)
OLS(formula, data, ...)
## S3 method for class 'OLS':
print(x, ...)
## S3 method for class 'OLS':
plot(x, ...)
## S3 method for class 'OLS':
summary(object, ...)
data
the data frame containing the variables in the model. By default
the variables are taken from environment(formula), typically the
environment from which regFit is called.
newdata
new data records from which the predict method forecasts.
formula
a symbolic description of the model to be fitted. As for glm, the
predictor has the form response ~ terms, where response is the
(numeric) response vector and terms is a series of terms specifying
the predictor.
use
denotes the regression method to be used; it must be one of the
strings in the default argument, e.g. "lm" for linear regression
models or "glm" for generalized linear models.
model
for regSim, the model to be simulated, one of the strings in the
default argument, i.e. "LM3", "LOGIT3", or "GAM3".

The returned object from regFit and gregFit is an S4 object of class
"fREG" that serves as input for the predict, print, summary,
print.summary, and plot methods. Its @fit slot holds the underlying
fit, with components such as
fit$parameters
- the fitted model parameters,
fit$residuals
- the model residuals,
fit$fitted.values
- the fitted values of the model,
and many more. For details we refer to the help pages of
the selected regression model.

The print
method gives information at
least about the function call, the fitted model parameters,
and the residuals variance.
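For example, assuming fit is such an object of class "fREG" (as in the
hypothetical sketch above) and the components named above are provided
by the selected estimator, the underlying fit can be inspected directly:

    slotNames(fit)            # slot names of the S4 object, including "fit"
    fit@fit$parameters        # the fitted model parameters
    fit@fit$residuals         # the model residuals
    fit@fit$fitted.values     # the fitted values of the model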
The plot method produces three figures: the first plots the series of
residuals, the second shows a quantile-quantile plot of the residuals,
and the third plots the fitted values against the residuals. Additional
plots can be generated from the plot method (if available) of the
underlying model, see the example below.
The summary method provides additional information, such as the errors
of the model parameters as far as available, and adds further details
about the fit.
The predict
method forecasts from a fitted model. The
returned values are the same as produced by the prediction
function of the selected regression model. In particular, the element
$fit holds the forecast vector.
The residuals
and fitted.values
methods return
the residuals and the fitted values as numeric vectors.
Function OLS:
returns an S3 object of class "OLS"
that represents an
ordinary least squares fit. The list has the same elements as an
object of class "lm", and additionally the elements $call, $formula,
and $data.
The plot method, as for plot.lm, provides four plots: a plot of
residuals against fitted values, a Scale-Location plot of
sqrt(|residuals|) against fitted values, a normal QQ plot, and a plot
of Cook's distances versus row labels.
[stats:lm]
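A minimal, hypothetical use of the OLS wrapper and its S3 methods (the
data frame df and the variables x and y are again made up):

    df = data.frame(x = rnorm(50))
    df$y = 0.5 * df$x + rnorm(50)
    ols = OLS(y ~ x, data = df)
    print(ols)                # call and coefficients
    summary(ols)              # coefficient table and fit statistics
    par(mfrow = c(2, 2))
    plot(ols)                 # the four lm-style diagnostic plots described above
    ols$call                  # additional elements stored with the object
    ols$formula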
GLM -- Generalized Linear Models:
Generalized linear modelling extends the linear model in two directions:
(i) a monotonic differentiable link function describes how the expected
values are related to the linear predictor, and (ii) the response
variables may have a probability distribution from an exponential
family.
[stats:glm]
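As a sketch, a logistic regression fitted through gregFit with
use = "glm" might look as follows; the binary data are simulated here
purely for illustration:

    df = data.frame(x = rnorm(100))
    df$y = rbinom(100, size = 1, prob = plogis(0.5 + df$x))
    fit = gregFit(y ~ x, family = binomial(link = "logit"),
                  data = df, use = "glm")
    summary(fit)
    predict(fit, newdata = df, type = "response")$fit   # forecasts on the response scale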
GAM -- Generalized Additive Models:
An additive model generalizes a linear model by smoothing each
predictor term individually. A generalized additive model extends the
additive model in the same spirit as the generalized linear model
extends the linear model, namely by allowing a link function and
non-normal distributions from the exponential family.
[mgcv:gam]
PPR -- Projection Pursuit Regression:
The basic method is given by Friedman (1984), and is essentially
the same code used by S-PLUS's ppreg. It is observed that this code
is extremely sensitive to the compiler used. The algorithm first adds
up to max.terms, by default ppr.nterms, ridge terms one at a time; it
will use fewer if it is unable to find a term to add that makes
sufficient difference.
optimization (argument optlevel
), by default 2, differ in
how thoroughly the models are refitted during this process.
At level 0 the existing ridge terms are not refitted. At level 1
the projection directions are not refitted, but the ridge
functions and the regression coefficients are. Levels 2 and 3 refit
all the terms; level 3 is more careful to re-balance the contributions
from each regressor at each step and so is a little less likely to
converge to a saddle point of the sum of squares criterion. The
plot
method plots Ridge functions for the projection pursuit
regression fit.
[stats:ppr]
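An illustrative projection pursuit fit on simulated data; it is assumed
here that arguments such as nterms are handed through the ... argument
to the underlying ppr estimator, and the ridge functions are plotted in
the same way as in the gam example at the end of this page:

    df = data.frame(x1 = runif(200), x2 = runif(200))
    df$y = sin(2 * pi * df$x1) + df$x2^2 + rnorm(200, sd = 0.1)
    fit = regFit(y ~ x1 + x2, data = df, use = "ppr", nterms = 2)
    summary(fit)
    # Ridge function plots from the underlying ppr object:
    ppr.fit = fit@fit
    class(ppr.fit) = "ppr"
    plot(ppr.fit)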
MARS -- Multivariate Adaptive Regression Splines:
This function was coded from scratch, and did not use any of
Friedman's mars code. It gives quite similar results to Friedman's
program in our tests, but not exactly the same results. We have not
implemented Friedman's anova decomposition nor are categorical
predictors handled properly yet. Our version does handle multiple
response variables, however. As it is not well-tested, we would like
to hear of any bugs.
Additional arguments which can be passed to the "mars"
estimator are:
w
- an optional vector of observation weights.
wp
- an optional vector of response weights.
degree
- an optional integer specifying maximum interaction
degree, default is 1.
nk
- an optional integer specifying the maximum number of model
terms.
penalty
- an optional value specifying the cost per degree of
freedom charge, default is 2.
thresh
- an optional value specifying forward stepwise stopping
threshold, default is 0.001.
prune
- an optional logical value specifying whether the model
should be pruned in a backward stepwise fashion, default is TRUE
.
trace.mars
- an optional logical value specifying whether info
should be printed along the way, default is FALSE
.
forward.step
- an optional logical value specifying whether
forward stepwise process should be carried out, default is TRUE
.
prevfit
- optional data structure from a previous fit. To see the effect of
changing the penalty parameter, one can use prevfit with
forward.step = FALSE.
[mda:mars]
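A hypothetical MARS fit passing some of the optional arguments listed
above through the ... argument; the mda package is assumed to provide
the underlying estimator, cf. [mda:mars]:

    require(mda)
    df = data.frame(x1 = runif(200), x2 = runif(200))
    df$y = pmax(df$x1 - 0.5, 0) + df$x2 + rnorm(200, sd = 0.1)
    fit = regFit(y ~ x1 + x2, data = df, use = "mars",
                 degree = 2, penalty = 2, prune = TRUE)
    summary(fit)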
POLYMARS -- Polychotomous MARS:
The algorithm employed by polymars
is different from the
MARS(tm) algorithm of Friedman (1991), though it has many similarities.
Also the name polymars
has been used for this algorithm well
before MARS was trademarked.
Additional arguments which can be passed to the "polymars"
estimator are:
maxsize
- the maximum number of basis functions that the model is
allowed to grow to in the stepwise addition procedure. Default is
$\min(6*(n^{1/3}),n/4,100)$, where n
is the number of
observations.
gcv
- parameter used to find the overall best model from a
sequence of fitted models. The residual sum of squares of a model
is penalized by dividing by the square of
1-(gcv x model size)/cases
.
A larger gcv value would tend to produce a smaller model.
additive
- Should the fitted model be additive in the predictors?
startmodel
- the first model that is to be fit by polymars
.
It is either an object of the class polymars
or a model
dreamed up by the user. In that case, it takes the form of a
4 x n
matrix, where n
is the number of basis
functions in the starting model excluding the intercept. Each
row corresponds to one basis function (with two possible components).
Column 1 is the index of the first predictor involved. Column 2 is
a possible knot in this predictor. If column 2 is NA
, the
first component is linear. Column 3 is the possible second predictor
involved (if column 3 is NA
the basis function only depends
on one predictor). Column 4 contains the possible knot for the
predictor in column 3, and it is NA
when this component is
linear. Example: if a row reads 3 NA 2 4.7
, the corresponding
basis function is $[X_3 * (X_2-4.7)_+]$; if a row reads
2 4.3 NA NA
the corresponding basis function is
$[(X_2-4.3)_+]$.
A fifth column can be added with 1s and 0s; the 1s specify which
basis functions of the startmodel must be in each model. Thus, these
functions stay in the model during the whole stepwise fitting
procedure. If startmodel is not specified, polymars starts with a
model that only contains the intercept. (A sketch of such a
startmodel matrix is given after this argument list.)
weights
- optional vector of observation weights; if supplied,
the algorithm fits to minimize the sum of the weights multiplied
by the squared residuals. The length of weights must be the same
as the number of observations. The weights must be nonnegative.
no.interact
- an optional matrix used if certain predictor
interactions are not allowed in the model. It is given as a
matrix of size 2 x m
, with predictor indices as entries.
The two predictors of any row cannot have interaction terms with
each other.
knots
- defines how the function is to find potential knots
for the spline basis functions. This can be set to the maximum
number of knots you would like to be considered for each predictor.
Usually, to avoid the design matrix becoming singular the actual
number of knots produced is constrained to at most every third
order statistic in any predictor. This constraint can be adjusted
using the knot.space
argument. It can also be a vector with
the number of potential knots for each predictor. Again the actual
number of knots produced is constrained to be at most every third
order statistic in any predictor.
A third possibility is to provide a matrix where each column
corresponds to the ordered knots you would like to have considered
for that predictor. This matrix should be filled out to a rectangular
data structure with NAs.
The default is min(20, round(n/4))
knots per predictor.
When specifying knots as a vector, an entry of -1 indicates that the
predictor is a categorical variable and each unique entry in its
column is treated as a level.
When specifying knots as a single number or a matrix and there are
categorical variables, these are specified separately using the
factors argument.
knot.space
- an integer describing the minimum number of order statistics apart
that two knots can be. Knots should not be too close together, to
ensure numerical stability.
ts.resp
- testset responses for model selection. Should have
the same number of columns as the training set response. A testset
can be used for the model selection. Depending on the value of
classify, either the model with the smallest testset residual
sum of squares or the smallest testset classification error is
provided. Overrides gcv
.
ts.pred
- testset predictors. Should have the same number of
columns as the training set predictors.
ts.weights
-
testset observation weights. A vector of length equal to the number
of cases of the testset. All weights must be non-negative.
classify
- when the response is discrete (categorical), polymars
can be used for classification. In particular, when
classify = TRUE
, a discrete response with K
levels
is replaced by K
indicator variables as response. Model
selection is still being carried out using gcv, except when a
testset is provided, in which case testset misclassification is
used to select the best model.
factors
- used to indicate that certain variables in the predictor
set are categorical variables. Specified as a vector containing the
appropriate predictor indices (column numbers of categorical
variables in predictors matrix). Factors can also be set when the
knots
argument is given as a vector, with -1
as
the appropriate entries for factors.
tolerance
- for each possible candidate to be added or deleted, the residual
sum of squares of the model with and without this candidate must be
calculated. The inversion of the "X-transpose by X" matrix, X being
the design matrix, is done by an updating procedure, cf. C.R. Rao,
Linear Statistical Inference and Its Applications, 2nd edition,
page 33. In the inversion the size of the bottom right-hand entry
of this matrix is critical. If its value is near zero, or the value
of its inverse is almost zero, then the inversion procedure becomes
somewhat inaccurate. The lower the tolerance value, the more careful
the procedure is in selecting candidates for addition to the model,
but it may exclude candidates too conservatively. On the other hand,
if the tolerance is set too high, a spurious result with a singular
or otherwise sub-optimal model may occur. By default tolerance is
set to 1.0e-5.
verbose
- when set to TRUE
, the function will print
out a line for each addition or deletion stage. For
example, " + 8 : 5 3.25 2 NA" means adding interaction basis
function of predictor 5 with knot at 3.25 and predictor 2 (linear),
to make a model of size 8, including intercept.
[polyclass:polymars]
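As referenced in the startmodel description above, a sketch of a
user-specified starting model; the data and knot values are made up for
illustration:

    # Each row: first predictor, its knot, second predictor, its knot
    # (NA marks a linear component). The two rows below encode the
    # basis functions [X_3 * (X_2 - 4.7)_+] and [(X_2 - 4.3)_+]:
    start = rbind(c(3, NA,  2, 4.7),
                  c(2, 4.3, NA, NA))
    df = data.frame(x1 = rnorm(100), x2 = 10 * runif(100), x3 = rnorm(100))
    df$y = df$x3 * pmax(df$x2 - 4.7, 0) + rnorm(100)
    fit = regFit(y ~ x1 + x2 + x3, data = df, use = "polymars",
                 startmodel = start, knots = 10)
    summary(fit)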
NNET -- Feedforward Neural Network Regression:
If the response in formula
is a factor, an appropriate
classification network is constructed; this has one output and
entropy fit if the number of levels is two, and a number of
outputs equal to the number of classes and a softmax output
stage for more levels. If the response is not a factor, it is
passed on unchanged to nnet.default
. A quasi-Newton
optimizer is used, written in C
.
[nnet:nnet]
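A sketch of a small regression network on simulated data; nnet-specific
arguments such as size and linout are assumed to be handed through the
... argument to the underlying estimator, cf. [nnet:nnet]:

    require(nnet)
    df = data.frame(x = runif(200))
    df$y = sin(2 * pi * df$x) + rnorm(200, sd = 0.1)
    fit = regFit(y ~ x, data = df, use = "nnet",
                 size = 5, linout = TRUE, trace = FALSE)
    summary(fit)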
OLS -- Ordinary Least Square Fit:
This function was introduced to mimic the FinMetrics S-Plus function
OLS. The function wraps R's "lm". Currently it does not support the
full functionality of FinMetrics' OLS function.

Draper N.R., Smith H. (1981); Applied Regression Analysis; Wiley, New York.
Friedman J.H. (1991); Multivariate Adaptive Regression Splines (with discussion); The Annals of Statistics 19, 1--141.
Friedman J.H., Stuetzle W. (1981); Projection Pursuit Regression; Journal of the American Statistical Association 76, 817--823.
Friedman J.H. (1984); SMART User's Guide; Laboratory for Computational Statistics, Stanford University, Technical Report No. 1.
Green, Silverman (1994); Nonparametric Regression and Generalized Linear Models; Chapman and Hall.
Gu, Wahba (1991); Minimizing GCV/GML Scores with Multiple Smoothing Parameters via the Newton Method; SIAM J. Sci. Statist. Comput. 12, 383-398.
Hastie T., Tibshirani R. (1990); Generalized Additive Models; Chapman and Hall, London.
Kooperberg Ch., Bose S., and Stone C.J. (1997); Polychotomous Regression, Journal of the American Statistical Association 92, 117--127.
McCullagh P., Nelder, J.A. (1989); Generalized Linear Models; Chapman and Hall, London.
Myers R.H. (1986); Classical and Modern Regression with Applications; Duxbury, Boston.
Rousseeuw P.J., Leroy, A. (1987); Robust Regression and Outlier Detection; Wiley, New York.
Seber G.A.F. (1977); Linear Regression Analysis; Wiley, New York.
Stone C.J., Hansen M., Kooperberg Ch., and Truong Y.K. (1997); The use of polynomial splines and their tensor products in extended linear modeling (with discussion).
Venables W.N., Ripley B.D. (1999); Modern Applied Statistics with S-PLUS; Springer, New York.
Wahba (1990); Spline Models of Observational Data; SIAM.
Weisberg S. (1985); Applied Linear Regression; Wiley, New York.
Wood (2000); Modelling and Smoothing Parameter Estimation with Multiple Quadratic Penalties; JRSSB 62, 413--428.
Wood (2001); mgcv: GAMs and Generalized Ridge Regression for R. R News 1, 20-25.
Wood (2001); Thin Plate Regression Splines.
There exists a vast literature on regression. The references listed above are just a small sample of what is available. The book by Myers is an introductory textbook that covers discussions of much of the recent advances in regression technology. Seber's book is at a higher mathematical level and covers much of the classical theory of least squares.
## regFit -
data(recession)
recession[,1] = paste(recession[,1], "28", sep = "")
## myPlot -
myPlot = function(recession, in.sample) {
    # Decode the date column (YYYYMM with the appended day) into a
    # decimal year for the time axis:
    Date = as.numeric(recession[, "date"]) %/% 100
    Date = trunc(Date/100) + (Date - 100*trunc(Date/100))/12
    Recession = recession[, "recession"]
    inSample = as.vector(in.sample)
    plot(Date, Recession, type = "n", main = "US Recession")
    grid()
    lines(Date, Recession, type = "h", col = "steelblue")
    lines(Date, inSample)
}
## Generalized Additive Modelling:
require(mgcv)
par(mfrow = c(2, 2))
fit = gregFit(formula = recession ~ s(tbills3m) + s(tbonds10y),
family = gaussian(), data = recession, use = "gam")
# In Sample Prediction:
in.sample = predict(fit, newdata = recession)$fit
myPlot(recession, in.sample)
# Summary:
summary(fit)
# Add plots from the original plot method:
gam.fit = fit@fit
class(gam.fit) = "gam"
plot(gam.fit)
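# Accessor methods (illustrative; see the usage section above):
coef(fit)
head(fitted(fit))
head(residuals(fit))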