fwdglm: Forward Search in Generalized Linear Models

Description

This function applies the forward search approach to robust analysis in generalized linear models.

Usage

fwdglm(formula, family, data, weights, na.action, contrasts = NULL, bsb = NULL, 
       balanced = TRUE, maxit = 50, epsilon = 1e-06, nsamp = 100, trace = TRUE)

Value

The function returns an object of class "fwdglm" with the following components:

call: the matched call.
Residuals: an \(n \times (n-p+1)\) matrix of residuals.
Unit: a matrix of units added (to a maximum of 5 units) at each step.
included: a list with each element containing a vector of units included at each step of the forward search.
Coefficients: a \((n-p+1) \times p\) matrix of coefficients.
tStatistics: a \((n-p+1) \times p\) matrix of t statistics for the coefficients, i.e. coef.est/SE(coef.est).
Leverage: an \(n \times (n-p+1)\) matrix of leverage values.
MaxRes: a \((n-p) \times 2\) matrix of max deviance residuals in the best subsets and \(m\)-th deviance residuals.
MinDelRes: a \((n-p-1) \times 2\) matrix of minimum deviance residuals out of best subsets and \((m+1)\)-th deviance residuals.
ScoreTest: a \((n-p) \times 1\) matrix of score test statistics for a goodness of link test.
Likelihood: a \((n-p) \times 4\) matrix with columns containing: deviance, residual deviance, pseudo \(R^2\) (computed as \(1 - \text{deviance}/\text{null.deviance}\)), dispersion parameter (computed as \(\sum \text{pearson.residuals}^2/(m - p)\)).
CookDist: a \((n-p) \times 1\) matrix of forward Cook's distances.
ModCookDist: a \((n-p) \times 5\) matrix of forward modified Cook's distances for the units (to a maximum of 5 units) included at each step.
Weights: an \(n \times (n-p)\) matrix of weights used at each step of the forward search.
inibsb: a vector giving the best starting subset chosen by lmsglm.
binary.response: logical, equal to TRUE if binary response.
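
The pseudo \(R^2\) and dispersion formulas given for the Likelihood component can be reproduced by hand on an ordinary glm fit. The following is a minimal sketch in base R, not the internal code of fwdglm: it fits a Poisson model to a hypothetical subset of \(m = 8\) units from the small counts data set used in the glm help page, and the names subset_idx, pseudo_r2 and dispersion are illustrative only.

```r
# Illustration only: base R glm() on an arbitrary subset, not the forward search.
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
d <- data.frame(counts, outcome, treatment)

subset_idx <- 1:8    # hypothetical subset of m = 8 units
fit <- glm(counts ~ outcome + treatment, family = poisson(), data = d[subset_idx, ])

m <- length(subset_idx)    # subset size
p <- length(coef(fit))     # number of estimated coefficients

# Pseudo R^2: 1 - deviance / null.deviance
pseudo_r2 <- 1 - deviance(fit) / fit$null.deviance

# Dispersion: sum of squared Pearson residuals divided by (m - p)
dispersion <- sum(residuals(fit, type = "pearson")^2) / (m - p)
```

In a forward search these quantities would be recomputed at each subset size \(m\), which is presumably why the Likelihood matrix has one row per step.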

Arguments

formula: a symbolic description of the model to be fit. The details of the model are the same as for glm.
family: a description of the error distribution and link function to be used in the model. See family for details.
data: an optional data frame containing the variables in the model. By default the variables are taken from the environment from which the function is called.
weights: an optional vector of weights to be used in the fitting process.
na.action: a function which indicates what should happen when the data contain NA's. The default is set by the na.action setting of options, and is na.fail if that is unset. The "factory-fresh" default is na.omit.
contrasts: an optional list. See the contrasts.arg of model.matrix.default.
bsb: an optional vector specifying a starting subset of observations to be used in the forward search. By default the "best" starting subset is chosen using the function lmsglm with control arguments provided by nsamp.
balanced: logical; for a binary response, if TRUE the proportion of successes in the subset is kept approximately equal to that in the full dataset during the forward search.
maxit: integer giving the maximal number of IWLS iterations. See glm.control for details.
epsilon: positive convergence tolerance epsilon. See glm.control for details.
nsamp: the initial subset for the forward search in generalized linear models is found by the function lmsglm. This argument controls how many subsets are used in the robust fitting procedure. The choices are: the number of samples (100 by default) or "all". Note that the algorithm tries to find nsamp good subsets, up to a maximum of 2*nsamp subsets.
trace: logical, if TRUE a message is printed for every ten iterations completed during the forward search.
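
The maxit and epsilon arguments refer to glm.control, whose own defaults (maxit = 25, epsilon = 1e-08) differ from the values used by fwdglm. The sketch below uses base R only and assumes that these two arguments carry the same meaning as in glm.control, as their descriptions indicate. It shows what the fwdglm defaults correspond to in a plain glm call; fwdglm itself is not called.

```r
# Base R only; the data are the small counts example from the glm help page.
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)

# Same values as the fwdglm() defaults for maxit and epsilon.
ctrl <- glm.control(epsilon = 1e-06, maxit = 50)

fit <- glm(counts ~ outcome + treatment, family = poisson(), control = ctrl)

fit$converged   # TRUE if IWLS met the tolerance within maxit iterations
fit$iter        # number of IWLS iterations actually used
```

A larger maxit or a looser epsilon may help when fits on very small subsets are slow to converge, though whether that is needed depends on the data.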

Author

Originally written for S-Plus by: Kjell Konis kkonis@insightful.com and Marco Riani mriani@unipr.it
Ported to R by Luca Scrucca luca@stat.unipg.it

References

Atkinson, A.C. and Riani, M. (2000), Robust Diagnostic Regression Analysis, First Edition. New York: Springer, Chapter 6.

Examples

data(cellular)
cellular$TNF <- as.factor(cellular$TNF)
cellular$IFN <- as.factor(cellular$IFN)
mod <- fwdglm(y ~ TNF + IFN, data=cellular, family=poisson(log), nsamp=200)
summary(mod)
if (FALSE) plot(mod)
plot(mod, 1)
plot(mod, 5)
plot(mod, 6, ylim=c(-3, 20))
plot(mod, 7)
plot(mod, 8)
