lqa: Fitting penalized Generalized Linear Models with the LQA algorithm

Description

`lqa' is used to fit penalized generalized linear models, specified by giving a symbolic description of the linear predictor and descriptions of the error distribution and the penalty.

Usage

lqa (x, ...)

lqa.update2 (x, y, family = NULL, penalty = NULL, intercept = TRUE, 
             weights = rep (1, nobs), control = lqa.control (), 
             initial.beta, mustart, eta.new, gamma1 = 1, ...)

## S3 method for class 'formula':
lqa(formula, data = list (), weights = rep (1, nobs), subset, 
            na.action, start = NULL, etastart, mustart, offset, ...)

## S3 method for class 'default':
lqa(x, y, family = gaussian (), penalty = NULL, method = "lqa.update2", 
            weights = rep (1, nobs), start = NULL, 
            etastart = NULL, mustart = NULL, offset = rep (0, nobs), 
            control = lqa.control (), intercept = TRUE, 
            standardize = TRUE, ...)

Arguments

formula

a symbolic description of the model to be fit. The details of model specification are given below.

data

an optional data frame containing the variables in the model. If not found in `data', the variables are taken from `environment(formula)', typically the environment from which `lqa' is called.

family

a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See family for details on family functions.)

penalty

a description of the penalty to be used in the fitting procedure. This must be a penalty object. See penalty for details on penalty functions.

method

a character string naming the function used to estimate the model. The default value method = "lqa.update2" applies the LQA algorithm.

intercept

a logical value indicating whether the model should include an intercept (this is recommended). The default value is TRUE.

standardize

a logical value indicating whether the regressors should be standardized (this is recommended). The default value is TRUE.

weights

an optional vector of weights to be used in the fitting process.

start

starting values for the parameters in the linear predictor.

etastart

starting values for the linear predictor.

mustart

starting values for the vector of means (response).

gamma1

additional step length parameter used in lqa.update2 to enforce convergence if necessary.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting.

control

a list of parameters for controlling the fitting process. See the documentation of lqa.control for details.

na.action

a function which indicates what should happen when the data contain `NA's.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

x, y

Used in `lqa.default': `x' is a design matrix (with additional column of ones if an intercept should be included in the model) of dimension `n * p', and `y' is a vector of observations of length `n'.

initial.beta

optional initial values of beta used in the fitting procedures.

eta.new

optional initial values of the linear predictor used in the fitting procedures.

...

further arguments passed to or from other methods.
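
The family argument described above accepts three forms: a character string, a family function, or the result of calling one. The following base-R sketch shows how such input can be normalised to a family object, following the pattern used by glm. The helper name resolve_family is hypothetical, and it is an assumption that lqa resolves the argument in the same way.

```r
# Hypothetical helper (not part of the lqa package): turn any of the three
# accepted forms of 'family' into a family object, as glm() does.
resolve_family <- function (family) {
  if (is.character (family))
    family <- get (family, mode = "function", envir = parent.frame ())
  if (is.function (family))
    family <- family ()
  if (is.null (family$family))
    stop ("'family' not recognized")
  family
}

f1 <- resolve_family ("binomial")    # character string naming a family function
f2 <- resolve_family (binomial)      # the family function itself
f3 <- resolve_family (binomial ())   # the result of a call to the family function

c (f1$family, f1$link)               # family name and its (canonical) link
f3$linkinv (0)                       # inverse logit at 0, i.e. 0.5
```

All three calls yield equivalent family objects, so the choice between them is a matter of convenience.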

Value

lqa returns an object of class lqa which inherits from the classes glm and lm. The generic accessor functions coefficients, fitted.values and residuals can be used to extract various useful features of the object returned by lqa.

Note: it is highly recommended to include an intercept in the model (i.e. use intercept = TRUE). If you use intercept = FALSE in the classical linear model, make sure that your y argument is already centered; otherwise the model is not valid.

An object of class lqa is a list containing at least the following components:

coefficients

a named vector of unstandardized coefficients.

residuals

the residuals based on the estimated coefficients.

fitted.values

the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function.

family

the family object used.

penalty

the penalty object used, indicating which penalty has been used.

linear.predictors

the linear fit on the link scale.

deviance

up to a constant, minus twice the maximized (unpenalized) log-likelihood.

aic

Akaike's Information Criterion: minus twice the maximized log-likelihood plus twice the trace of the hat matrix (thus assuming that the dispersion is known).

bic

Bayesian Information Criterion: minus twice the maximized log-likelihood plus log (nobs) times the trace of the hat matrix (thus assuming that the dispersion is known).

null.deviance

the deviance of the null model (which only includes a constant).

n.iter

the number of iterations until convergence.

best.iter

the number of iterations until AIC reaches its minimum.

weights

the diagonal elements of the weight matrix in GLMs.

prior.weights

the weights as optionally given as argument.

df.residual

the residual degrees of freedom.

df.null

the residual degrees of freedom for the null model.

converged

a logical variable. TRUE if the algorithm indeed converged.

mean.x

the vector of means of the regressors.

norm.x

the vector of Euclidean norms of the regressors.

Amat

the quadratically approximated penalty matrix corresponding to the penalty used.

method

the argument indicating the fitting method.

rank

the trace of the hat matrix.

y

the original response vector used to fit the model.

x

the original regressor matrix (including an intercept if given) used to fit the model.

fit.obj

the fitted object as returned from the fitting method (e.g. from lqa.update2).
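
The rank, aic and bic components are all defined through the trace of the hat matrix. The following base-R sketch illustrates that relationship for a Gaussian model with a fixed quadratic penalty matrix. It does not call lqa; the names A, lambda, sigma2 and rank.eff are assumptions chosen for illustration and are not taken from the package internals.

```r
set.seed (1)
n <- 50
p <- 3
X <- cbind (1, matrix (rnorm (n * p), ncol = p))   # design matrix with intercept column
y <- drop (X %*% c (0.5, 1, 0, -1)) + rnorm (n)

# Assumed quadratic (ridge-type) penalty matrix; the intercept is not penalized.
lambda <- 2
A <- diag (c (0, rep (lambda, p)))

# Penalized least-squares estimate and the corresponding hat matrix.
XtX.pen.inv <- solve (crossprod (X) + A)
beta.hat <- drop (XtX.pen.inv %*% crossprod (X, y))
H <- X %*% XtX.pen.inv %*% t (X)
rank.eff <- sum (diag (H))                          # trace of the hat matrix

# Gaussian log-likelihood with the dispersion treated as known (sigma2 = 1).
sigma2 <- 1
loglik <- sum (dnorm (y, mean = drop (X %*% beta.hat),
                      sd = sqrt (sigma2), log = TRUE))

# Information criteria as described for the aic and bic components.
aic <- -2 * loglik + 2 * rank.eff
bic <- -2 * loglik + log (n) * rank.eff
```

Because the penalty shrinks the fit, the trace of the hat matrix is smaller than the number of columns of X, which is why it serves as an effective number of parameters in both criteria.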

Details

A typical formula has the form `response ~ terms' where `response' is the (numeric) response vector and `terms' is a series of terms which specifies a linear predictor for `response'. The use is similar to that of the glm() function. As there, the right hand side of the model formula specifies the form of the linear predictor and hence gives the link function of the mean of the response, rather than the mean of the response directly. By default an intercept is included in the model. To remove it, use formulae of the form `response ~ 0 + terms' or `response ~ terms - 1'.

lqa also takes a family argument, which specifies the distribution from the exponential family to use and the link function that goes with it. By default each family uses its canonical link.
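
The effect of the two intercept-free formula forms can be checked with base R's model.matrix, which builds a design matrix from a formula. This is only a sketch: the data frame d is hypothetical toy data, and it is an assumption that lqa constructs its design matrix in the same way as model.matrix.

```r
# Hypothetical toy data, used only to show how the formula controls the intercept.
d <- data.frame (y  = c (1.2, 0.7, 3.1, 2.4),
                 x1 = c (0.1, -0.5, 1.3, 0.8),
                 x2 = c (1, 0, 1, 0))

mm.with   <- model.matrix (y ~ x1 + x2, data = d)       # intercept included
mm.zero   <- model.matrix (y ~ 0 + x1 + x2, data = d)   # intercept removed
mm.minus1 <- model.matrix (y ~ x1 + x2 - 1, data = d)   # intercept removed

colnames (mm.with)     # "(Intercept)" "x1" "x2"
colnames (mm.zero)     # "x1" "x2"
identical (colnames (mm.zero), colnames (mm.minus1))    # TRUE
```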

Examples

library (lqa)

set.seed (1111)

n <- 200
p <- 5
X <- matrix (rnorm (n * p), ncol = p)
X[,2] <- X[,1] + rnorm (n, sd = 0.1)
X[,3] <- X[,1] + rnorm (n, sd = 0.1)
true.beta <- c (1, 2, 0, 0, -1)
y <- drop (X %*% true.beta) + rnorm (n)

obj1 <- lqa (y ~ X, family = gaussian (), penalty = lasso (1.5), 
             control = lqa.control ())
obj1$coefficients


set.seed (4321)

n <- 25
p <- 5
X <- matrix (rnorm (n * p), ncol = p)
X[,2] <- X[,1] + rnorm (n, sd = 0.1)
X[,3] <- X[,1] + rnorm (n, sd = 0.1)
true.beta <- c (1, 2, 0, 0, -1)

family1 <- binomial ()
eta.true <- drop (X %*% true.beta)
mu.true <- family1$linkinv (eta.true)
## draw one Bernoulli response per observation, with success probability mu.true[i]
y2 <- rbinom (n, 1, mu.true)

obj2 <- lqa (y2 ~ X, family = binomial (), 
             penalty = fused.lasso (c (0.0001, 0.2)))
obj2$coefficients