gamlasso: Fitting a gamlasso model

Description

This function fits a gamlasso model with the given penalties. For some special cases, using gam or glmnet directly might be more efficient and/or flexible.

Usage

# S3 method for formula
gamlasso(
  formula,
  data,
  family = "gaussian",
  linear.penalty = "l1",
  smooth.penalty = "l2",
  num.knots = 5,
  offset = NULL,
  weights = NULL,
  interactions = F,
  seed = .Random.seed[1],
  num.iter = 100,
  tolerance = 1e-04,
  ...
)
# S3 method for default
gamlasso(
  response,
  linear.terms,
  smooth.terms,
  data,
  family = "gaussian",
  linear.penalty = "l1",
  smooth.penalty = "l2",
  num.knots = 5,
  offset = NULL,
  weights = NULL,
  interactions = F,
  seed = .Random.seed[1],
  num.iter = 100,
  tolerance = 1e-04,
  prompts = F,
  verbose = T,
  ...
)

Arguments

formula

A formula describing the model to be fitted

response

The name of the response variable. Could be two variables in case of a general binomial fit (see details below)

linear.terms

The names of the variables to be used as linear predictors

smooth.terms

The names of the variables to be used as smoothers

data

The data with which to fit the model

family

The family describing the error distribution and link function to be used in the model. A character string which can only be "gaussian" (default), "binomial", "poisson" or "cox". For family = "binomial", response can be a vector of two variable names, and for family = "cox", weights must be provided (see details below).

linear.penalty

The penalty used on the linear predictors. A character string which can be "none", "l1" (default) or "l2". If "l1" is used, the gam and lasso loop is run. Otherwise only a gam model is fitted (with penalties on the parametric terms if linear.penalty = "l2").

smooth.penalty

The penalty used on the smoothers. A character string which can be "l1" or "l2" (default). "l2" refers to the inherent second-order penalty that smoothers have for controlling their shape, so "none" is not an option. With "l1" the basis is specified by bs='ts'; otherwise bs='tp' is used (see gam for details on basis types).

num.knots

Number of knots for each smoother. Can be a single integer (recycled for each smoother variable) or a vector of integers the same length as the number of smoothers.

offset

The name of the offset variable. NULL (default) if not provided

weights

The name of the weights variable. NULL (default) if not provided. See details below.

interactions

logical. Should interactions be included as covariates? If TRUE, the smoothers are fitted with ti instead of s so that the added effects of the interactions can be quantified separately.

seed

A random seed that can be specified for reproducibility. It is used for fitting the gam and lasso models, or fixed before each iteration of the gamlasso loop.

num.iter

Number of iterations for the gamlasso loop

tolerance

Tolerance for convergence of the gamlasso loop

prompts

logical. Should gamlassoChecks provide interactive user prompts for corrective action when needed?

verbose

logical. Should "progress reports" be printed to the console while fitting the model?

...

Additional arguments

Value

If the arguments fail the basic checks performed by gamlassoChecks, the function returns NULL. Otherwise it calls gamlassoFit, which returns a list of two models, gam and cv.glmnet. Either of these can be NULL; if both are non-NULL, the list also contains convergence, a matrix of values determining the convergence of the gamlasso loop. gamlassoFit also returns inherit, a list of selected arguments used to fit the gamlasso model together with some further values needed for prediction.

Details

gamlasso allows for specifying models in two ways: 1) with the formula approach, and 2) with the term specification approach.

The formula approach is appropriate for when the user wants an L1-penalty on the linear terms of the model, in which case the user is required to specify the linear terms in a model matrix named "X" appended to the input data frame. A typical formula specification would be "y ~ X + s(z) + ..." where "X" corresponds to the model-matrix of linear terms subject to an L1-penalty, while everything to the right of "X" is considered part of the gam formula (i.e. all smooth terms). In light of the above formula, gamlasso iterates (until convergence) between the following two lines of pseudo code:

model.cv.glmnet <- cv.glmnet(y=y, x=X, offset="model.gam fitted values")
model.gam <- gam(y ~ s(z) + ..., offset="model.cv.glmnet fitted values")
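
The alternating scheme above can be sketched with mgcv and glmnet directly. This is an illustrative sketch on simulated data, not the package's internal gamlassoFit code: the toy data, the zero starting value, the stopping rule (maximum absolute change in the smooth contribution) and the names gam.part, lasso.part, off and max.iter are all assumptions made for the example.

```r
library(mgcv)    # gam()
library(glmnet)  # cv.glmnet()

# Toy data (assumption): 5 candidate linear terms and 1 smooth covariate
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("x", 1:5)))
dat <- data.frame(z = runif(n))
dat$y <- 2 * X[, 1] - X[, 2] + sin(2 * pi * dat$z) + rnorm(n, sd = 0.3)

gam.part <- rep(0, n)  # current contribution of the smooth terms
max.iter <- 20
for (iter in seq_len(max.iter)) {
  set.seed(1)  # fix the CV folds in every pass so successive fits are comparable

  # Lasso step: fit the linear terms with the smooth part held fixed as an offset
  model.cv.glmnet <- cv.glmnet(x = X, y = dat$y, offset = gam.part)
  beta <- as.matrix(coef(model.cv.glmnet, s = "lambda.min"))[, 1]
  lasso.part <- as.numeric(beta[1] + X %*% beta[-1])

  # GAM step: fit the smooth terms with the lasso part held fixed as an offset
  dat$off <- lasso.part
  model.gam <- gam(y ~ s(z, k = 5, bs = "ts") + offset(off), data = dat)
  new.gam.part <- as.numeric(fitted(model.gam)) - lasso.part

  # Stop once the smooth contribution no longer changes (assumed criterion)
  change <- max(abs(new.gam.part - gam.part))
  gam.part <- new.gam.part
  if (change < 1e-4) break
}
```

Each model only ever sees the other model's current fitted contribution through the offset, which is what lets the lasso handle variable selection on "X" while gam estimates the smooth effects.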

The term specification approach can fit the same type of models as the formula approach (i.e. models with L1-penalty on the linear terms). However, it is more flexible in terms of penalty-structure and can be useful if the user has big data sets with lots of variables making the formula specification cumbersome. In the term specification approach the user simply specifies the names of the data columns corresponding to the response, linear.terms and smooth.terms and then specifies whether to put a linear.penalty="l1", "l2" or "none" (on linear.terms) and whether to put a smooth.penalty="l1" or "l2" (on smooth.terms).
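
To make the correspondence between the two approaches concrete, the sketch below builds a formula-approach specification from term names using base R only. The helper build_gamlasso_formula is hypothetical (it is not part of plsmselect); it assumes the bs='ts' / bs='tp' basis mapping and the num.knots recycling described under Arguments.

```r
# Hypothetical helper (not part of plsmselect): turn term names into the
# kind of formula used by the formula approach.
build_gamlasso_formula <- function(response, smooth.terms,
                                   smooth.penalty = "l2", num.knots = 5) {
  # "l1" on the smoothers corresponds to bs="ts", otherwise bs="tp"
  bs <- if (smooth.penalty == "l1") "ts" else "tp"
  # Recycle a single integer to one value per smoother
  k <- rep_len(num.knots, length(smooth.terms))
  smooths <- sprintf('s(%s, k=%d, bs="%s")', smooth.terms, k, bs)
  # "X" is the model matrix of linear terms subject to the L1-penalty
  as.formula(paste(response, "~", paste(c("X", smooths), collapse = " + ")))
}

f <- build_gamlasso_formula("Yg", paste0("z", 1:4), smooth.penalty = "l1")
print(f)
```

The linear terms themselves do not appear by name in the formula; in the formula approach they live in the "X" matrix column of the data.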

When fitting a binomial model for binary (0/1) responses, put the response variable before "~" in the formula approach, or pass a single variable name as the response argument in the term-specification approach. More generally, if the responses are success/failure counts, the formula should start with something similar to cbind(success,failure) ~ ... and, in the term-specification approach, the response argument should be a vector of length two giving the success and failure variable names.
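
The success/failure convention is the same two-column response that base R's glm accepts for binomial models. The sketch below uses simulated counts (the data frame and its column names are assumptions) and shows the left-hand side for the formula approach next to the corresponding response vector for the term-specification approach. Here glm stands in for gamlasso only to demonstrate the response format.

```r
# Simulated success/failure counts out of 10 trials per row (toy data)
set.seed(1)
n <- 50
dat <- data.frame(x = rnorm(n), trials = 10)
dat$success <- rbinom(n, size = dat$trials, prob = plogis(0.5 * dat$x))
dat$failure <- dat$trials - dat$success

# Formula approach: two-column response on the left of "~"
fit <- glm(cbind(success, failure) ~ x, family = binomial, data = dat)

# Term-specification approach: the same response given as two variable names
response <- c("success", "failure")
lhs <- sprintf("cbind(%s)", paste(response, collapse = ","))
```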

If family="cox" then the weights argument must be provided and should correspond to a status variable (1 - censor, that is, one minus the censoring indicator). For other models it should correspond to a custom weights variable to be used for the weighted log-likelihood, for example the total counts when fitting a binomial model. (Weights for families other than "cox" are currently not implemented.)

Both the formula and term-specification approaches can fit interaction models as well. There are three kinds of interactions: those between two linear predictors, those between two smooth predictors, and those between a linear and a smooth predictor. For the formula approach, the first type of interaction must be included as additional columns in the "X" matrix and the other two types must be specified in the smooth terms part of the formula. For the term-specification approach, the argument interactions must be TRUE, in which case all the pairwise interactions are used as predictors and variable selection is done on all of them.
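
For the formula approach, linear-by-linear interactions can be appended to "X" with model.matrix. A minimal base R sketch, assuming a toy data frame with three linear predictors x1, x2 and x3 (not simData):

```r
set.seed(1)
dat <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))

# (x1 + x2 + x3)^2 expands to the main effects plus all pairwise interactions;
# [, -1] drops the intercept column, as in the Examples section
X <- model.matrix(~ (x1 + x2 + x3)^2, data = dat)[, -1]
colnames(X)
# "x1" "x2" "x3" "x1:x2" "x1:x3" "x2:x3"

# Store the whole matrix as a single data frame column named "X"
dat$X <- X
```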

Examples

# NOT RUN {
library(plsmselect)

data(simData)

## Fit gaussian gamlasso model using the formula approach:
## (L1-penalty both on model matrix (X) and smooth terms (bs="ts"))
simData$X = model.matrix(~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10, data=simData)[,-1]

gfit = gamlasso(Yg ~ X +
                   s(z1, k=5, bs="ts") +
                   s(z2, k=5, bs="ts") +
                   s(z3, k=5, bs="ts") +
                   s(z4, k=5, bs="ts"),
                   data = simData,
                   seed=1)

# }
# NOT RUN {
## Equivalently with term specification approach:
gfit = gamlasso(response="Yg",
                  linear.terms=paste0("x",1:10),
                  smooth.terms=paste0("z",1:4),
                  data=simData,
                  linear.penalty = "l1",
                  smooth.penalty = "l1",
                  num.knots = 5,
                  seed=1)
# }
# NOT RUN {
## The two main components of gfit are
## gfit$cv.glmnet (LASSO component) and gfit$gam (GAM components):

## Extract lasso estimates of linear terms:
coef(gfit$cv.glmnet, s="lambda.min")

## Plot the estimates of the smooth effects:
plot(gfit$gam, pages=1)

# See ?summary.gamlasso for an example fitting a binomial response model
# See ?predict.gamlasso for an example fitting a poisson response model
# See ?cumbasehaz for an example fitting a survival response model
# }