gamlasso allows for specifying models in two ways:
1) with the formula approach, and 2) with the term specification approach.
The formula approach is appropriate when the user wants an L1-penalty on the
linear terms of the model. In that case the user must supply the linear terms
as a model matrix named "X" appended to the input data frame. A typical formula specification
is "y ~ X + s(z) + ...", where "X" is the model matrix of
linear terms subject to an L1-penalty and everything to the right of "X" is
treated as part of the gam formula (i.e. all smooth terms). Given such a formula,
gamlasso iterates (until convergence) between the following two lines of pseudo code:
model.cv.glmnet <- cv.glmnet(y=y, x=X, offset="model.gam fitted values")
model.gam <- gam(y ~ s(z) + ..., offset="model.cv.glmnet fitted values")
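As a minimal sketch of the data layout the formula approach expects, the following base R code appends a model matrix named X to a data frame. The column names x1, x2, x3 and z and the simulated values are hypothetical; only the name "X" and the shape of the formula come from the description above.

```r
set.seed(1)
n <- 50

# Hypothetical example data: three linear predictors and one smooth predictor
dat <- data.frame(
  y  = rnorm(n),
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rnorm(n),
  z  = runif(n)
)

# Build the model matrix of linear terms and drop the intercept column
X <- model.matrix(~ x1 + x2 + x3, data = dat)[, -1]

# Append it to the data frame as a single matrix-valued column named "X"
dat$X <- X

# Formula in the shape described above: "X" first, smooth terms after it
form <- y ~ X + s(z)
```

Storing the matrix as one column keeps the formula short no matter how many linear terms there are.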
The term specification approach can fit the same type of models as the formula approach
(i.e. models with an L1-penalty on the linear terms). However, it is more flexible in terms
of penalty structure and can be useful for big data sets with many variables,
where the formula specification becomes cumbersome. In the term specification approach
the user specifies the names of the data columns corresponding to the
response, linear.terms and smooth.terms, and then chooses
linear.penalty="l1", "l2" or "none" (applied to linear.terms)
and smooth.penalty="l1" or "l2" (applied to smooth.terms).
When fitting a binomial model for binary responses (0/1), include the response
variable before "~" in the formula approach; in the term-specification
approach, the response argument is a single variable name.
If the responses are success/failure counts, the formula should
start with something like cbind(success,failure) ~ ..., and in
the term-specification approach the response argument should be a
character vector of length two giving the names of the success and failure variables.
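The two ways of naming a success/failure response can be illustrated with base R alone. The data below are hypothetical, and glm() is used only to show that the cbind() response follows the usual R convention for binomial counts; it is not part of gamlasso.

```r
# Hypothetical grouped binomial data: successes and failures per row
dat <- data.frame(
  success = c(3, 5, 8, 9),
  failure = c(7, 5, 2, 1),
  x       = c(0.1, 0.4, 0.7, 1.0)
)

# Formula approach: two-column response built with cbind()
form <- cbind(success, failure) ~ x

# Term-specification approach: the same two columns given by name
response <- c("success", "failure")

# Both specifications refer to the same response columns
stopifnot(identical(all.vars(form[[2]]), response))

# Base R's glm() accepts the same cbind() response for binomial families
fit <- glm(form, family = binomial, data = dat)
```

In a gamlasso formula the right-hand side would instead contain "X" and the smooth terms, as described for the formula approach.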
If family="cox" then the weights argument must be provided
and should correspond to a status variable (1 - censor). For other models
it should correspond to a custom weights variable to be used for the
weighted log-likelihood, for example the total counts for fitting a
binomial model. (Weights for families other than "cox" are currently not
implemented.)
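For the "cox" case, a status variable can be derived from a censoring indicator as in this small sketch. The column names and values are hypothetical, and the code assumes that censor is coded 1 for censored observations and 0 otherwise.

```r
# Hypothetical survival data: follow-up time and a censoring indicator
# (assumption: censor = 1 means the observation was censored)
dat <- data.frame(
  time   = c(5.2, 3.1, 8.4, 1.7, 6.0),
  censor = c(0, 1, 0, 0, 1)
)

# Status variable: 1 for an observed event, 0 for a censored observation
dat$status <- 1 - dat$censor
```

The status column is then the variable that the weights argument should correspond to.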
Both the formula and term-specification approaches can also fit interaction
models. There are three kinds of interactions: those between two linear predictors,
those between two smooth predictors, and those between a linear and a smooth predictor. For the
formula approach, the first kind must be included as additional
columns in the "X" matrix, and the other two kinds must be specified in the
smooth terms part of the formula. For the term-specification approach, the argument
interaction must be TRUE, in which case all pairwise
interactions are used as predictors and variable selection is done on all of them.
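For the formula approach, one way to add linear-by-linear interactions as extra columns of "X" is to let model.matrix() expand them. This is a sketch in base R with hypothetical column names x1, x2 and x3.

```r
set.seed(1)
n <- 20

# Hypothetical linear predictors
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))

# (x1 + x2 + x3)^2 expands to all main effects plus all pairwise interactions;
# the intercept column is dropped
X <- model.matrix(~ (x1 + x2 + x3)^2, data = dat)[, -1]

colnames(X)
# "x1" "x2" "x3" "x1:x2" "x1:x3" "x2:x3"

# Append to the data frame under the name expected by the formula approach
dat$X <- X
```

Each interaction column is then penalized and selected like any other linear term in "X".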