
msgl (version 2.0.125.0)

msgl: Fit a multinomial sparse group lasso regularization path.

Description

Fit a sequence of multinomial logistic regression models using sparse group lasso, group lasso or lasso. In addition to the standard parameter grouping the algorithm supports further grouping of the features.

Usage

msgl(x, classes,
    sampleWeights = rep(1/length(classes), length(classes)),
    grouping = NULL, groupWeights = NULL,
    parameterWeights = NULL, alpha = 0.5,
    standardize = TRUE, lambda, return = 1:length(lambda),
    intercept = TRUE, sparse.data = is(x, "sparseMatrix"),
    algorithm.config = msgl.standard.config)

Arguments

x
design matrix, matrix of size $N \times p$.
classes
classes, factor of length $N$.
sampleWeights
sample weights, a vector of length $N$.
grouping
grouping of the features, a vector of length $p$. Each element of the vector specifies the group of the corresponding feature.
groupWeights
the group weights, a vector of length $m$ (the number of groups). If groupWeights = NULL default weights will be used. Default weights are 0 for the intercept and $$\sqrt{K\cdot\textrm{number of features in the group}}$$ for all other groups.
parameterWeights
a matrix of size $K \times p$. If parameterWeights = NULL default weights will be used. Default weights are 0 for the intercept weights and 1 for all other weights.
alpha
the $\alpha$ value: 0 gives group lasso, 1 gives lasso, and values between 0 and 1 give a sparse group lasso penalty.
standardize
if TRUE the features are standardized before fitting the model. The model parameters are returned on the original scale.
lambda
the lambda sequence for the regularization path.
return
the indices of the lambda values for which to return the fitted parameters.
intercept
should the model include intercept parameters.
sparse.data
if TRUE x will be treated as sparse; by default x is treated as sparse if it is a sparseMatrix.
algorithm.config
the algorithm configuration to be used.
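
The default group weights described under groupWeights can be computed directly. The following base-R sketch (the variable names are illustrative, not part of the msgl API) evaluates $\sqrt{K \cdot \textrm{number of features in the group}}$ for a small grouping:

```r
# Sketch: the default non-intercept group weights described above.
# Assumptions (illustrative, not package code): `grouping` assigns each of the
# p features to a group, and K is the number of classes.
grouping <- c(1, 1, 2, 2, 2, 3)  # p = 6 features in m = 3 groups
K <- 4                           # number of classes

group.sizes <- as.numeric(table(grouping))
# Default weight for each group: sqrt(K * number of features in the group)
groupWeights <- sqrt(K * group.sizes)
groupWeights
```

Passing a vector built this way (with a leading 0 for the intercept group, when present) should reproduce the default behaviour.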

Value

  • beta: the fitted parameters, a list of length length(lambda) with each entry a matrix of size $K\times (p+1)$ holding the fitted parameters
  • loss: the values of the loss function
  • objective: the values of the objective function (i.e. loss + penalty)
  • lambda: the lambda values used
  • classes.true: the true classes used for estimation; equal to the classes argument

Details

Consider a classification problem with $K$ classes and $p$ features (covariates) divided into $m$ groups. This function computes a sequence of minimizers (one for each lambda given in the lambda argument) of $$\hat R(\beta) + \lambda \left( (1-\alpha) \sum_{J=1}^m \gamma_J \|\beta^{(J)}\|_2 + \alpha \sum_{i=1}^{n} \xi_i |\beta_i| \right)$$ where $\hat R$ is the weighted empirical log-likelihood risk of the multinomial regression model. The vector $\beta^{(J)}$ denotes the parameters associated with the $J$'th group of features (the default is one covariate per group, hence the default dimension of $\beta^{(J)}$ is $K$). The group weights $\gamma \in [0,\infty)^m$ and parameter weights $\xi \in [0,\infty)^n$ may be explicitly specified.
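
The penalty term above can be evaluated directly for a given parameter matrix. The following base-R sketch (the function name and arguments are illustrative, not part of the msgl package) computes $\lambda \left( (1-\alpha) \sum_J \gamma_J \|\beta^{(J)}\|_2 + \alpha \sum_i \xi_i |\beta_i| \right)$:

```r
# Sketch: evaluating the sparse group lasso penalty from the formula above.
# All names here are illustrative; this is not code from the msgl package.
sgl.penalty <- function(beta, grouping, groupWeights, parameterWeights,
                        lambda, alpha) {
  # Group term: sum over groups J of gamma_J * ||beta^(J)||_2
  group.term <- sum(sapply(unique(grouping), function(J) {
    cols <- grouping == J
    groupWeights[J] * sqrt(sum(beta[, cols, drop = FALSE]^2))
  }))
  # Lasso term: sum over all parameters i of xi_i * |beta_i|
  lasso.term <- sum(parameterWeights * abs(beta))
  lambda * ((1 - alpha) * group.term + alpha * lasso.term)
}

K <- 2; p <- 4
beta <- matrix(c(1, -1, 0, 2, 0, 0, 3, 0), nrow = K)  # K x p parameter matrix
grouping <- c(1, 1, 2, 2)                             # two groups of two features
pen <- sgl.penalty(beta, grouping,
                   groupWeights = sqrt(K * c(2, 2)),  # default group weights
                   parameterWeights = matrix(1, K, p),
                   lambda = 0.1, alpha = 0.5)
pen
```

Setting alpha = 0 or alpha = 1 in this sketch recovers the pure group lasso and pure lasso penalties, respectively.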

Examples

# Load the simulated data set shipped with the package
data(SimData)
x <- sim.data$x
classes <- sim.data$classes

# Compute a lambda sequence for the regularization path
lambda <- msgl.lambda.seq(x, classes, alpha = .5, d = 50, lambda.min = 0.05)

# Fit the multinomial sparse group lasso path
fit <- msgl(x, classes, alpha = .5, lambda = lambda)

# Model 10, i.e. the model corresponding to lambda[10]
models(fit)[[10]]

# The nonzero features of model 10
features(fit)[[10]]

# The nonzero parameters of model 10
parameters(fit)[[10]]

# The training errors of the models
Err(fit, x)
# Note: for high-dimensional models the training error is almost always
# over-optimistic; use msgl.cv instead to estimate the expected error by
# cross validation
