Function that estimates a group-regularized elastic net model.
gren(x, y, m=rep(1, nrow(x)), unpenalized=NULL, partitions=NULL, alpha=0.5,
lambda=NULL, intercept=TRUE, monotone=NULL, psel=TRUE, compare=TRUE,
posterior=FALSE, nfolds=nrow(x), foldid=NULL, trace=TRUE,
init=list(lambdag=NULL, mu=NULL, sigma=NULL, chi=NULL, ci=NULL),
control=list(epsilon=0.001, maxit=500, maxit.opt=1000, maxit.vb=100))
x
feature data as either a numeric matrix or a data.frame of numeric variables.
y
response as either a numeric vector of binomial/binary successes of length nrow(x), or a matrix with nrow(x) rows and two columns, where the first column contains the binomial/binary failures and the second column the binomial/binary successes.
m
numeric of length nrow(x) that contains the number of Bernoulli trials.
unpenalized
optional numeric matrix or data.frame of numeric unpenalized covariates with nrow(x) rows.
partitions
list that contains the (possibly multiple) partitions of the data. Every list element corresponds to one partition, where every partition is a numeric vector of length ncol(x) containing the group ids of the features (input formats are sketched after this argument list).
alpha
proportion of L1 penalty as a numeric of length 1.
lambda
global penalty parameter. The default NULL results in estimation by cross-validation.
intercept
logical that indicates whether an intercept should be included.
monotone
list of two logical vectors of length length(partitions). The first vector, monotone, indicates whether the corresponding partition's penalty parameters should be estimated monotonically; the second vector, decreasing, indicates whether the monotone penalty parameters are decreasing with group number.
psel
either a numeric vector that indicates the number of features to select or a logical. If TRUE, feature selection is done by letting glmnet determine the penalty parameter sequence.
compare
logical; if TRUE, a regular non-group-regularized model is estimated.
posterior
logical; if TRUE, the full variational Bayes posterior is returned.
nfolds
numeric of length 1 with the number of folds used in the cross-validation of the global lambda. The default is nrow(x), i.e., leave-one-out cross-validation.
foldid
optional numeric vector of length nrow(x) with the fold assignments of the observations.
trace
logical; if TRUE, progress of the algorithm is printed.
init
optional list containing the starting values of the iterative algorithm. See Details for more information.
control
a list of algorithm control parameters. See Details for more information.
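For concreteness, a minimal sketch of how the main inputs could be assembled is given below; the object names, group sizes, and partition labels are purely illustrative assumptions, not requirements of the package.

## illustrative input shapes (all names and sizes are hypothetical)
n <- 50; p <- 200
x <- matrix(rnorm(n*p), nrow=n, ncol=p)      # numeric feature matrix

## response as successes out of m Bernoulli trials per observation
m <- sample(1:4, n, replace=TRUE)
y <- rbinom(n, m, 0.5)
## equivalent two-column form: failures in column 1, successes in column 2
y.mat <- cbind(m - y, y)

## two partitions of the features: one with 2 groups, one with 4 groups
partitions <- list(size=rep(1:2, each=p/2), source=rep(1:4, each=p/4))

## one logical per partition; here only the first partition is estimated
## monotonically, with multipliers decreasing in group number
monotone <- list(monotone=c(TRUE, FALSE), decreasing=c(TRUE, FALSE))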
Function returns an S3 list object of class gren containing output with the following components (a brief inspection sketch follows this list):
call
The function call that produced the output.
alpha
proportion of L1 penalty as a numeric of length 1.
lambda
global penalty parameter as a numeric. Estimated by cross-validation if lambda=NULL.
lambdag.seq
list with the full sequence of penalty multipliers over the iterations.
lambdag
list with the final estimates of the penalty multipliers.
vb.post
list with the variational posterior parameters \(\mu_j\), \(\sigma_{ij}\), \(c_i\), and \(\chi_j\).
freq.model
frequentist elastic net model as the output of a glmnet call. NULL if psel=FALSE.
iter
list with the number of iterations of the lambdag estimation, the number of optimisation iterations of lambdag, and the number of variational Bayes iterations.
conv
list of logicals with the convergence of the lambdag sequence, the optimisation steps, and the variational Bayes iterations.
args
list with the input arguments of the gren call.
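As a rough illustration, the components above might be inspected as follows, assuming a fitted object fit as in the Examples below; which components are filled can depend on the psel and posterior arguments.

## hypothetical inspection of a fitted object `fit` (see Examples)
fit$lambda       # global penalty parameter (cross-validated if lambda=NULL)
fit$lambdag      # final group-wise penalty multipliers, one entry per partition
fit$vb.post      # variational posterior parameters
fit$freq.model   # glmnet fit with the estimated multipliers (NULL if psel=FALSE)
fit$conv         # convergence indicators of the three iteration levels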
This is the main function of the package that estimates a group-regularized elastic net regression. The elastic net penalty's proportion of L1-norm penalisation is determined by alpha. An alpha close to 0 implies a more ridge-like penalty, while an alpha close to 1 implies a more lasso-like penalty. The algorithm is a two-step procedure: first, a global lambda penalty is estimated by cross-validation. Next, the groupwise lambda multipliers are estimated by an EM algorithm. The EM algorithm consists of: (i) an expectation step, in which the expected marginal likelihood of the penalty multipliers is iteratively approximated by a variational Bayes EM algorithm, and (ii) a maximisation step, in which the approximate expected marginal likelihood is maximised with respect to the penalty multipliers. After convergence of the algorithm, an (optional) frequentist elastic net model is fit using the estimated penalty multipliers, by setting psel=TRUE or by setting psel to a numeric vector.
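As a hedged sketch of how these options interact, two call variants are shown below; the specific alpha values and selection sizes are arbitrary examples, not recommendations.

## more ridge-like penalty, no frequentist refit afterwards
fit.ridge <- gren(x, y, m, partitions=partitions, alpha=0.05, psel=FALSE)

## more lasso-like penalty, refit with glmnet selecting roughly 10 and 25 features
fit.lasso <- gren(x, y, m, partitions=partitions, alpha=0.95, psel=c(10, 25))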
The user may speed up the procedure by specifying initial values for the EM algorithm in init. init is a list that contains the following elements (a warm-start sketch follows this list):
lambdag
initial values for the \(\lambda_g\) in a list of length length(partitions).
mu
initial values for the \(\mu_j\) in a numeric vector of length ncol(x) + ncol(unpenalized) + intercept.
chi
initial values for the \(\chi_j\) in a numeric vector of length ncol(x).
ci
initial values for the \(c_i\) in a numeric vector of length nrow(x).
sigma
initial values for the \(\Sigma_{ij}\) in a numeric matrix with ncol(x) rows and columns.
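One possible use is a warm start from an earlier fit on comparable data. The sketch below assumes that the variational posterior of the previous fit is stored in fit.old$vb.post with elements named mu, sigma, chi, and ci; these element names and the reuse itself are assumptions for illustration only.

## hypothetical warm start from a previous fit `fit.old` (element names assumed)
init <- list(lambdag=fit.old$lambdag,
             mu=fit.old$vb.post$mu,
             sigma=fit.old$vb.post$sigma,
             chi=fit.old$vb.post$chi,
             ci=fit.old$vb.post$ci)
fit.warm <- gren(x, y, m, partitions=partitions, init=init)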
control is a list with parameters to control the estimation procedure. It consists of the following components (a usage sketch follows this list):
epsilon
numeric with the relative convergence tolerance. Default is epsilon=0.001.
maxit
numeric whole number giving the maximum number of iterations to update the lambdag. Default is maxit=500.
maxit.opt
numeric whole number giving the maximum number of iterations to numerically maximise the lambdag. Maximisation occurs at every iteration. Default is maxit.opt=1000.
maxit.vb
numeric whole number giving the maximum number of iterations to update the variational parameters mu, sigma, chi, and ci. One full update sequence is done per iteration. Default is maxit.vb=100.
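For instance, a stricter tolerance with a smaller iteration cap could be passed as follows; the particular values are arbitrary and only illustrate the list format.

## illustrative control settings (values are arbitrary)
ctrl <- list(epsilon=1e-04, maxit=200, maxit.opt=1000, maxit.vb=100)
fit.ctrl <- gren(x, y, m, partitions=partitions, control=ctrl)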
Münch, M.M., Peeters, C.F.W., van der Vaart, A.W., and van de Wiel, M.A. (2018). Adaptive group-regularized logistic elastic net regression. arXiv:1805.00389v1 [stat.ME].
## Not run:
## Create data
p <- 1000
n <- 100
set.seed(2018)
x <- matrix(rnorm(n*p), ncol=p, nrow=n)
beta <- c(rnorm(p/2, 0, 0.1), rnorm(p/2, 0, 1))
m <- rep(1, n)
y <- rbinom(n, m, as.numeric(1/(1 + exp(-x %*% as.matrix(beta)))))
partitions <- list(groups=rep(c(1, 2), each=p/2))
## estimate model
fit.gren <- gren(x, y, m, partitions=partitions)
## End(Not run)
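As a hedged follow-up to the example above (not run here), a fixed number of features could be selected via psel, after which the multipliers and the frequentist refit are available as output components; component names are as in the Value section.

## select 10 features in the final frequentist fit and inspect the output
fit.sel <- gren(x, y, m, partitions=partitions, psel=10)
fit.sel$lambdag      # estimated penalty multipliers per group
fit.sel$freq.model   # glmnet object fitted with the estimated multipliers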