Function that estimates a group-regularized elastic net model.
gren(x, y, m=rep(1, nrow(x)), unpenalized=NULL, partitions=NULL, alpha=0.5,
lambda=NULL, intercept=TRUE, monotone=NULL, psel=TRUE, compare=TRUE,
posterior=FALSE, nfolds=nrow(x), foldid=NULL, trace=TRUE,
init=list(lambdag=NULL, mu=NULL, sigma=NULL, chi=NULL, ci=NULL),
control=list(epsilon=0.001, maxit=500, maxit.opt=1000, maxit.vb=100))
x
feature data as either a numeric matrix or a data.frame of numeric variables.
y
response as either a numeric vector of binomial/binary successes of length nrow(x), or a matrix with nrow(x) rows and two columns, where the first column contains the binomial/binary failures and the second column the binomial/binary successes.
m
numeric of length nrow(x) that contains the number of Bernoulli trials.
unpenalized
optional numeric matrix or data.frame of numeric unpenalized covariates with nrow(x) rows.
partitions
list that contains the (possibly multiple) partitions of the data. Every list element corresponds to one partition, where every partition is a numeric vector of length ncol(x) containing the group ids of the features (input formats are sketched after this argument list).
alpha
proportion of L1 penalty as a numeric of length 1.
lambda
global penalty parameter. The default NULL results in estimation by cross-validation.
intercept
logical that indicates whether an intercept should be included.
monotone
list of two logical vectors of length length(partitions). The first vector, monotone, indicates whether the corresponding partition's penalty parameters should be estimated monotonically; the second vector, decreasing, indicates whether the monotone penalty parameters are decreasing with group number.
psel
either a numeric vector that indicates the number of features to select or a logical. If TRUE, feature selection is done by letting glmnet determine the penalty parameter sequence.
compare
logical; if TRUE, a regular non-group-regularized model is estimated.
posterior
logical; if TRUE, the full variational Bayes posterior is returned.
nfolds
numeric of length 1 with the number of folds used in the cross-validation of the global lambda. The default is nrow(x), i.e., leave-one-out cross-validation.
foldid
optional numeric vector of length nrow(x) with the fold assignments of the observations.
trace
logical; if TRUE, progress of the algorithm is printed.
init
optional list containing the starting values of the iterative algorithm. See Details for more information.
control
a list of algorithm control parameters. See Details for more information.
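For concreteness, a minimal sketch of how the main inputs could be assembled is given below; the object names, group sizes, and partition labels are purely illustrative assumptions, not requirements of the package.

## illustrative input shapes (all names and sizes are hypothetical)
n <- 50; p <- 200
x <- matrix(rnorm(n*p), nrow=n, ncol=p)      # numeric feature matrix

## response as successes out of m Bernoulli trials per observation
m <- sample(1:4, n, replace=TRUE)
y <- rbinom(n, m, 0.5)
## equivalent two-column form: failures in column 1, successes in column 2
y.mat <- cbind(m - y, y)

## two partitions of the features: one with 2 groups, one with 4 groups
partitions <- list(size=rep(1:2, each=p/2), source=rep(1:4, each=p/4))

## one logical per partition; here only the first partition is estimated
## monotonically, with multipliers decreasing in group number
monotone <- list(monotone=c(TRUE, FALSE), decreasing=c(TRUE, FALSE))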
Function returns an S3 list object of class gren containing output with the following components (a brief inspection sketch follows this list):
call
The function call that produced the output.
alpha
proportion of L1 penalty as a numeric of length 1.
lambda
global penalty parameter as a numeric. Estimated by cross-validation if lambda=NULL.
lambdag.seq
list with the full sequence of penalty multipliers over the iterations.
lambdag
list with the final estimates of the penalty multipliers.
vb.post
list with the variational posterior parameters \(\mu_j\), \(\sigma_{ij}\), \(c_i\), and \(\chi_j\).
freq.model
frequentist elastic net model as the output of a glmnet call. NULL if psel=FALSE.
iter
list with the number of iterations of the lambdag estimation, the number of optimisation iterations of lambdag, and the number of variational Bayes iterations.
conv
list of logicals with the convergence of the lambdag sequence, the optimisation steps, and the variational Bayes iterations.
args
list with the input arguments of the gren call.
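As a rough illustration, the components above might be inspected as follows, assuming a fitted object fit as in the Examples below; which components are filled can depend on the psel and posterior arguments.

## hypothetical inspection of a fitted object `fit` (see Examples)
fit$lambda       # global penalty parameter (cross-validated if lambda=NULL)
fit$lambdag      # final group-wise penalty multipliers, one entry per partition
fit$vb.post      # variational posterior parameters
fit$freq.model   # glmnet fit with the estimated multipliers (NULL if psel=FALSE)
fit$conv         # convergence indicators of the three iteration levels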
This is the main function of the package that estimates a group-regularized elastic net regression. The elastic net penalty's proportion of L1-norm penalisation is determined by alpha. An alpha close to 0 implies a more ridge-like penalty, while an alpha close to 1 implies a more lasso-like penalty. The algorithm is a two-step procedure: first, a global lambda penalty is estimated by cross-validation. Next, the groupwise lambda multipliers are estimated by an EM algorithm. The EM algorithm consists of: (i) an expectation step, in which the expected marginal likelihood of the penalty multipliers is iteratively approximated by a variational Bayes EM algorithm, and (ii) a maximisation step, in which the approximate expected marginal likelihood is maximised with respect to the penalty multipliers. After convergence of the algorithm, an (optional) frequentist elastic net model is fit using the estimated penalty multipliers, by setting psel=TRUE or by setting psel to a numeric vector.
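As a hedged sketch of how these options interact, two call variants are shown below; the specific alpha values and selection sizes are arbitrary examples, not recommendations.

## more ridge-like penalty, no frequentist refit afterwards
fit.ridge <- gren(x, y, m, partitions=partitions, alpha=0.05, psel=FALSE)

## more lasso-like penalty, refit with glmnet selecting roughly 10 and 25 features
fit.lasso <- gren(x, y, m, partitions=partitions, alpha=0.95, psel=c(10, 25))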
The user may speed up the procedure by specifying initial values for the EM algorithm in init. init is a list that contains the following elements (a warm-start sketch follows this list):
lambdag
initial values for the \(\lambda_g\) in a list of length length(partitions).
mu
initial values for the \(\mu_j\) in a numeric vector of length ncol(x) + ncol(unpenalized) + intercept.
chi
initial values for the \(\chi_j\) in a numeric vector of length ncol(x).
ci
initial values for the \(c_i\) in a numeric vector of length nrow(x).
sigma
initial values for the \(\Sigma_{ij}\) in a numeric matrix with ncol(x) rows and columns.
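One possible use is a warm start from an earlier fit on comparable data. The sketch below assumes that the variational posterior of the previous fit is stored in fit.old$vb.post with elements named mu, sigma, chi, and ci; these element names and the reuse itself are assumptions for illustration only.

## hypothetical warm start from a previous fit `fit.old` (element names assumed)
init <- list(lambdag=fit.old$lambdag,
             mu=fit.old$vb.post$mu,
             sigma=fit.old$vb.post$sigma,
             chi=fit.old$vb.post$chi,
             ci=fit.old$vb.post$ci)
fit.warm <- gren(x, y, m, partitions=partitions, init=init)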
control is a list with parameters to control the estimation procedure. It consists of the following components (a usage sketch follows this list):
epsilon
numeric with the relative convergence tolerance. Default is epsilon=0.001.
maxit
numeric whole number giving the maximum number of iterations to update the lambdag. Default is maxit=500.
maxit.opt
numeric whole number giving the maximum number of iterations to numerically maximise the lambdag. Maximisation occurs at every iteration. Default is maxit.opt=1000.
maxit.vb
numeric whole number giving the maximum number of iterations to update the variational parameters mu, sigma, chi, and ci. One full update sequence is done per iteration. Default is maxit.vb=100.
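For instance, a stricter tolerance with a smaller iteration cap could be passed as follows; the particular values are arbitrary and only illustrate the list format.

## illustrative control settings (values are arbitrary)
ctrl <- list(epsilon=1e-04, maxit=200, maxit.opt=1000, maxit.vb=100)
fit.ctrl <- gren(x, y, m, partitions=partitions, control=ctrl)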
Münch, M.M., Peeters, C.F.W., van der Vaart, A.W., and van de Wiel, M.A. (2018). Adaptive group-regularized logistic elastic net regression. arXiv:1805.00389v1 [stat.ME].
## Not run:
## Create data
p <- 1000
n <- 100
set.seed(2018)
x <- matrix(rnorm(n*p), ncol=p, nrow=n)
beta <- c(rnorm(p/2, 0, 0.1), rnorm(p/2, 0, 1))
m <- rep(1, n)
y <- rbinom(n, m, as.numeric(1/(1 + exp(-x %*% as.matrix(beta)))))
partitions <- list(groups=rep(c(1, 2), each=p/2))
## estimate model
fit.gren <- gren(x, y, m, partitions=partitions)
## End(Not run)
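As a hedged follow-up to the example above (not run here), a fixed number of features could be selected via psel, after which the multipliers and the frequentist refit are available as output components; component names are as in the Value section.

## select 10 features in the final frequentist fit and inspect the output
fit.sel <- gren(x, y, m, partitions=partitions, psel=10)
fit.sel$lambdag      # estimated penalty multipliers per group
fit.sel$freq.model   # glmnet object fitted with the estimated multipliers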