lmmlasso: Function to fit high-dimensional (gaussian) linear mixed-effects models

Description

Fits the solution for a high-dimensional (gaussian) linear mixed-effects models

Usage

lmmlasso(x, ...)
"lmmlasso"(x, y, z = x, grp, weights = NULL,coefInit=NULL, lambda, startValue = 1, nonpen = 1:dim(z)[[2]], pdMat = c("pdIdent", "pdDiag","pdSym"), method = "ML", CovOpt = c("nlminb","optimize"), stopSat = TRUE, standardize = TRUE, control = lmmlassoControl(),ranInd=1:dim(z)[[2]],...)

Arguments

matrix of dimension ntot x p including the fixed-effects covariables. An intercept has to be included in the first column as (1,...,1).

response variable of length ntot.

random effects matrix of dimension ntot x q. It has to be a matrix, even if q=1.

grp

grouping variable of length ntot

weights

weights for the fixed-effects covariates: NA means no penalization, 0 means drop this covariate ; if given, the argument nonpen is ignored. By default each covariate has weight 1

coefInit

list with three components used as starting values for the fixed effects, the random effects variance components and the error standard deviation.

lambda

positive regularization parameter

startValue

Choice of the starting values for the fixed effects using linear regression. 1 means 10-fold cross-validation with L1-penalty, 2 means 10-fold cross-validation Ridge Regression and 0 means that all the covariates are set to zero and the intercept is the mean of the response variable

nonpen

index of fixed effects with no penalization, ignored if the argument weights is specified, default is 1, which means that only the intercept (the first column in X )is not penalized.

pdMat

Covariance structure for the random effects. pdIdent, pdDiag and pdSym are already implemented. Default to pdIdent. pdIdent: $b_i \sim \mathcal{N}(0,\theta^2 I)$ (1 parameter), pdDiag: $b_i \sim \mathcal{N}(0,diag(\theta_1,\ldots,\theta_q))$ (q parameters), pdSym: $b_i \sim \mathcal{N}(0,\Psi)$ where $\Psi$ is symmetric positive definit ($q(q+1)/2$ parameters)

method

Only "ML" is allowed. "REML" is not yet implemented.

CovOpt

which optimization routine should be used for updating the variance components parameters. optimize or nlminb. nlminb uses the estimate of the last iteration as a starting value. nlminb is faster if there are many Gauss-Seidel iterations.

stopSat

logical. Should the algorithm stop when ntot > p?

standardize

Should the x matrix be standardized such that each column has mean 0 and standard deviation 1? Be careful if the x matrix includes dummy variables.

control

control parameters for the algorithm and the Armijo Rule, see lmmlassoControl for the details

ranInd

Index of the random effects with respect to the x matrix

...

not used.

Value

coefficients: estimated fixed-effects coefficients $\hat{\beta}$
pars: free parameters in the covariance matrix $\Psi$ of the random effects
sigma: standard deviation $\hat{\sigma}$ of the errors
random: vector with random effects, sorted by groups
u: vector with the standardized random effects, sorted by effect
ranef: vector with random effects, sorted by effect
fixef: estimated fixed-effects coeffidients $\hat{\beta}$
fitted.values: The fitted values $\hat{y} = \hat{X} \beta + Z \hat{b}_i$
residuals: raw residuals $y-\hat{y}$
Psi: Covariance matrix $\Psi$ of the random effects
corPsi: Correlation matrix of the random effects
logLik: value of the log-likelihood function
deviance: deviance=-2*logLik
npar: Number of parameters. Corresponds to the cardinality of the active set of coefficients plus the number of free parameters in Psi
aic: AIC
bic: BIC
data: data set, as a list with four components: x, y, z, grp (see above)
weights: weights for the fixed-effects covariates
nonpen: nonpenalized covariates. Differ from the input if weights is explicitely given
coefInit: list with three components used as starting values
lambda: positive regularization parameter
converged: Does the algorithm converge? 0: correct convergence ; an odd number means that maxIter was reached ; an even number means that the Armijo step was not succesful. For each unsuccessfull Armijo step, 2 is added to converged. If converged is large compared to the number of iterations counter, you may increase maxArmijo.
counter: The number of iterations used.
stopped: logical. Does the algorithm stopped due to ntot > p?
pdMat: Covariance structure for the random effects
method: "ML"
CovOpt: optimization routine
control: see lmmlassoControl
call: call
stopped: logical. Does the algorithm stopped due to a too large active set?
ranInd: Index of the random effects with respect to the x matrix
objective: Value of the objective function at the final estimates

Details

All the details of the algorithm can be found in Schelldorfer et. al. (2010).

References

J. Schelldorfer, P. B\"uhlmann and S. van de Geer (2011), Estimation for High-Dimensional Linear Mixed-Effects Models Using $\ell_1$-penalization, arXiv preprint 1002.3784v2

Examples

Run this code

# (1) Use the lmmlasso on the classroomStudy data set
data(classroomStudy)
fit1 <-
lmmlasso(x=classroomStudy$X,y=classroomStudy$y,z=classroomStudy$Z,
grp=classroomStudy$grp,lambda=15,pdMat="pdIdent")
summary(fit1)
plot(fit1)

# (2) Use the lmmlasso on a small simulated data set
set.seed(54)

N <- 20           # number of groups
p <- 6            # number of covariates (including intercept)
q <- 2            # number of random effect covariates
ni <- rep(6,N)    # observations per group
ntot <- sum(ni)   # total number of observations

grp <- factor(rep(1:N,each=ni)) # grouping variable

beta <- c(1,2,4,3,0,0,0) # fixed-effects coefficients
x <- cbind(1,matrix(rnorm(ntot*p),nrow=ntot)) # design matrix

bi1 <- rep(rnorm(N,0,3),each=ni) # Psi=diag(3,2)
bi2 <- rep(rnorm(N,0,2),each=ni)
bi <- rbind(bi1,bi2)
   
z <- x[,1:2,drop=FALSE]

y <- numeric(ntot)
for (k in 1:ntot) y[k] <- x[k,]%*%beta + t(z[k,])%*%bi[,grp[k]] + rnorm(1)

# correct random effects structure
fit2 <- lmmlasso(x=x,y=y,z=z,grp=grp,lambda=10,pdMat="pdDiag")
summary(fit2)
plot(fit2)

# wrong random effects structure
fit3 <- lmmlasso(x=x,y=y,z=x[,1:3],grp=grp,lambda=10,pdMat="pdDiag")
summary(fit3)
plot(fit3)

Run the code above in your browser using DataLab