springer: fit the model with given tuning parameters

Description

This function performs penalized variable selection for longitudinal data based on generalized estimating equations (GEE) or quadratic inference functions (QIF) with given values of the tuning parameters lam1 and lam2. Typical usage is to first obtain the optimal tuning parameters using cross-validation, then provide them to the springer function.

Usage

springer(
  clin = NULL,
  e,
  g,
  y,
  beta0,
  func,
  corr,
  structure,
  lam1,
  lam2,
  maxits = 30,
  tol = 0.001
)

Arguments

clin

a matrix of clinical covariates. The default value is NULL. Whether to include clinical covariates is up to the user.

e

a matrix of environment factors.

g

a matrix of genetic factors.

y

the longitudinal response.

beta0

the initial coefficient vector

func

the framework to obtain the score equation. Two choices are available: "GEE" and "QIF".

corr

the working correlation structure adopted in the estimation algorithm. springer provides three choices for the working correlation structure: "exchangeable", "AR-1", and "independence".

structure

the type of structured variable selection. Three choices are available: "bilevel" for sparse-group selection on both the group level and the individual level, "group" for selection on the group level only, and "individual" for selection on the individual level only.

lam1

the tuning parameter $\lambda_1$ for individual-level penalty applied to genetic factors.

lam2

the tuning parameter $\lambda_2$ for group-level penalty applied to gene-environment interactions.

maxits

the maximum number of iterations used in the estimation algorithm. The default value is 30.

tol

the tolerance level. Coefficients with absolute values smaller than the tolerance level will be set to zero. The default value is 0.001.

Value

coef

the coefficient vector.

Details

Recall the data model described in "dat": $$Y_{ij}= \alpha_0 + \sum_{m=1}^{t}\theta_m Clin_{ijm} + \sum_{u=1}^{q}\alpha_u E_{iju} + \sum_{v=1}^{p}\eta_v^\top Z_{ijv}+\epsilon_{ij},$$ where $Z_{ijv}$ contains the $v$th genetic main factor and its interactions with the $q$ environment factors for the $j$th measurement on the $i$th subject, and $\eta_{v}$ is the corresponding coefficient vector of length $1+q$.
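As a rough illustration of how $Z_{ijv}$ is put together, the base-R sketch below builds the $1+q$ columns for each genetic factor from toy `e` and `g` matrices. The helper name `build_Z`, the toy dimensions, and the assumption that rows are stacked measurements are all introduced here for illustration; the column ordering springer uses internally may differ.

```r
# Toy data (assumed for illustration): n stacked measurements,
# q environment factors, p genetic factors.
set.seed(1)
n <- 6; q <- 2; p <- 3
e <- matrix(rnorm(n * q), n, q)
g <- matrix(rnorm(n * p), n, p)

# Hypothetical helper: Z for the v-th genetic factor, i.e. its main effect
# followed by its interactions with each of the q environment factors.
build_Z <- function(g, e, v) cbind(g[, v], g[, v] * e)

Z_list <- lapply(seq_len(p), function(v) build_Z(g, e, v))
Z <- do.call(cbind, Z_list)  # n x p(1 + q) design block for all eta_v
dim(Z)
```

Each element of `Z_list` pairs with one coefficient vector $\eta_v$ of length $1+q$.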

When structure="bilevel", variable selection for genetic main effects and gene-environment interactions under the longitudinal response will be conducted on both individual and group levels (bi-level selection):

Group-level selection: by determining whether $||\eta_{v}||_{2}=0$, we can tell whether the $v$th genetic variant has any effect at all.
Individual-level selection: investigate whether the $v$th genetic variant has a main effect, a G$\times$E interaction, or both, by determining which components of $\eta_{v}$ have non-zero values.

If structure="group", only group-level selection will be conducted on $||\eta_{v}||_{2}$; if structure="individual", only individual-level selection will be conducted on each $\eta_{vu}$, ($u=1,\ldots,q$).
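The two levels of selection can be read off a fitted coefficient vector as in the sketch below. It assumes the genetic part of the coefficients is stored variant by variant, with the main effect first and the $q$ interactions after it; that layout and the helper `summarise_selection` are illustrative assumptions, not part of the package.

```r
# Hypothetical helper: summarise group- and individual-level selection
# from the genetic part of a coefficient vector of length p * (1 + q).
summarise_selection <- function(eta, p, q) {
  stopifnot(length(eta) == p * (1 + q))
  eta_mat <- matrix(eta, nrow = 1 + q, ncol = p)  # column v holds eta_v
  group_norm <- sqrt(colSums(eta_mat^2))          # ||eta_v||_2
  data.frame(
    variant        = seq_len(p),
    selected       = group_norm != 0,             # group level
    main           = eta_mat[1, ] != 0,           # individual level: main effect
    n_interactions = colSums(eta_mat[-1, , drop = FALSE] != 0)
  )
}

eta <- c(0.8, 0,   0.5,  # variant 1: main effect + interaction with E2
         0,   0,   0,    # variant 2: not selected
         0,   0.3, 0)    # variant 3: interaction with E1 only
res <- summarise_selection(eta, p = 3, q = 2)
res
```

A variant with `selected = FALSE` is dropped at the group level; for a selected variant, `main` and `n_interactions` show which components of $\eta_v$ survive at the individual level.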

This function also provides choices for the framework that is used. If func="QIF", variable selection will be conducted within the quadratic inference functions framework; if func="GEE", variable selection will be conducted within the generalized estimating equation framework.

There are three options for the choice of the working correlation. If corr="exchangeable", the exchangeable working correlation will be applied; if corr="AR-1", the AR-1 working correlation will be adopted; if corr="independence", the independence working correlation will be used. Please check the references for more details.
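For reference, the three working correlation structures have the following textbook forms for $k$ repeated measurements. The helper `working_corr` and the fixed value of `rho` are assumptions made for illustration; how springer estimates or uses the correlation parameter internally is not shown here.

```r
# Hypothetical helper: k x k working correlation matrix for k repeated
# measurements with correlation parameter rho.
working_corr <- function(k, rho,
                         corr = c("exchangeable", "AR-1", "independence")) {
  corr <- match.arg(corr)
  switch(corr,
    "exchangeable" = {
      R <- matrix(rho, k, k)  # same correlation between any two time points
      diag(R) <- 1
      R
    },
    "AR-1" = rho^abs(outer(seq_len(k), seq_len(k), "-")),  # rho^|j - j'|
    "independence" = diag(k)  # no within-subject correlation
  )
}

R_ex  <- working_corr(3, 0.5, "exchangeable")
R_ar  <- working_corr(3, 0.5, "AR-1")
R_ind <- working_corr(3, 0.5, "independence")
R_ar
```

Under AR-1 the correlation decays with the time lag, whereas the exchangeable structure keeps it constant and independence sets it to zero.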

Examples


# NOT RUN {
data("dat")
##load the clinical covariates, environment factors, genetic factors and response from the
##"dat" file
clin=dat$clin
if(is.null(clin)){t=0} else{t=dim(clin)[2]}
e=dat$e
u=dim(e)[2]
g=dat$g
y=dat$y
##initial coefficient
beta0=dat$coef
##true nonzero coefficients
index=dat$index
beta = springer(clin=clin, e, g, y, beta0, func="GEE", corr="independence",
                structure="bilevel", lam1=dat$lam1, lam2=dat$lam2, maxits=30, tol=0.01)
##only focus on the genetic main effects and gene-environment interactions
beta[1:(1+t+u)]=0
##effects that have nonzero coefficients
pos = which(beta != 0)
##true positive and false positive
tp = length(intersect(index, pos))
fp = length(pos) - tp
list(tp=tp, fp=fp)

# }
