dat: simulated data for demonstrating the usage of springer

Description

Simulated gene expression data for demonstrating the usage of springer.

Usage

data("dat")

Format

The dat object consists of five components: e, g, y, clin and coeff. The component coeff holds the true values of the parameters used for generating Y.
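A minimal sketch of loading the data and listing its components. It assumes the springer package is installed and that data("dat") creates a list named dat whose elements carry the five names given above; both points are assumptions, not confirmed by this page.

```r
# Assumption: springer is installed and data("dat") creates a list called `dat`.
library(springer)
data("dat")

# List the component names; e, g, y, clin and coeff are expected.
names(dat)

# Compact overview of the type and size of each component.
str(dat, max.level = 1)
```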

Details

The data model for generating Y

Consider a longitudinal study with $n$ subjects and $k_i$ measurements over time for the $i$th subject ($i=1,\ldots,n$). Let $Y_{ij}$ be the response of the $j$th observation for the $i$th subject ($i=1,\ldots,n$, $j=1,\ldots,k_i$), $X_{ij}=(X_{ij1},\ldots,X_{ijp})^\top$ be a $p$-dimensional vector of covariates denoting $p$ genetic factors, $E_{ij}=(E_{ij1},\ldots,E_{ijq})^\top$ be a $q$-dimensional vector of environmental factors and $Clin_{ij}=(Clin_{ij1},\ldots,Clin_{ijt})^\top$ be a $t$-dimensional vector of clinical factors. Measurements on the same subject are dependent over time, but measurements on different subjects are assumed to be independent. The model used for hierarchical variable selection for gene--environment interactions is given as:

$$Y_{ij}= \alpha_0 + \sum_{m=1}^{t}\theta_m Clin_{ijm} + \sum_{u=1}^{q}\alpha_u E_{iju} + \sum_{v=1}^{p}(\gamma_v X_{ijv} + \sum_{u=1}^{q}h_{uv} E_{iju} X_{ijv})+\epsilon_{ij},$$

where $\alpha_{0}$ is the intercept and the marginal density of $Y_{ij}$ belongs to a canonical exponential family defined in Liang and Zeger (1986).

Define $\eta_v=(\gamma_v, h_{1v}, \ldots, h_{qv})^\top$, a coefficient vector of length $q+1$, and $Z_{ijv}=(X_{ijv}, E_{ij1}X_{ijv}, \ldots, E_{ijq}X_{ijv})^\top$, which contains the main effect of the $v$th genetic factor for the $j$th measurement on the $i$th subject and its interactions with all $q$ environmental factors. The model can then be written as:

$$Y_{ij}= \alpha_0 + \sum_{m=1}^{t}\theta_m Clin_{ijm} + \sum_{u=1}^{q}\alpha_u E_{iju} + \sum_{v=1}^{p}\eta_v^\top Z_{ijv}+\epsilon_{ij}.$$

The random error $\epsilon_{i}=(\epsilon_{i1},\ldots,\epsilon_{ik_i})^\top$ is assumed to follow a multivariate normal distribution with covariance matrix $\Sigma_i$ for the repeated measurements of the $i$th subject across the $k_i$ time points.
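The following base R sketch simulates a response from the model above and checks that the two ways of writing it agree. It is not the code that produced dat: the dimensions, the coefficient values, the balanced design ($k_i = k$), the zero-mean errors and the AR(1) form of $\Sigma_i$ are all illustrative assumptions.

```r
# Illustrative simulation of the G x E model; every number here is an assumption.
set.seed(1)
n <- 50; k <- 4                 # subjects and measurements per subject (k_i = k)
p <- 6; q <- 2; n_clin <- 2     # genetic, environmental and clinical factors
N <- n * k                      # rows are ordered by subject, then by time

X    <- matrix(rnorm(N * p), N, p)            # genetic factors
E    <- matrix(rnorm(N * q), N, q)            # environmental factors
Clin <- matrix(rnorm(N * n_clin), N, n_clin)  # clinical factors

alpha0 <- 1                          # intercept
theta  <- c(0.5, -0.5)               # clinical effects
alpha  <- c(0.8, 0.3)                # environmental main effects
gamma  <- c(1, 0.6, 0, 0, 0, 0)      # genetic main effects
h <- matrix(0, q, p)                 # h[u, v]: interaction of E_u with X_v
h[1, 1] <- 0.7
h[2, 2] <- -0.4

# Within-subject errors: zero-mean normal with an assumed AR(1) covariance.
rho   <- 0.5
Sigma <- rho^abs(outer(1:k, 1:k, "-"))
R     <- chol(Sigma)                          # t(R) %*% R equals Sigma
eps   <- as.vector(t(matrix(rnorm(n * k), n, k) %*% R))

# Parts shared by both forms of the model.
base <- alpha0 + drop(Clin %*% theta) + drop(E %*% alpha)

# Form 1: main genetic effects plus explicit double sum over interactions.
inter <- numeric(N)
for (v in 1:p) {
  for (u in 1:q) {
    inter <- inter + h[u, v] * E[, u] * X[, v]
  }
}
y1 <- base + drop(X %*% gamma) + inter + eps

# Form 2: grouped form, sum over v of eta_v' Z_v.
eta <- rbind(gamma, h)               # column v is eta_v, of length q + 1
grouped <- numeric(N)
for (v in 1:p) {
  Z_v <- cbind(X[, v], E * X[, v])   # N x (q + 1): main effect and interactions
  grouped <- grouped + drop(Z_v %*% eta[, v])
}
y2 <- base + grouped + eps

# The two forms give the same response up to floating-point error.
stopifnot(isTRUE(all.equal(y1, y2)))
```

The grouped form makes the hierarchical structure explicit: each column of eta collects the main effect of one genetic factor together with its $q$ interaction coefficients.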
