test.trioGxE: Test of gene-environment interaction between a SNP and a continuous non-genetic covariate from case-parent trio data.

Description

Performs a permutation test of gene-environment interaction based on the penalized maximum likelihood estimates obtained by fitting a generalized additive model to case-parent trio data.

Usage

test.trioGxE(object, data = NULL, nreps, level = 0.05, early.stop = FALSE, 
             fix.sp = FALSE, output = NULL, return.data = FALSE, 
             return.object = FALSE, ...)

Arguments

object

An object returned by the trioGxE function. When NULL, a data set of case-parent trios must be provided (through the 'data' argument).

data

Trio data to be passed into trioGxE when 'object' is not provided.

nreps

Desired number of permutation replicates.

fix.sp

When TRUE, the null distribution of the test statistic is approximated by fitting each simulated data set under fixed values of the smoothing parameters. When FALSE (Default), the null

level

Desired significance level for the test.

early.stop

When TRUE, sampling is terminated early when the number of test statistics that are more extreme than or as extreme as the observed test statistic equals nreps*level. $\code{nreps} \times \code{level}$ values larger than the

output

A character string specifying the name of the output file to which the values of the test statistics calculated for the actual and simulated data sets are written. When NULL (Default), no output file is produced.

return.data

When TRUE, the original data set is returned.

return.object

When TRUE, the fitting object for the original data set is returned.

...

Arguments passed to trioGxE: when data is provided, instead of trioGxE class object, parameters of trioGxE must be provided throu

Value

GxE.test

Either a 3- or 1-column matrix. When the actual data were fitted under a co-dominant penetrance mode (i.e., object$penmod = "codominant"), a 3-column matrix is returned, where the first column holds the values of $T$ for the original and the generated data sets, and the second and third columns hold the values of $T_1$ and $T_2$, respectively, for the same data sets. When the actual data were fitted under a non-co-dominant penetrance mode (e.g., dominant), GxE.test is returned as a matrix with a single column holding $T$.

p.value

If object$penmod = "codominant", a vector holding three values is returned, where the first value is the overall p-value obtained from the distribution of $T$, and the other two values are the individual p-values obtained from the distributions of $T_1$ and $T_2$. If object$penmod is dominant, additive or recessive, a single p-value calculated from $T$ is returned.

Details

Suppose $k_1$ and $k_2$ are the numbers of knots used to represent the interaction functions $f_1$ and $f_2$, respectively, via cubic regression splines. Let $\bm{c}_1 = (c_{11},...,c_{1,k_1-1})^\prime$ and $\bm{c}_2 = (c_{21},...,c_{2,k_2-1})^\prime$ be the spline coefficient vectors for $f_1$ and $f_2$ that satisfy the model identifiability constraints.

The function test.trioGxE calculates the test statistic $T$, $$T = \hat{\bm{c}}^{\prime}{\rm V}^{-1}(\bm{c})\hat{\bm{c}},$$ where $\bm{c}=(\bm{c}_1^{\prime},\bm{c}_2^{\prime})^{\prime}$ and ${\rm V}(\bm{c})$ is a square matrix of size $(k_1+k_2-2)$, formed by extracting the rows and columns corresponding to the spline coefficients from the Bayesian posterior variance-covariance matrix calculated in trioGxE.

If the actual data were fitted under the co-dominant penetrance mode (i.e., object$penmod="codominant"), the test statistic $T$ represents an overall test of GxE, where $${\rm H}_0: \bm{c}=\bm{0}.$$ Depending on the context, an investigator may also want to perform individual tests: ${\rm H}_{01}: \bm{c}_1 = \bm{0}$ and ${\rm H}_{02}: \bm{c}_2 = \bm{0}$. For example, when the overall null hypothesis is rejected, the user may want to know which of the two interaction functions is not zero (i.e., which curve is not flat). For the individual tests, test.trioGxE calculates the permutation p-values based on the Monte-Carlo distributions of the individual test statistics $T_1$ and $T_2$, where $$T_h = \hat{\bm{c}}_h^{\prime}{\rm V}^{-1}(\bm{c}_h)\hat{\bm{c}}_h, \quad h=1,2.$$
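The quadratic forms above can be sketched in a few lines of base R. This is only an illustration with made-up numbers: the coefficient vector, the covariance matrix, the helper `quad_form` and the choice of taking ${\rm V}(\bm{c}_h)$ as the matching sub-block of the full matrix are all assumptions, not values or code extracted from trioGxE.

```r
# Toy values only: in practice the estimates and covariance matrix
# would come from a fitted trioGxE object (how to extract them is not shown here).
k1 <- 3; k2 <- 3                       # hypothetical numbers of knots
c_hat <- c(0.40, -0.25, 0.10, 0.05)    # (c_1', c_2')', length k1 + k2 - 2
V <- diag(c(0.04, 0.05, 0.03, 0.06))   # toy posterior covariance of c
V[1, 2] <- V[2, 1] <- 0.01

# Quadratic form c' V^{-1} c, using solve() instead of an explicit inverse
quad_form <- function(cf, vc) drop(crossprod(cf, solve(vc, cf)))

idx1 <- seq_len(k1 - 1)                # positions of c_1 within c
idx2 <- (k1 - 1) + seq_len(k2 - 1)     # positions of c_2 within c

T_all <- quad_form(c_hat, V)                                  # overall statistic
T_1 <- quad_form(c_hat[idx1], V[idx1, idx1, drop = FALSE])    # individual statistic, h = 1
T_2 <- quad_form(c_hat[idx2], V[idx2, idx2, drop = FALSE])    # individual statistic, h = 2
```

The overall statistic is stored as `T_all` rather than `T`, because `T` is an alias for `TRUE` in R. In this toy matrix the two coefficient blocks are uncorrelated, so `T_all` equals `T_1 + T_2`; with correlated blocks that equality would not hold in general.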

Under the dominant, log-additive (multiplicative) or recessive penetrance model, $T$ can be viewed as an individual test since $\bm{c}_2=\bm{0}$, $\bm{c}_1=\bm{c}_2$ and $\bm{c}_1=\bm{0}$, respectively, under the dominant, log-additive and recessive models. For example, under the dominant penetrance model, $T\equiv{T_1}$ because $\bm{c}_2=\bm{0}$, and $T_2=0$.

As the analysis is conditional on parental genotypes, the distribution of the test statistic under ${\rm H}_0$ is obtained by shuffling the column that holds the values of the non-genetic covariate within mating types. This is justified because, under no interaction, the SNP and the non-genetic covariate are independent of each other within a random affected trio when they are independent within a trio from the general population (Umbach and Weinberg, 2000).
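The shuffling step can be sketched as follows in base R. The data frame, the mating-type column `mt` and the helper `permute_within` are hypothetical and serve only to illustrate a within-group permutation; they are not the internal code of test.trioGxE.

```r
set.seed(1)

# Hypothetical toy data: one row per trio; 'mt' labels the parental mating type
trios <- data.frame(
  mt    = rep(c("A", "B", "C"), times = c(4, 3, 1)),
  child = c(0, 1, 1, 2, 1, 2, 2, 1),
  attr  = c(0.3, -1.2, 0.8, 1.5, -0.4, 2.1, 0.0, 0.7)
)

# Permute x within the groups defined by g.
# Indexing with sample.int() avoids the sample(x) pitfall for groups of size one.
permute_within <- function(x, g) {
  ave(x, g, FUN = function(v) v[sample.int(length(v))])
}

# One permutation replicate: only the covariate column is shuffled
perm <- trios
perm$attr <- permute_within(trios$attr, trios$mt)
```

Genotypes and mating-type labels stay attached to their trios, so each replicate keeps the conditioning on parental genotypes while breaking any association between the child's genotype and the covariate.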

The distribution of the test statistics can be obtained in two ways: either under fixed smoothing parameters (fix.sp=TRUE) or under varying smoothing parameters (fix.sp=FALSE). Under fixed smoothing parameters, the penalized iteratively re-weighted least squares procedure is performed for each simulated data set under the same smoothing parameter values. Under varying smoothing parameters, the smoothing parameters are re-estimated for each simulated data set. Therefore, the test under fix.sp=FALSE accounts for the extra uncertainty introduced by the smoothing parameter estimation.

To save computation time, the user can use the 'early-termination' option (Besag and Clifford, 1991) by setting early.stop=TRUE. Under this option, sampling is terminated as soon as the number of simulated data sets whose test statistic is more extreme than or as extreme as the observed test statistic reaches nreps*level. This happens before nreps data sets have been generated when the evidence is not strong enough to reject the null hypothesis at the given significance level (level). For example, if the user specifies nreps=1000 and level=0.05, the test terminates when the number of data sets that have test statistic values that are more extreme than or as extreme as the observed test statistic value reaches 50.
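A minimal sketch of such a stopping rule is given below. The function `sequential_pvalue`, its arguments and the p-value formulas (the sequential Monte Carlo p-value as commonly attributed to Besag and Clifford) are assumptions made for illustration; this page does not state how test.trioGxE computes the p-value after stopping early.

```r
# stat_obs:  observed test statistic
# draw_null: function returning one test statistic simulated under H0 (hypothetical)
sequential_pvalue <- function(stat_obs, draw_null, nreps, level = 0.05) {
  h <- ceiling(nreps * level)  # number of exceedances that triggers early stopping
  g <- 0                       # exceedances seen so far
  for (l in seq_len(nreps)) {
    if (draw_null() >= stat_obs) g <- g + 1
    if (g >= h) {
      # Threshold reached after l draws: report h / l
      return(list(p.value = g / l, n.sim = l, stopped.early = l < nreps))
    }
  }
  # All nreps draws used without reaching h exceedances
  list(p.value = (g + 1) / (nreps + 1), n.sim = nreps, stopped.early = FALSE)
}

set.seed(42)
null_draw <- function() rchisq(1, df = 4)  # stand-in null distribution, illustrative only
weak   <- sequential_pvalue(1,  null_draw, nreps = 1000)  # weak evidence: stops early
strong <- sequential_pvalue(30, null_draw, nreps = 1000)  # strong evidence: runs to nreps
```

With nreps=1000 and level=0.05 the threshold is 50 exceedances, matching the example above. Whenever the loop stops before nreps draws, the reported p-value exceeds level, so early termination never turns a non-significant result into a significant one.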

References

Umbach, D. and Weinberg, C. (2000). The use of case-parent triads to study joint effects of genotype and exposure. Am J Hum Genet, 66:251-61.

Besag, J. and Clifford, P. (1991). Sequential Monte Carlo p-values. Biometrika, 78:301-304.

Examples

data(hypoTrioDat)
example.fit <- trioGxE(hypoTrioDat, pgenos = c("parent1","parent2"), cgeno = "child",
                       cenv = "attr",penmod="codominant", k=c(5,5))
# A toy example with 'few' permutation replicates
example.test <- test.trioGxE(example.fit, nreps=10, early.stop = FALSE, 
                              output=NULL)
 

## More proper examples of permutation tests with 5000 replicates

## Example 1: does not generate an output file containing test statistic values
example.test1 <- test.trioGxE(example.fit, nreps=5000, early.stop = TRUE, 
                              output=NULL)
## Example 2: generates an output file 'myoutput.out' containing test statistic values 
example.test2 <- test.trioGxE(example.fit, nreps=5000, early.stop = TRUE, 
                              output="myoutput.out")
