trio (version 3.10.0)

colGxE: Genotypic TDT for Gene-Environment Interactions

Description

Performs a genotypic TDT for gene-environment interactions for each SNP represented by a column of a matrix in genotype format and a binary environmental factor. If alpha1 is set to a value smaller than 1, then the two-step procedure of Gauderman et al. (2010) will be used to first select all SNPs showing a p-value smaller than alpha1 in a logistic regression of the environmental factor against the sums of the codings for the parents' genotypes at the respective SNP. In the second step, the genotypic TDT is then applied to the selected SNPs.

If unstructured = TRUE, a fully parameterized model is considered and a likelihood ratio test is performed.

While colGxE computes the p-values based on asymptotic chi-square distributions, colGxEPerms can be used to determine permutation-based p-values for the basic genotypic TDT (i.e., for colGxE using alpha1 = 1 and unstructured = FALSE).
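
The first step of the two-step procedure can be sketched in base R. This is an illustration under assumptions, not the code used inside colGxE: the data are simulated, the object names (n.trios, parent.sum, p.step1, selected) are made up for the example, and the genotype codings are taken to be the raw 0/1/2 values.

```r
set.seed(1)
n.trios <- 100

# Simulated genotypes (0, 1, 2) for one SNP in the layout expected by colGxE:
# rows 1-3 are father, mother, offspring of trio 1, and so on.
geno <- rbinom(3 * n.trios, size = 2, prob = 0.3)
env <- rbinom(n.trios, size = 1, prob = 0.5)

# Sum of the parents' genotype codings for each trio.
father <- geno[seq(1, 3 * n.trios, by = 3)]
mother <- geno[seq(2, 3 * n.trios, by = 3)]
parent.sum <- father + mother

# Logistic regression with the environmental factor as response.
fit <- glm(env ~ parent.sum, family = binomial())
p.step1 <- coef(summary(fit))["parent.sum", "Pr(>|z|)"]

# Assumed selection rule: the SNP moves on to the genotypic TDT
# if its first-step p-value does not exceed alpha1.
alpha1 <- 0.05
selected <- p.step1 <= alpha1
```

Only the parents' genotypes enter this screening step, so it uses no information from the offspring genotypes that the genotypic TDT in the second step relies on.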

Usage

colGxE(mat.snp, env, model = c("additive", "dominant", "recessive"), alpha1 = 1, size = 50, addGandE = TRUE, whichLRT = c("both", "2df", "1df", "none"), add2df = TRUE, addCov = FALSE, famid = NULL, unstructured = FALSE)

colGxEPerms(mat.snp, env, model = c("additive", "dominant", "recessive"), B = 10000, size = 20, addPerms = TRUE, famid = NULL, rand = NA)

Arguments

mat.snp
a numeric matrix in which each column represents a SNP. Each column must be a numeric vector of length $3 * t$ representing a SNP genotyped at $t$ trios. Each of the $t$ blocks must consist of the genotypes of father, mother, and offspring (in this order). The genotypes must be coded by 0, 1, and 2. Missing values are allowed and need to be coded by NA. This matrix might be generated from a ped-file by, e.g., employing ped2geno.
env
a vector of length $t$ (see mat.snp) containing for each offspring the value of a binary environmental variable, which must take the values 0 and 1.
model
type of model that should be fitted. Abbreviations are allowed. Thus, e.g., model = "dom" will fit a dominant model, and model = "r" a recessive model.
alpha1
a numeric value between 0 and 1 (excluding 0). If alpha1 = 1, all SNPs will be tested with a genotypic TDT. Otherwise, the two-step procedure of Gauderman et al. (2010) will be used to select all SNPs showing a p-value smaller than or equal to alpha1 in a logistic regression in which the environmental factor is used as response and the sums over the codings for the genotypes of the parents are employed as predictor. The genotypic TDT will then be applied to the selected SNPs. Since the logistic regression in the first step requires a numerical determination of the parameter estimates, the two-step procedure will increase rather than reduce the computing time.
size
the number of SNPs considered simultaneously when computing the parameter estimates.
addGandE
should the relative risks and their confidence intervals for the exposed cases be added to the output?
whichLRT
character string specifying which likelihood ratio test should be added to the output. If "2df", two degree of freedom likelihood ratio tests comparing the fitted models (containing one parameter for the SNP and one for the gene-environment interaction) with models containing no factor will be performed. If "1df", one degree of freedom likelihood ratio tests comparing the fitted models (containing two parameters, one for the SNP and the other for the interaction) with models only containing the respective SNP will be added to the output. If "both" (default), both tests will be performed, whereas no test will be performed if whichLRT = "none".
add2df
should the results of a 2 df Wald test for testing both the SNP and the interaction effect simultaneously be added to the output?
addCov
should the covariance between the parameter estimates for the SNP and the gene-environment interaction be added to the output? Default is addCov = FALSE, as this covariance is given by the negative variance of the parameter estimate for the SNP.
famid
a vector of the same length as env specifying the family IDs for the corresponding values of the environmental variable in env. Can be used to reorder the vector env when the order of the trios differs between env and mat.snp.
unstructured
should a fully parameterized model be fitted? If TRUE, a 2 df likelihood ratio test is performed comparing a gTDT model containing one indicator variable for the heterozygous genotype and one for the homozygous variant genotype with a gTDT model additionally containing two terms for the interactions between these variables and the environmental factor. In this case, only the arguments mat.snp, env, and famid are considered.
B
number of permutations.
addPerms
should the matrices containing the permuted values of the test statistics for the SNP and the gene-environment interaction be added to the output?
rand
integer for setting the random number generator into a reproducible state.
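
The expected layout of mat.snp and the reordering that famid allows can be illustrated with a toy example in base R. All family IDs and values below are invented, and the match() call only mimics the kind of reordering described for famid. How colGxE itself determines the trio order of mat.snp is not shown here, so fam.in.mat is a hypothetical stand-in.

```r
# Three trios, two SNPs: within each column, consecutive blocks of three
# values are the genotypes of father, mother, and offspring of one trio.
mat.toy <- matrix(c(1, 0, 1,   2, 1, 1,   0, 0, 0,
                    0, 1, NA,  1, 1, 2,   2, 0, 1),
                  ncol = 2,
                  dimnames = list(NULL, c("SNP1", "SNP2")))
n.trios <- nrow(mat.toy) / 3

# Hypothetical order of the families underlying the rows of mat.toy.
fam.in.mat <- c("F1", "F2", "F3")

# Environmental variable recorded in a different family order.
env.raw <- c(1, 0, 1)
famid <- c("F3", "F1", "F2")

# Bring env into the order of the trios in mat.toy.
env <- env.raw[match(fam.in.mat, famid)]
```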

Value

For colGxE with unstructured=FALSE, an object of class colGxE consisting of the following numeric matrices with two columns (one for each parameter):
coef
the estimated parameter,
se
the estimated standard deviation of the parameter estimate,
stat
Wald statistic,
RR
the relative risk, i.e., in the case of trio data, exp(coef) (see Schaid, 1996),
lowerRR
the lower bound of the 95% confidence interval for RR,
upperRR
the upper bound of the 95% confidence interval for RR,
usedTrios
the number of trios affecting the parameter estimation,
env
vector containing the values of the environmental factor,
type
model,
addGandE
the value of addGandE,
addOther
a logical vector specifying which of the likelihood ratio tests were performed and whether the 2 df Wald test was performed,
and depending on the specifications in colGxE
cov
numeric vector containing the covariances,
lrt2df
a numeric matrix with two columns, in which the first column contains the values of the 2 df likelihood ratio test statistic and the second the corresponding p-values,
wald2df
a numeric matrix with two columns, in which the first column contains the values of the 2 df Wald test statistics and the second the corresponding p-values,
lrt1df
a numeric matrix with two columns, in which the first column contains the values of the 1 df likelihood ratio test statistic and the second the corresponding p-values.
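
How several of these quantities relate to each other can be sketched as follows. The numbers are invented. RR = exp(coef) is stated above. The Wald-type interval on the log scale and the squared ratio of estimate to standard error are the usual textbook forms and are only an assumption about how colGxE computes lowerRR, upperRR, and stat.

```r
# Made-up estimate and standard error for one parameter.
coef.hat <- 0.35
se.hat <- 0.14

# Relative risk and Wald-type 95% confidence interval (assumed form).
RR <- exp(coef.hat)
z <- qnorm(0.975)
lowerRR <- exp(coef.hat - z * se.hat)
upperRR <- exp(coef.hat + z * se.hat)

# Wald statistic and asymptotic chi-square p-value with 1 df (assumed form).
stat <- (coef.hat / se.hat)^2
pval <- pchisq(stat, df = 1, lower.tail = FALSE)
```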
For colGxE with unstructured=TRUE, an object of class colGxEunstruct consisting of the following vectors:
ll.main
the loglikelihoods of the models containing only the two main effects,
ll.full
the loglikelihoods of the models containing the two main effects and, additionally, the two interaction effects,
stat
the values of the test statistic of the likelihood ratio test,
pval
the corresponding p-values.
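
The relation between these four components can be sketched in base R. The loglikelihood values are invented, and the formula is the standard likelihood ratio statistic, assumed here to match what colGxE reports. The 2 degrees of freedom correspond to the two interaction terms added in the full model.

```r
# Made-up loglikelihoods for three SNPs.
ll.main <- c(-68.2, -70.1, -65.4)
ll.full <- c(-66.9, -66.0, -65.3)

# Likelihood ratio statistic and p-value from a chi-square distribution with 2 df.
stat <- 2 * (ll.full - ll.main)
pval <- pchisq(stat, df = 2, lower.tail = FALSE)
```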
For colGxEPerms,
stat
a matrix with two columns containing the values of gTDT statistics for the main effects of the SNPs and the gene-environment interactions when considering the original, unpermuted case-pseudo-control status,
pval
a matrix with two columns comprising the permutation-based p-values corresponding to the test statistics in stat,
and if addPerms = TRUE
matPermG
a matrix with B columns containing the values of the gTDT statistic for the SNPs when considering the B permutations of the case-pseudo-control status,
matPermGxE
a matrix with B columns containing the values of the gTDT statistic for the gene-environment interactions when considering the B permutations of the case-pseudo-control status.
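
A permutation-based p-value can be derived from such a matrix along the following lines. The statistics are simulated, the layout with one row per SNP is assumed, and the proportion used here (permuted statistics at least as large as the observed one) is one common convention. Whether colGxEPerms uses exactly this form is an assumption.

```r
set.seed(42)
B <- 1000
n.snp <- 4

# Simulated observed statistics and permuted statistics
# (assumed layout: one row per SNP, one column per permutation).
stat.obs <- rchisq(n.snp, df = 1)
matPerm <- matrix(rchisq(n.snp * B, df = 1), nrow = n.snp, ncol = B)

# Share of permutations whose statistic is at least as large as the observed one.
# stat.obs is recycled down each column, so row i is compared with stat.obs[i].
pval.perm <- rowMeans(matPerm >= stat.obs)
```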

Details

A conditional logistic regression model including two parameters, one for $G$, and the other for $GxE$, is fitted, where $G$ is specified according to model.
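
Written out, and assuming the usual gTDT setup in which each affected offspring ($j = 0$) is matched with the three pseudo-controls ($j = 1, 2, 3$) that could also have arisen from the parents' genotypes, the conditional likelihood takes the form below. The notation ($g_{ij}$, $e_i$, $\beta_G$, $\beta_{GxE}$) is introduced here for illustration only.

```latex
L(\beta_G, \beta_{GxE}) = \prod_{i=1}^{t}
  \frac{\exp\left(\beta_G \, g_{i0} + \beta_{GxE} \, g_{i0} e_i\right)}
       {\sum_{j=0}^{3} \exp\left(\beta_G \, g_{ij} + \beta_{GxE} \, g_{ij} e_i\right)}
```

Here $g_{ij}$ is the genotype coding under the chosen model and $e_i$ is the value of env for trio $i$. A main effect of $E$ would be constant within each trio and would cancel from numerator and denominator, which is consistent with the model having parameters only for $G$ and $GxE$.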

References

Gauderman, W.J., Thomas, D.C., Murcray, C.E., Conti, D., Li, D., and Lewinger, J.P. (2010). Efficient Genome-Wide Association Testing of Gene-Environment Interaction in Case-Parent Trios. American Journal of Epidemiology, 172, 116-122.

Schaid, D.J. (1996). General Score Tests for Associations of Genetic Markers with Disease Using Cases and Their Parents. Genetic Epidemiology, 13, 423-449.

Schwender, H., Taub, M.A., Beaty, T.H., Marazita, M.L., and Ruczinski, I. (2011). Rapid Testing of SNPs and Gene-Environment Interactions in Case-Parent Trio Data Based on Exact Analytic Parameter Estimation. Biometrics, 68, 766-773.

See Also

colTDT, ped2geno

Examples

# Load the package and the simulated data for the analysis.
library(trio)
data(trio.data)

# Set up a vector with the binary environmental variable.
# Here, we consider the gene-gender interactions and
# assume that the children in the first 50 trios are
# girls, and the remaining 50 are boys.
sex <- rep(0:1, each = 50)

# Test the interaction of sex with each of the SNPs in mat.test
gxe.out <- colGxE(mat.test, sex)

# By default, an additive mode of inheritance is considered.
# If, e.g., a dominant mode should be considered, then this can
# be done by calling
gxeDom.out <- colGxE(mat.test, sex, model="dominant")
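
Two further calls follow from the usage shown above and can be tried on the same data. This is a sketch that assumes the trio package is installed; the values chosen for B and rand are arbitrary, and no output is shown because it has not been verified here.

```r
library(trio)
data(trio.data)
sex <- rep(0:1, each = 50)

# Fully parameterized model with a 2 df likelihood ratio test.
gxeUn.out <- colGxE(mat.test, sex, unstructured = TRUE)

# Permutation-based p-values for the basic genotypic TDT,
# with fewer permutations than the default to save time.
gxePerm.out <- colGxEPerms(mat.test, sex, B = 1000, rand = 123)
```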