glmm.lgst.batch: function to test genetic association between a dichotomous trait and a batch of genotyped SNPs in families using Generalized Linear Mixed Effects model

Description

Fits a generalized linear mixed-effects model (GLMM) with a logistic link and a normally distributed random intercept for each cluster to test associations between a dichotomous phenotype and every genotyped SNP in a genotype file, in family data, under a user-specified genetic model. Each pedigree is treated as a cluster. The same trait-SNP association test is applied to all SNPs in the genotype data. When analyzing rare variants for dichotomous traits, this GLMM, as implemented by this function, is recommended over other methods such as GEE. The trait-SNP association test is carried out by the glmm.lgst function, which uses the lmer function from package lme4.
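
As a sketch, the random-intercept logistic model described above can be written as follows. The notation is introduced here for illustration and is not taken from the package: i indexes pedigrees, j indexes individuals within a pedigree, g_ij is the coded genotype under the chosen genetic model, x_ij is the covariate vector, and b_i is the pedigree-level random intercept.

```latex
\operatorname{logit}\,\Pr(y_{ij} = 1 \mid b_i)
  = \beta_0 + \beta\, g_{ij} + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ij} + b_i,
\qquad b_i \sim N(0, \sigma_b^2)
```

Under this notation, the association test for a SNP is a test of beta = 0.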

Usage

glmm.lgst.batch(genfile, phenfile, pedfile, outfile, phen, covars = NULL, 
model = "a", col.names = T, sep.ped = ",", sep.phe = ",", sep.gen = ",")

Arguments

genfile

a character string naming the genotype file for reading (see format requirement in details)

phenfile

a character string naming the phenotype file for reading (see format requirement in details)

pedfile

a character string naming the pedigree file for reading (see format requirement in details)

outfile

a character string naming the result file for writing

phen

a character string for a phenotype name in phenfile

covars

a character vector for covariates in phenfile

model

a single character of 'a','d','g', or 'r', with 'a'=additive, 'd'=dominant, 'g'=general and 'r'=recessive models

col.names

a logical value indicating whether the output file should contain column names

sep.ped

the field separator character for pedigree file

sep.phe

the field separator character for phenotype file

sep.gen

the field separator character for genotype file
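
To illustrate what the four model codes conventionally mean for genotypes stored as 0, 1, 2 counts of the coded allele, here is a minimal sketch in base R. The helper code_genotype is hypothetical and not part of GWAF; the package's internal recoding may differ in detail.

```r
# Hypothetical helper, not part of GWAF: recode 0/1/2 genotype counts
# under each genetic model code accepted by the `model` argument.
code_genotype <- function(geno, model = c("a", "d", "g", "r")) {
  model <- match.arg(model)
  switch(model,
    a = geno,                               # additive: number of coded alleles
    d = as.integer(geno >= 1),              # dominant: at least one coded allele
    r = as.integer(geno == 2),              # recessive: two coded alleles
    g = factor(geno, levels = c(0, 1, 2))   # general: one level per genotype
  )
}

geno <- c(0, 1, 2, NA, 1)   # toy genotypes; NA marks a missing call
dom  <- code_genotype(geno, "d")
rec  <- code_genotype(geno, "r")
gen  <- code_genotype(geno, "g")
```

Missing genotypes stay missing under every coding in this sketch.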

Value

phen: phenotype name
snp: SNP name
n0: the number of individuals with 0 copies of the coded allele
n1: the number of individuals with 1 copy of the coded allele
n2: the number of individuals with 2 copies of the coded allele
nd0: the number of individuals with 0 copies of the coded allele in the affected sample
nd1: the number of individuals with 1 copy of the coded allele in the affected sample
nd2: the number of individuals with 2 copies of the coded allele in the affected sample
miss.0: Genotype missing rate in unaffected sample
miss.1: Genotype missing rate in affected sample
miss.diff.p: P-value of differential missingness test between unaffected and affected samples
beta: regression coefficient of SNP covariate
se: standard error of beta
chisq: Chi-square statistic for testing beta not equal to zero
df: degrees of freedom of the Chi-square statistic
model: model actually used in the analysis
remark: warning or additional information for the analysis; 'exp count<5' indicates that an expected count is less than 5 in the phenotype-genotype table; 'collinearity' indicates that collinearity exists between the SNP and some covariates
pval: p-value of the chi-square statistic

beta10: regression coefficient of genotype with 1 copy of coded allele vs. that with 0 copy
beta20: regression coefficient of genotype with 2 copies of coded allele vs. that with 0 copy
beta21: regression coefficient of genotype with 2 copies of coded allele vs. that with 1 copy
se10: standard error of beta10
se20: standard error of beta20
se21: standard error of beta21
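
The relationship between beta, se, chisq, df and pval can be sketched as below, assuming chisq is a Wald statistic for a single SNP coefficient. That is an assumption made for illustration, since this page does not state how the statistic is computed, and the numbers are made up rather than taken from any GWAF output.

```r
# Illustrative values only; not output from glmm.lgst.batch.
beta <- 0.5   # regression coefficient of the SNP covariate
se   <- 0.2   # standard error of beta
df   <- 1     # assumed: one SNP coefficient is tested

# Wald chi-square statistic and its upper-tail p-value
chisq <- (beta / se)^2
pval  <- pchisq(chisq, df = df, lower.tail = FALSE)
```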

Details

The glmm.lgst.batch function first reads in and merges the phenotype-covariate, genotype and pedigree files, then tests the association of phen against all SNPs in genfile. genfile contains unique individual ids and genotype data, with the column names being "id" and the SNP names. For each genotyped SNP, the genotype data should be coded as 0, 1, 2, indicating the number of coded alleles. The SNP names in the genotype file should not contain dashes ('-') or other special characters (dots and underscores are OK). phenfile contains unique individual ids, phenotype and covariate data, with the column names being "id" and the phenotype and covariate names. pedfile contains pedigree information, with the column names being "famid","id","fa","mo","sex". In all files, a missing value should be an empty space, except for a missing parental id in pedfile. Only phenotypes with two categories are analyzed. A phenotype should be coded as 0 and 1, with 1 denoting affected and 0 unaffected. SNPs with low genotype counts (especially for the minor allele homozygote) may be omitted, analyzed with a dominant model, or analyzed with logistic regression. The glmm.lgst.batch function fits the GLMM using each pedigree as a cluster, with the glmm.lgst function from the GWAF package and the lmer function from the lme4 package.
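
The genotype and phenotype file layouts described above can be sketched in base R as follows. All ids, SNP names, and the trait and covariate names (aff, age) are made up for illustration, and "an empty space" is interpreted here as an empty field, which is an assumption. A pedigree file is not shown because this page does not say how a missing parental id should be coded.

```r
# Toy data for illustration only; names and values are made up.
gen  <- data.frame(id = 1:4, rs1 = c(0, 1, 2, NA), rs2.a = c(1, 1, 0, 2))
phen <- data.frame(id = 1:4, aff = c(0, 1, 1, 0), age = c(54, 61, NA, 47))

gen_path  <- file.path(tempdir(), "toygen.csv")
phen_path <- file.path(tempdir(), "toyphen.csv")

# na = "" writes missing values as empty fields (assumed meaning of "empty space")
write.csv(gen,  gen_path,  row.names = FALSE, na = "", quote = FALSE)
write.csv(phen, phen_path, row.names = FALSE, na = "", quote = FALSE)

# Checks mirroring the stated format requirements
gen_back  <- read.csv(gen_path)
snps      <- setdiff(names(gen_back), "id")
ok_names  <- all(grepl("^[A-Za-z0-9._]+$", snps))            # no dashes or other special characters
ok_coding <- all(unlist(gen_back[snps]) %in% c(0, 1, 2, NA)) # genotypes coded 0/1/2 or missing
ok_ids    <- !anyDuplicated(gen_back$id)                     # unique individual ids
ok_trait  <- all(phen$aff %in% c(0, 1))                      # 1 = affected, 0 = unaffected
```

Under these assumptions, gen_path and phen_path could then be supplied as genfile and phenfile with sep.gen and sep.phe left at ",".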

Examples

## Not run: 
# glmm.lgst.batch(phenfile="simphen.csv",genfile="simgen.csv",pedfile="simped.csv",
# phen="SIMQT",model="d",outfile="simout.csv",sep.ped=",",sep.phe=",",sep.gen=",")
# ## End(Not run)
