glmm.score: Performing GLMM based score tests

Description

Use a glmmkin class object from the null GLMM to perform score tests for association with genotypes in a plink .bed file (binary genotypes), a GDS file .gds, or a plain text file (or compressed .gz or .bz2 file).

Usage

glmm.score(obj, infile, outfile, BGEN.samplefile = NULL, center = T, select = NULL, 
	MAF.range = c(1e-7, 0.5), miss.cutoff = 1, 
	missing.method = "impute2mean", nperbatch = 100, tol = 1e-5,
	infile.nrow = NULL, infile.nrow.skip = 0, infile.sep = "\t",
	infile.na = "NA", infile.ncol.skip = 1, infile.ncol.print = 1,
	infile.header.print = "SNP", is.dosage = FALSE, ncores = 1, verbose = FALSE)

Value

NULL if infile is a BGEN file (.bgen) or a GDS file (.gds), otherwise computational time in seconds, excluding I/O time.

Arguments

obj: a class glmmkin or class glmmkin.multi object, returned by fitting the null GLMM using glmmkin.
infile: the input file name or an object of class SeqVarGDSClass. Note that for plink binary genotype files only the prefix without .bed, .bim or .fam should be used. Only SNP major mode recognized in the binary file. Alternatively, it can be the full name of a BGEN file (including the suffix .bgen), a GDS file (including the suffix .gds), or a plain text file with some delimiters (comma, space, tab or something else), with one row for each SNP and one column for each individual. In that case, SNPs should be coded as numeric values (0/1/2 or dosages allowed, A/C/G/T coding is not recognized). There can be additional rows and columns to skip at the beginning. The order of individuals can be different from obj in the null GLMM (see the argument select). Some compressed files (.gz and .bz2) also allowed. If infile is an object of class SeqVarGDSClass, the .gds file will be closed upon successful completion of the function.
outfile: the output file name.
BGEN.samplefile: path to the BGEN sample file. Required when the BGEN file does not contain sample identifiers or the select parameter is NULL (default = NULL).
center: a logical switch for centering genotypes before tests. If TRUE, genotypes will be centered to have mean 0 before tests, otherwise raw values will be directly used in tests (default = TRUE).
select: an optional vector indicating the order of individuals in infile. If supplied, the length must match the number of individuals in infile (default = NULL). Individuals to be excluded should be coded 0. For example, select = c(2, 3, 1, 0) means the 1st individual in infile corresponds to the 2nd individual in obj, the 2nd individual in infile corresponds to the 3rd individual in obj, the 3rd individual in infile corresponds to the 1st individual in obj, the 4th individual in infile is not included in obj. If there are any duplicated id_include in obj (longitudinal data analysis), indices in select should match the order of individuals with unique id_include in obj. For plink binary genotype files and GDS files, this argument is not required and the sample ID's are automatically matched.
MAF.range: a numeric vector of length 2 defining the minimum and maximum minor allele frequencies of variants that should be included in the analysis (default = c(1e-7, 0.5)).
miss.cutoff: the maximum missing rate allowed for a variant to be included (default = 1, including all variants).
missing.method: method of handling missing genotypes. Either "impute2mean" or "omit" (default = "impute2mean").
nperbatch: an integer for how many SNPs should be tested in a batch (default = 100). The computational time can increase dramatically if this value is either small or large. The optimal value for best performance depends on the user's system.
tol: the threshold for determining monomorphism. If a SNP has value range less than the tolerance, it will be considered monomorphic and its association test p-value will be NA (default = 1e-5). Only used when infile is a plain text file (or compressed .gz or .bz2 file).
infile.nrow: number of rows to read in infile, including number of rows to skip at the beginning. If NULL, the program will determine how many rows there are in infile automatically and read all rows (default = NULL). Only used when infile is a plain text file (or compressed .gz or .bz2 file).
infile.nrow.skip: number of rows to skip at the beginning of infile. Must be nonnegative integers. Useful when header or comment lines are present (default = 0). Only used when infile is a plain text file (or compressed .gz or .bz2 file).
infile.sep: delimiter in infile (default = "\t"). Only used when infile is a plain text file (or compressed .gz or .bz2 file).
infile.na: symbol in infile to denote missing genotypes (default = "NA"). Only used when infile is a plain text file (or compressed .gz or .bz2 file).
infile.ncol.skip: number of columns to skip before genotype data in infile. These columns can be SNP name, alleles and/or quality measures and should be placed at the beginning in each line. After skipping these columns, the program will read in genotype data and perform score tests. Must be nonnegative integers. It is recommended that SNP name should be included as the first column in infile and genotype data should start from the second column or later (default = 1). Only used when infile is a plain text file (or compressed .gz or .bz2 file).
infile.ncol.print: a vector indicating which column(s) in infile should be printed to the output directly. These columns can be SNP name, alleles and/or quality measures placed at the beginning in each line. Must be nonnegative integers, no greater than infile.ncol.skip and sorted numerically in ascending order. By default, it is assumed that the first column is SNP name and genotype data start from the second column, and SNP name should be carried over to the output (default = 1). Should be set to NULL if infile.ncol.skip is 0. Only used when infile is a plain text file (or compressed .gz or .bz2 file).
infile.header.print: a character vector indicating column name(s) of column(s) selected to print by infile.ncol.print (default = "SNP"). Should be set to NULL if infile.ncol.skip is 0. Only used when infile is a plain text file (or compressed .gz or .bz2 file).
is.dosage: a logical switch for whether imputed dosage should be used from a GDS infile (default = FALSE).
ncores: a positive integer indicating the number of cores to be used in parallel computing (default = 1).
verbose: a logical switch for whether a progress bar should be shown for a GDS infile (default = FALSE).

Author

Han Chen, Duy T. Pham

References

Chen, H., Wang, C., Conomos, M.P., Stilp, A.M., Li, Z., Sofer, T., Szpiro, A.A., Chen, W., Brehm, J.M., Celedón, J.C., Redline, S., Papanicolaou, G.J., Thornton, T.A., Laurie, C.C., Rice, K. and Lin, X. (2016) Control forpopulation structure and relatedness for binary traits in genetic association studies via logistic mixed models. The American Journal of Human Genetics 98, 653-666.

Examples

Run this code

# \donttest{
data(example)
attach(example)
model0 <- glmmkin(disease ~ age + sex, data = pheno, kins = GRM, id = "id",
       family = binomial(link = "logit"))
plinkfiles <- strsplit(system.file("extdata", "geno.bed", package = "GMMAT"), 
       ".bed", fixed = TRUE)[[1]]
outfile.bed <- tempfile()
glmm.score(model0, infile = plinkfiles, outfile = outfile.bed)
if(requireNamespace("SeqArray", quietly = TRUE) && requireNamespace("SeqVarTools",
        quietly = TRUE)) {
	infile <- system.file("extdata", "geno.gds", package = "GMMAT")
	outfile.gds <- tempfile()
	glmm.score(model0, infile = infile, outfile = outfile.gds)
	unlink(outfile.gds)
}
infile <- system.file("extdata", "geno.txt", package = "GMMAT")
outfile.text <- tempfile()
glmm.score(model0, infile = infile, outfile = outfile.text, infile.nrow.skip = 5, 
	infile.ncol.skip = 3, infile.ncol.print = 1:3, 
	infile.header.print = c("SNP", "Allele1", "Allele2"))
infile <- system.file("extdata", "geno.bgen", package = "GMMAT")
samplefile <- system.file("extdata", "geno.sample", package = "GMMAT")
outfile.bgen <- tempfile()
glmm.score(model0, infile = infile, BGEN.samplefile = samplefile,
        outfile = outfile.bgen)
unlink(c(outfile.bed, outfile.text, outfile.bgen))
# }

Run the code above in your browser using DataLab