SPAGMMATtest: Run single variant or gene- or region-based score tests with SPA based on the linear/logistic mixed model.

Description

Run single variant or gene- or region-based score tests with SPA based on the linear/logistic mixed model.

Usage

SPAGMMATtest(
  bgenFile = "",
  bgenFileIndex = "",
  vcfFile = "",
  vcfFileIndex = "",
  vcfField = "DS",
  savFile = "",
  savFileIndex = "",
  sampleFile = "",
  idstoExcludeFile = "",
  idstoIncludeFile = "",
  rangestoExcludeFile = "",
  rangestoIncludeFile = "",
  chrom = "",
  start = 1,
  end = 2.5e+08,
  IsDropMissingDosages = FALSE,
  minMAC = 0.5,
  minMAF = 0,
  maxMAFforGroupTest = 0.5,
  minInfo = 0,
  GMMATmodelFile = "",
  varianceRatioFile = "",
  SPAcutoff = 2,
  SAIGEOutputFile = "",
  numLinesOutput = 10000,
  IsSparse = TRUE,
  IsOutputAFinCaseCtrl = FALSE,
  IsOutputNinCaseCtrl = FALSE,
  LOCO = FALSE,
  condition = "",
  sparseSigmaFile = "",
  groupFile = "",
  kernel = "linear.weighted",
  method = "optimal.adj",
  weights.beta.rare = c(1, 25),
  weights.beta.common = c(1, 25),
  weightMAFcutoff = 0.01,
  weightsIncludeinGroupFile = FALSE,
  weights_for_G2_cond = NULL,
  r.corr = 0,
  IsSingleVarinGroupTest = TRUE,
  cateVarRatioMinMACVecExclude = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 10.5, 20.5),
  cateVarRatioMaxMACVecInclude = c(1.5, 2.5, 3.5, 4.5, 5.5, 10.5, 20.5),
  dosageZerodCutoff = 0.2,
  IsOutputPvalueNAinGroupTestforBinary = FALSE,
  IsAccountforCasecontrolImbalanceinGroupTest = TRUE,
  IsOutputBETASEinBurdenTest = FALSE
)

Arguments

bgenFile

character. Path to the bgen file. Currently, version 1.2 with 8-bit compression is supported.

bgenFileIndex

character. Path to the .bgi file (index of the bgen file)

vcfFile

character. Path to vcf file

vcfFileIndex

character. Path to the tabix index (".tbi") of the vcf file, created with "tabix -p vcf file.vcf.gz"

vcfField

character. genotype field in vcf file to use. "DS" for dosages or "GT" for genotypes. By default, "DS".

savFile

character. Path to sav file

savFileIndex

character. Path to the index (".s1r") of the sav file

sampleFile

character. Path to the file that contains one column for IDs of samples in the dosage, vcf, sav, or bgen file with NO header

idstoExcludeFile

character. Path to the file containing variant ids to be excluded from the bgen file. The file does not have a header and each line is for a marker ID.

idstoIncludeFile

character. Path to the file containing variant ids to be included from the bgen file. The file does not have a header and each line is for a marker ID.

rangestoExcludeFile

character. Path to the file containing genome regions to be excluded from the bgen file. The file contains three columns for chromosome, start, and end respectively with no header

rangestoIncludeFile

character. Path to the file containing genome regions to be included from the bgen file. The file contains three columns for chromosome, start, and end respectively with no header
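
As a minimal sketch of the three-column, header-free layout described above, the following base R code writes a ranges file. The regions, the output path, and the choice of tab as the separator are assumptions made for illustration.

```r
# Hypothetical regions; chromosome strings must match those in the genotype file.
# Positions are stored as integers so they are not written in scientific notation.
ranges <- data.frame(
  chrom = c("1", "1"),
  start = c(1000000L, 5000000L),
  end   = c(2000000L, 6000000L),
  stringsAsFactors = FALSE
)

# Hypothetical output path.
ranges_file <- file.path(tempdir(), "ranges_to_include.txt")

# No header, no row names, no quotes: chromosome, start, end.
write.table(ranges, ranges_file, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)

readLines(ranges_file)
```

The resulting path could then be passed as rangestoIncludeFile (or rangestoExcludeFile).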

chrom

character. Chromosome to include from the vcf file. Required for vcf input. Note: the string must exactly match the chromosome string in the vcf/sav file. For example, "1" does not match "chr1". If LOCO is specified, providing chrom reduces the computation cost

start

numeric. start genome position to include from vcf file. By default, 1

end

numeric. end genome position to include from vcf file. By default, 250000000

IsDropMissingDosages

logical. whether to drop missing dosages (TRUE) or to mean impute missing dosages (FALSE). By default, FALSE. This option only works for bgen, vcf, and sav input.

minMAC

numeric. Minimum minor allele count of markers to test. By default, 0.5. The higher threshold between minMAC and minMAF will be used

minMAF

numeric. Minimum minor allele frequency of markers to test. By default 0. The higher threshold between minMAC and minMAF will be used
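
One way to read "the higher threshold" is to put both cutoffs on the MAC scale and take the larger. The sketch below does that under the assumption of diploid genotypes (MAC = 2 * N * MAF); the helper effective_min_mac is hypothetical and is not SAIGE's internal code.

```r
# Hypothetical helper, not part of SAIGE: combine minMAC and minMAF into a
# single MAC threshold, assuming diploid genotypes so that MAC = 2 * N * MAF.
effective_min_mac <- function(n_samples, minMAC = 0.5, minMAF = 0) {
  max(minMAC, minMAF * 2 * n_samples)
}

# With the defaults, minMAC is the binding threshold.
effective_min_mac(n_samples = 1000)

# With minMAF = 0.01 and 1000 samples, the MAF cutoff is the stricter one.
effective_min_mac(n_samples = 1000, minMAC = 0.5, minMAF = 0.01)
```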

maxMAFforGroupTest

numeric. Maximum minor allele frequency of markers to test in group test. By default 0.5.

minInfo

numeric. Minimum imputation info of markers to test. By default, 0. This option only works for bgen, vcf, and sav input

GMMATmodelFile

character. Path to the input file containing the glmm model, which is output from previous step. Will be used by load()

varianceRatioFile

character. Path to the input file containing the variance ratio, which is output from the previous step

SPAcutoff

numeric. By default, 2 (the SPA test is used when the p value is < 0.05 under the normal approximation)
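
The link between a cutoff of 2 and p < 0.05 can be checked directly, assuming the cutoff is measured in standard deviations of the score statistic: the two-sided normal tail probability at 2 standard deviations is roughly 0.0455.

```r
# Two-sided tail probability of a standard normal at the default cutoff.
SPAcutoff <- 2
p_at_cutoff <- 2 * pnorm(-SPAcutoff)
p_at_cutoff  # about 0.0455, i.e. just under 0.05
```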

SAIGEOutputFile

character. Path to the output file containing assoc test results

numLinesOutput

numeric. Number of markers to be output each time. By default, 10000

IsSparse

logical. Whether to exploit the sparsity of the genotype vector for less frequent variants to speed up the SPA tests or not for dichotomous traits. By default, TRUE

IsOutputAFinCaseCtrl

logical. Whether to output allele frequency in cases and controls. By default, FALSE

IsOutputNinCaseCtrl

logical. Whether to output sample sizes in cases and controls. By default, FALSE

LOCO

logical. Whether to apply the leave-one-chromosome-out option. By default, FALSE

condition

character. For conditional analysis. Genetic marker ids (chr:pos_ref/alt for sav/vcf dosage input, marker id for bgen input) separated by commas, e.g. chr3:101651171_C/T,chr3:101651186_G/A. Note that conditional analysis is currently available only for bgen, vcf, and sav input.
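
The condition string for vcf/sav input can be assembled from marker coordinates. This is a small sketch using the two markers from the example above; the data frame and its column names are illustrative only.

```r
# Conditioning markers (taken from the example above) as separate fields.
cond_markers <- data.frame(
  chrom = c("chr3", "chr3"),
  pos   = c(101651171L, 101651186L),
  ref   = c("C", "G"),
  alt   = c("T", "A"),
  stringsAsFactors = FALSE
)

# Build chr:pos_ref/alt ids and join them with commas.
ids <- with(cond_markers, paste0(chrom, ":", pos, "_", ref, "/", alt))
condition <- paste(ids, collapse = ",")
condition
```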

sparseSigmaFile

character. Path to the file containing the sparseSigma from step 1. The suffix of this file is ".mtx".

groupFile

character. Path to the file containing the group information for gene-based tests. Each line is for one gene/set of variants. The first element is the gene/set name. The rest of the line lists the variant ids included in this gene/set. For vcf/sav, the genetic marker ids are in the format chr:pos_ref/alt. For bgen, the genetic marker ids should match the ids in the bgen file. Each element in the line is separated by tab.
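
A group file in the layout described above can be written with base R. The gene names, marker ids, and output path in this sketch are invented for illustration.

```r
# Hypothetical gene sets; marker ids use the chr:pos_ref/alt form for vcf/sav input.
groups <- list(
  GENE1 = c("chr1:100_A/G", "chr1:250_C/T", "chr1:300_G/T"),
  GENE2 = c("chr2:500_T/C", "chr2:620_A/C")
)

# One tab-separated line per gene: name first, then its marker ids.
group_lines <- vapply(
  names(groups),
  function(g) paste(c(g, groups[[g]]), collapse = "\t"),
  character(1)
)

# Hypothetical output path.
group_file <- file.path(tempdir(), "groups.txt")
writeLines(group_lines, group_file)

readLines(group_file)
```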

kernel

character. For gene-based test. By default, "linear.weighted". More options can be seen in the SKAT library

method

character. method for gene-based test p-values. By default, "optimal.adj". More options can be seen in the SKAT library

weights.beta.rare

vector of numeric. Parameters for the beta distribution to weight genetic markers with MAF <= weightMAFcutoff in gene-based tests. By default, c(1, 25). More options can be seen in the SKAT library

weights.beta.common

vector of numeric. Parameters for the beta distribution to weight genetic markers with MAF > weightMAFcutoff in gene-based tests. By default, c(1, 25). More options can be seen in the SKAT library. NOTE: this argument is not fully developed. Currently, weights.beta.common is equal to weights.beta.rare

weightMAFcutoff

numeric. Between 0 and 0.5. See the descriptions of weights.beta.rare and weights.beta.common above. By default, 0.01
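
To illustrate what the beta parameters do, the sketch below evaluates the beta density at a few MAF values, following SKAT's convention of weighting a variant by dbeta(MAF, a, b). That SAIGE applies the weights in exactly this form is an assumption here; the MAF values are arbitrary.

```r
# Beta(1, 25) weights evaluated at a few illustrative MAFs.
weights.beta.rare <- c(1, 25)
maf <- c(0.001, 0.01, 0.05, 0.2)

w <- dbeta(maf, weights.beta.rare[1], weights.beta.rare[2])

# Rarer variants receive larger weights; weights fall off quickly as MAF grows.
data.frame(maf = maf, weight = w)
```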

weightsIncludeinGroupFile

logical. Whether to specify customized weights for markers in gene- or region-based tests. If TRUE, weights are included in the group file. For vcf/sav, the genetic marker ids and weights are in the format chr:pos_ref/alt;weight. For bgen, the genetic marker ids should match the ids in the bgen file, e.g. SNPID;weight. Each element in the line is separated by tab. By default, FALSE
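
A group-file line carrying customized weights could be assembled as follows. The gene name, marker ids, and weights are made up for illustration.

```r
# Hypothetical markers and their customized weights.
markers <- c("chr1:100_A/G", "chr1:250_C/T")
weights <- c(0.5, 2)

# Each element after the gene name has the form id;weight, and elements are tab-separated.
weighted_line <- paste(c("GENE1", paste(markers, weights, sep = ";")), collapse = "\t")
weighted_line
```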

weights_for_G2_cond

vector of float. Weights for conditioning markers in gene- or region-based tests. The length equals the number of conditioning markers, delimited by comma. By default, NULL

r.corr

numeric. Between 0 and 1. Parameter for gene-based tests. By default, 0. More options can be seen in the SKAT library

IsSingleVarinGroupTest

logical. Whether to perform single-variant assoc tests for genetic markers included in the gene-based tests. By default, TRUE

cateVarRatioMinMACVecExclude

vector of float. Lower bounds (exclusive) of MAC for the MAC categories. The length equals the number of MAC categories for variance ratio estimation. By default, c(0.5,1.5,2.5,3.5,4.5,5.5,10.5,20.5). If groupFile="", only one variance ratio corresponding to MAC >= 20 is used

cateVarRatioMaxMACVecInclude

vector of float. Upper bounds (inclusive) of MAC for the MAC categories. The length equals the number of MAC categories for variance ratio estimation minus 1. By default, c(1.5,2.5,3.5,4.5,5.5,10.5,20.5). If groupFile="", only one variance ratio corresponding to MAC >= 20 is used
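
The two vectors together define half-open MAC intervals, with the last category unbounded above. The following sketch assigns a few illustrative MAC values to categories with cut(); it shows how the bounds fit together and is not SAIGE's internal implementation.

```r
# Default category bounds: lower bounds are excluded, upper bounds are included.
lower_exclusive <- c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 10.5, 20.5)
upper_inclusive <- c(1.5, 2.5, 3.5, 4.5, 5.5, 10.5, 20.5)

# Each upper bound coincides with the next lower bound; the last category is open-ended.
stopifnot(identical(upper_inclusive, lower_exclusive[-1]))
breaks <- c(lower_exclusive, Inf)

# Illustrative MAC values mapped to category indices via intervals (lower, upper].
mac <- c(1, 3, 7, 15, 40)
category <- cut(mac, breaks = breaks, right = TRUE, labels = FALSE)
data.frame(mac = mac, category = category)
```

With these defaults there are eight categories, which matches the length of cateVarRatioMinMACVecExclude.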

dosageZerodCutoff

numeric. In gene- or region-based tests, for each variant with MAC <= 10, dosages <= dosageZerodCutoff will be set to 0. By default, 0.2.
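
The zeroing rule can be sketched on a single dosage vector. The helper zero_small_dosages is hypothetical, and computing MAC as the smaller of the two allele counts under a diploid assumption is a choice made for this illustration.

```r
# Hypothetical helper, not part of SAIGE: zero out small dosages for rare variants.
zero_small_dosages <- function(dosage, dosageZerodCutoff = 0.2, mac_limit = 10) {
  # Minor allele count, assuming diploid samples and no missing dosages.
  allele_count <- sum(dosage)
  mac <- min(allele_count, 2 * length(dosage) - allele_count)
  # Only variants with MAC <= mac_limit are modified.
  if (mac <= mac_limit) {
    dosage[dosage <= dosageZerodCutoff] <- 0
  }
  dosage
}

# A rare variant: the three dosages at or below 0.2 are set to 0.
rare <- c(0.05, 0.15, 0.2, 0.9, 1.8, 0)
zero_small_dosages(rare)

# A common variant (MAC > 10) is returned unchanged.
common <- c(rep(1, 20), 0.1)
zero_small_dosages(common)
```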

IsOutputPvalueNAinGroupTestforBinary

logical. In gene- or region-based tests for binary traits, if IsOutputPvalueNAinGroupTestforBinary is TRUE, p-values without accounting for case-control imbalance will be output. By default, FALSE

IsAccountforCasecontrolImbalanceinGroupTest

logical. In gene- or region-based tests for binary traits, if IsAccountforCasecontrolImbalanceinGroupTest is TRUE, p-values after accounting for case-control imbalance will be output. By default, TRUE

IsOutputBETASEinBurdenTest

logical. Output effect size (BETA and SE) for burden tests. By default, FALSE

Value

SAIGEOutputFile
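
Examples

A single-variant run on bgen input might be set up roughly as follows. This is only a sketch: every file path is a hypothetical placeholder, the argument values are arbitrary, and the call is attempted only when the SAIGE package is installed and the input files exist.

```r
# All paths below are hypothetical placeholders; replace them with real files.
args <- list(
  bgenFile             = "genotypes.bgen",
  bgenFileIndex        = "genotypes.bgen.bgi",
  sampleFile           = "samples.txt",
  GMMATmodelFile       = "step1.rda",
  varianceRatioFile    = "step1.varianceRatio.txt",
  SAIGEOutputFile      = "step2_results.txt",
  minMAC               = 1,
  IsOutputAFinCaseCtrl = TRUE
)

# Input files that must be present before the test can run.
inputs <- unlist(args[c("bgenFile", "bgenFileIndex", "sampleFile",
                        "GMMATmodelFile", "varianceRatioFile")])

# Guard the call so the sketch is harmless when SAIGE or the files are missing.
can_run <- requireNamespace("SAIGE", quietly = TRUE) && all(file.exists(inputs))
if (can_run) {
  do.call(SAIGE::SPAGMMATtest, args)
}
```

Keeping the arguments in a named list makes it easy to reuse the same settings across chromosomes by changing only the genotype and output paths.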