Run single-variant, gene-based, or region-based score tests with saddlepoint approximation (SPA), based on the linear or logistic mixed model.
SPAGMMATtest(
  bgenFile = "",
  bgenFileIndex = "",
  vcfFile = "",
  vcfFileIndex = "",
  vcfField = "DS",
  savFile = "",
  savFileIndex = "",
  sampleFile = "",
  idstoExcludeFile = "",
  idstoIncludeFile = "",
  rangestoExcludeFile = "",
  rangestoIncludeFile = "",
  chrom = "",
  start = 1,
  end = 2.5e+08,
  IsDropMissingDosages = FALSE,
  minMAC = 0.5,
  minMAF = 0,
  maxMAFforGroupTest = 0.5,
  minInfo = 0,
  GMMATmodelFile = "",
  varianceRatioFile = "",
  SPAcutoff = 2,
  SAIGEOutputFile = "",
  numLinesOutput = 10000,
  IsSparse = TRUE,
  IsOutputAFinCaseCtrl = FALSE,
  IsOutputNinCaseCtrl = FALSE,
  LOCO = FALSE,
  condition = "",
  sparseSigmaFile = "",
  groupFile = "",
  kernel = "linear.weighted",
  method = "optimal.adj",
  weights.beta.rare = c(1, 25),
  weights.beta.common = c(1, 25),
  weightMAFcutoff = 0.01,
  weightsIncludeinGroupFile = FALSE,
  weights_for_G2_cond = NULL,
  r.corr = 0,
  IsSingleVarinGroupTest = TRUE,
  cateVarRatioMinMACVecExclude = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 10.5, 20.5),
  cateVarRatioMaxMACVecInclude = c(1.5, 2.5, 3.5, 4.5, 5.5, 10.5, 20.5),
  dosageZerodCutoff = 0.2,
  IsOutputPvalueNAinGroupTestforBinary = FALSE,
  IsAccountforCasecontrolImbalanceinGroupTest = TRUE,
  IsOutputBETASEinBurdenTest = FALSE
)
bgenFile: character. Path to the bgen file. Currently, version 1.2 with 8-bit compression is supported.
bgenFileIndex: character. Path to the .bgi file (index of the bgen file).
vcfFile: character. Path to the vcf file.
vcfFileIndex: character. Path to the tabix index (".tbi") of the vcf file, created with "tabix -p vcf file.vcf.gz".
vcfField: character. Genotype field in the vcf file to use: "DS" for dosages or "GT" for genotypes. By default, "DS".
savFile: character. Path to the sav file.
savFileIndex: character. Path to the index (".s1r") of the sav file.
sampleFile: character. Path to a one-column file, with no header, containing the IDs of the samples in the dosage, vcf, sav, or bgen file.
idstoExcludeFile: character. Path to the file containing variant IDs to be excluded from the bgen file. The file has no header, and each line holds one marker ID.
idstoIncludeFile: character. Path to the file containing variant IDs to be included from the bgen file. The file has no header, and each line holds one marker ID.
rangestoExcludeFile: character. Path to the file containing genome regions to be excluded from the bgen file. The file has three columns (chromosome, start, and end) and no header.
rangestoIncludeFile: character. Path to the file containing genome regions to be included from the bgen file. The file has three columns (chromosome, start, and end) and no header.
chrom: character. Chromosome to include from the vcf file. Required for vcf input. Note: the string must exactly match the chromosome string in the vcf/sav file; for example, "1" does not match "chr1". If LOCO is specified, providing chrom reduces the computation cost.
start: numeric. Start genome position to include from the vcf file. By default, 1.
end: numeric. End genome position to include from the vcf file. By default, 250000000.
IsDropMissingDosages: logical. Whether to drop missing dosages (TRUE) or to mean-impute them (FALSE). By default, FALSE. This option only works for bgen, vcf, and sav input.
minMAC: numeric. Minimum minor allele count of markers to test. By default, 0.5. The higher of the minMAC and minMAF thresholds is used.
minMAF: numeric. Minimum minor allele frequency of markers to test. By default, 0. The higher of the minMAC and minMAF thresholds is used.
maxMAFforGroupTest: numeric. Maximum minor allele frequency of markers to test in the group test. By default, 0.5.
minInfo: numeric. Minimum imputation info of markers to test. By default, 0. This option only works for bgen, vcf, and sav input.
GMMATmodelFile: character. Path to the input file containing the glmm model, which is output by the previous step. It is read with load().
varianceRatioFile: character. Path to the input file containing the variance ratio, which is output by the previous step.
SPAcutoff: numeric. Cutoff that determines when the SPA test is used. By default, 2 (the SPA test is used when the p-value is < 0.05 under the normal approximation).
SAIGEOutputFile: character. Path to the output file containing the association test results.
numLinesOutput: numeric. Number of markers to be output each time. By default, 10000.
IsSparse: logical. Whether to exploit the sparsity of the genotype vector for less frequent variants to speed up the SPA tests for dichotomous traits. By default, TRUE.
IsOutputAFinCaseCtrl: logical. Whether to output the allele frequency in cases and controls. By default, FALSE.
IsOutputNinCaseCtrl: logical. Whether to output the sample sizes in cases and controls. By default, FALSE.
LOCO: logical. Whether to apply the leave-one-chromosome-out option. By default, FALSE.
condition: character. For conditional analysis. Genetic marker IDs (chr:pos_ref/alt for sav/vcf dosage input, marker ID for bgen input) separated by commas, e.g. chr3:101651171_C/T,chr3:101651186_G/A. Note that conditional analysis is currently only available for bgen, vcf, and sav input.
sparseSigmaFile: character. Path to the file containing the sparseSigma from step 1. The suffix of this file is ".mtx".
groupFile: character. Path to the file containing the group information for gene-based tests. Each line describes one gene or set of variants: the first element is the gene/set name, and the remaining elements are the IDs of the variants included in that gene/set. For vcf/sav, the genetic marker IDs are in the format chr:pos_ref/alt. For bgen, the genetic marker IDs should match the IDs in the bgen file. Elements in a line are separated by tabs.
kernel: character. Kernel for gene-based tests. By default, "linear.weighted". More options can be found in the SKAT library.
method: character. Method for computing gene-based test p-values. By default, "optimal.adj". More options can be found in the SKAT library.
weights.beta.rare: vector of numeric. Parameters of the beta distribution used to weight genetic markers with MAF <= weightMAFcutoff in gene-based tests. By default, c(1, 25). More options can be found in the SKAT library.
weights.beta.common: vector of numeric. Parameters of the beta distribution used to weight genetic markers with MAF > weightMAFcutoff in gene-based tests. By default, c(1, 25). More options can be found in the SKAT library. NOTE: this argument is not fully developed. Currently, weights.beta.common is equal to weights.beta.rare.
weightMAFcutoff: numeric. Between 0 and 0.5. See the descriptions of weights.beta.rare and weights.beta.common above. By default, 0.01.
weightsIncludeinGroupFile: logical. Whether to specify customized weights for markers in gene- or region-based tests. If TRUE, weights are included in the group file. For vcf/sav, the genetic marker IDs and weights are in the format chr:pos_ref/alt;weight. For bgen, the genetic marker IDs should match the IDs in the bgen file, e.g. SNPID;weight. Elements in a line are separated by tabs. By default, FALSE.
weights_for_G2_cond: vector of float. Weights for the conditioning markers in gene- or region-based tests. The length equals the number of conditioning markers, delimited by commas. By default, NULL.
r.corr: numeric. Between 0 and 1. Parameter for gene-based tests. By default, 0. More options can be found in the SKAT library.
IsSingleVarinGroupTest: logical. Whether to perform single-variant association tests for the genetic markers included in the gene-based tests. By default, TRUE.
cateVarRatioMinMACVecExclude: vector of float. Lower bounds of the MAC categories. The length equals the number of MAC categories for variance ratio estimation. By default, c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 10.5, 20.5). If groupFile = "", only one variance ratio, corresponding to MAC >= 20, is used.
cateVarRatioMaxMACVecInclude: vector of float. Upper bounds of the MAC categories. The length equals the number of MAC categories for variance ratio estimation minus 1. By default, c(1.5, 2.5, 3.5, 4.5, 5.5, 10.5, 20.5). If groupFile = "", only one variance ratio, corresponding to MAC >= 20, is used.
dosageZerodCutoff: numeric. In gene- or region-based tests, for each variant with MAC <= 10, dosages <= dosageZerodCutoff will be set to 0. By default, 0.2.
IsOutputPvalueNAinGroupTestforBinary: logical. In gene- or region-based tests for binary traits, if TRUE, p-values without accounting for case-control imbalance will be output. By default, FALSE.
IsAccountforCasecontrolImbalanceinGroupTest: logical. In gene- or region-based tests for binary traits, if TRUE, p-values after accounting for case-control imbalance will be output. By default, TRUE.
IsOutputBETASEinBurdenTest: logical. Whether to output the effect size (BETA and SE) for burden tests. By default, FALSE.
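Two of the numeric defaults above can be sanity-checked in a few lines of base R. The sketch below is illustrative only and rests on two assumptions that the text above does not state outright: that SPAcutoff is expressed in standard deviations of the score statistic, and that each MAC category is the interval (lower bound excluded, upper bound included], with the last category open-ended. The variable names are hypothetical and not part of SAIGE.

```r
# Assumption: SPAcutoff is in standard deviation units, so a cutoff of 2
# corresponds to a two-sided normal p-value of about 0.0455, in line with
# the "p value < 0.05" rule of thumb quoted for the default.
spa_cutoff <- 2
p_at_cutoff <- 2 * pnorm(-spa_cutoff)
print(p_at_cutoff)

# Default MAC category bounds, copied from the usage.
min_excl <- c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 10.5, 20.5)
max_incl <- c(1.5, 2.5, 3.5, 4.5, 5.5, 10.5, 20.5)

# The upper bounds repeat the lower bounds shifted by one category, which is
# why max_incl is one element shorter than min_excl.
stopifnot(all(max_incl == min_excl[-1]))

# Assumption: category i covers (min_excl[i], max_incl[i]] and the last
# category has no upper bound.
breaks <- c(min_excl, Inf)

# Hypothetical minor allele counts and the category each one falls into.
mac <- c(1, 3, 7, 15, 100)
category <- cut(mac, breaks = breaks, right = TRUE, labels = FALSE)
print(data.frame(mac = mac, category = category))
```

Under these assumptions, variants with MAC above 20.5 all share the last category, which matches the note that a single variance ratio for MAC >= 20 is used when no group file is given.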
The association test results are written to the file specified by SAIGEOutputFile.
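A minimal sketch of a single-variant run on vcf input might look like the following. All file names are hypothetical placeholders, and the choice of minMAC = 1 is only an example; the argument names themselves come from the usage above. The call is guarded so that the snippet does nothing when the SAIGE package or the input files are not available.

```r
# Hypothetical paths: replace them with your own genotype files and the
# model and variance ratio files produced by the previous step.
args <- list(
  vcfFile           = "genotypes.chr1.vcf.gz",
  vcfFileIndex      = "genotypes.chr1.vcf.gz.tbi",
  vcfField          = "DS",
  chrom             = "1",   # must match the chromosome string in the vcf
  GMMATmodelFile    = "step1.rda",
  varianceRatioFile = "step1.varianceRatio.txt",
  minMAC            = 1,
  SAIGEOutputFile   = "chr1.assoc.txt"
)

# Only run when SAIGE is installed and every input file exists.
inputs <- unlist(args[c("vcfFile", "vcfFileIndex",
                        "GMMATmodelFile", "varianceRatioFile")])
can_run <- requireNamespace("SAIGE", quietly = TRUE) && all(file.exists(inputs))

if (can_run) {
  do.call(SAIGE::SPAGMMATtest, args)
} else {
  message("Skipping: SAIGE or the input files are not available.")
}
```

Building the arguments as a list and passing them with do.call keeps the settings in one place, which makes it easier to reuse them across chromosomes by changing only vcfFile, vcfFileIndex, chrom, and SAIGEOutputFile.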