GWAS: Run a genome-wide association study (GWAS) using the provided model

Description

The GWAS function is used to run a genome-wide association study based on the specified model. This function is designed to take the output from buildOneFac, buildOneFacRes, and buildTwoFac as input, but it can also take a similar user-specified model. Users should be confident that the models they are running are statistically identified. Users are advised to gauge time requirements empirically by running a limited number of SNPs (e.g. 10) to ensure that all SNPs can be fit in a reasonable amount of time.
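The timing check described above can be sketched as follows. This is a rough illustration, not part of gwsem: estimateHours and timePilot are hypothetical helper names, and the linear extrapolation assumes that the time per SNP is roughly constant across the file, which may not hold for every model or data set.

```r
# Hypothetical helper: linear extrapolation from a pilot run.
# Assumes every SNP takes about as long as the pilot SNPs did.
estimateHours <- function(pilotSeconds, pilotSnps, totalSnps) {
  (pilotSeconds / pilotSnps) * totalSnps / 3600
}

# Hypothetical wrapper: time GWAS() on the first `pilotSnps` SNPs and
# return the elapsed wall-clock time in seconds.
# Requires the gwsem package; it is defined here but not called.
timePilot <- function(model, snpFile, pilotSnps = 10L) {
  system.time(
    gwsem::GWAS(model, snpFile,
                out = file.path(tempdir(), "pilot.log"),
                SNP = seq_len(pilotSnps))
  )[["elapsed"]]
}
```

For instance, estimateHours(timePilot(m1, snpFile), 10, totalSnps) would give a first guess at the total run time, where totalSnps is the number of SNPs in the file as known to the user.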

Usage

GWAS(
  model,
  snpData,
  out = "out.log",
  ...,
  SNP = NULL,
  startFrom = 1L,
  rowFilter = NULL
)

Arguments

model

an MxModel object, specified using RAM or LISREL notation. The model argument is designed to take the output from e.g. buildOneFac (or the other prebuilt GW-SEM functions), but advanced users can specify their own arbitrary OpenMx model or use Onyx to draw their path diagrams.

snpData

a path to a file containing GWAS data. The data can be in a variety of forms, such as standard PLINK format (bed/bim/fam), PLINK2 format (pgen/pvar/psam), Oxford format (bgen/sample), or CSV format (CSV format is much slower due to the lack of compression for non-binary files).

out

a file name or path where the output from the analysis will be saved. The default is "out.log", which will save the file in the working directory. Users should take care when specifying the output file name so that the output from different analyses or chromosomes does not overwrite existing files.

...

Not used. Forces remaining arguments to be specified by name.

SNP

a numerical range that specifies which SNPs from the snpData file are to be evaluated. This argument can be used to evaluate a subset of SNPs for model testing; e.g. 1:10 will run the first 10 SNPs to make sure that the model is functioning the way the user intends, that the files exist, and that the paths are correct. This option is also very useful for specifying a range of SNPs to be evaluated that is smaller than the complete file. For example, users may wish to run several discrete batches of analyses for chromosome 1 by running 1:10000, 10001:20000, etc. This spares users from constructing numerous SNP files for each chromosome. The default value of the SNP argument is NULL, which will run all SNPs in the file.
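A minimal sketch of the batching idea, assuming the total number of SNPs in the file is already known to the user. The helpers makeBatches and runBatches and the output file name pattern are hypothetical and are not part of gwsem; each batch is written to its own log file so that no batch overwrites another.

```r
# Hypothetical helper: split SNP indices 1..nSnps into consecutive
# ranges of at most `batchSize` SNPs each.
makeBatches <- function(nSnps, batchSize) {
  starts <- seq(1, nSnps, by = batchSize)
  ends <- pmin(starts + batchSize - 1, nSnps)
  Map(function(s, e) s:e, starts, ends)
}

# Hypothetical wrapper: run GWAS() once per batch, giving each batch
# a distinct output file. Requires gwsem; defined here but not called.
runBatches <- function(model, snpFile, batches, prefix = "chr1_batch") {
  for (i in seq_along(batches)) {
    gwsem::GWAS(model, snpFile,
                out = sprintf("%s%02d.log", prefix, i),
                SNP = batches[[i]])
  }
  invisible(NULL)
}

# Example: 25000 SNPs in batches of 10000 gives three ranges,
# 1:10000, 10001:20000 and 20001:25000.
batches <- makeBatches(25000, 10000)
```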

startFrom

a numerical value indicating which SNP is the first SNP to be analyzed. The function will then run every SNP from the specified SNP to the end of the GWAS data file. This is very useful if the analysis stops for some reason (e.g. the cluster is restarted for maintenance), because you can resume from the SNP after the last one that was analyzed. Note that you will want to give the output file (specified in out) a new file name so that you don't overwrite the existing results.
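One way to find the restart point is sketched below. It assumes that the interrupted log holds a single header line followed by one row per analyzed SNP, and that the interrupted run covered consecutive SNPs beginning at firstSnp; both are assumptions worth checking against your own output before relying on them. The helpers nextSnp and resumeGWAS are hypothetical and are not part of gwsem.

```r
# Hypothetical helper: infer the next SNP index from an interrupted log.
# Assumes one header line plus one row per SNP already analyzed.
nextSnp <- function(logFile, firstSnp = 1L) {
  nLines <- length(readLines(logFile))
  done <- max(nLines - 1L, 0L)  # rows written, excluding the header
  firstSnp + done
}

# Hypothetical wrapper: resume into a NEW log file so that the earlier
# results are kept. Requires gwsem; defined here but not called.
resumeGWAS <- function(model, snpFile, oldLog, newLog) {
  gwsem::GWAS(model, snpFile,
              out = newLog,
              startFrom = nextSnp(oldLog))
}
```

If the last row of the old log may have been cut off mid-write, it is safer to restart one SNP earlier and discard the duplicate row afterwards.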

rowFilter

an optional named list of logical vectors indicating which rows to skip when loading the SNP column.

Value

The results for each SNP are recorded in the specified log file (out). In addition, data and estimates for the last SNP run are returned as an MxModel object (similar to the return value of mxRun). In this way, the last SNP processed is available for close inspection.

Details

Adds a compute plan returned by prepareComputePlan to the provided model and runs it. Once analyses are complete, load your aggregated results with loadResults.

Examples


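# NOT RUN {
# Locate the example data shipped with the gwsem package
dir <- system.file("extdata", package = "gwsem")
# Simulate a single continuous phenotype with 500 rows
pheno <- data.frame(anxiety=rnorm(500))
# Build a single-item model with 'anxiety' as the dependent variable
m1 <- buildItem(pheno, 'anxiety')
# Run the GWAS and write the per-SNP results to a log file in tempdir()
GWAS(m1, file.path(dir,"example.bgen"),
     file.path(tempdir(),"out.log"))
# }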
