lod_GWAS
lod_GWAS enables the user to perform a Genome-Wide Association Analysis (GWAS) of a biomarker while accommodating the Limit of Detection (LOD) problem. The function performs a parametric survival analysis on the phenotype of interest, which includes both measured and censored data. lod_QC is called automatically within lod_GWAS, and its quality report is saved in a separate text file.
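The idea of a parametric survival analysis on a partly censored phenotype can be illustrated outside lodGWAS. The sketch below is an assumption-laden illustration, not lod_GWAS's internal code: it simulates one SNP, censors the biomarker at a made-up lower and upper LOD, and fits a Gaussian model with survival::survreg (rms::psm, which the documentation refers to, builds on survreg).

```r
# Illustration only: Gaussian parametric survival model with left- and
# right-censored phenotype values. Data and LOD values are simulated.
library(survival)

set.seed(1)
n <- 500
dosage <- runif(n, 0, 2)              # simulated SNP dosages
pheno  <- 1 + 0.3 * dosage + rnorm(n) # simulated "true" biomarker values
lower_lod <- 0.5                      # hypothetical lower LOD
upper_lod <- 2.5                      # hypothetical upper LOD

# "interval2" coding: NA on the left = left-censored at 'right',
# NA on the right = right-censored at 'left', equal ends = measured.
left  <- ifelse(pheno < lower_lod, NA, pmin(pheno, upper_lod))
right <- ifelse(pheno > upper_lod, NA, pmax(pheno, lower_lod))

fit <- survreg(Surv(left, right, type = "interval2") ~ dosage,
               dist = "gaussian")
summary(fit)$table["dosage", ]        # SNP effect, SE, z and p-value
```

Censored values enter the likelihood through the distribution's tail probability rather than being dropped, which is why they still carry information about the SNP effect.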
lod_GWAS(phenofile, pheno_name, basic_model = NULL, dist = "gaussian", mapfile, genofile, outputfile, filedirectory = getwd(), outputheader = "QCGWAS", gzip_output = TRUE, lower_limit = NA, upper_limit = NA)
basic_model: the covariates to include, given as a single string, e.g. basic_model = "sex+age". Covariate names must exactly match the corresponding column names of the phenotype file. The default is NULL, in which case the association is modelled without covariates.
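Because covariate names must match the phenotype column names exactly, a quick check before calling lod_GWAS can save a failed run. The helper below, check_covariates, is hypothetical (not part of lodGWAS) and assumes basic_model is a simple sum of column names such as "sex+age".

```r
# Hypothetical helper: split a basic_model string into covariate names
# and return those that do not exactly match a phenotype column name.
check_covariates <- function(basic_model, pheno_columns) {
  if (is.null(basic_model)) return(character(0))   # model without covariates
  covariates <- trimws(strsplit(basic_model, "+", fixed = TRUE)[[1]])
  setdiff(covariates, pheno_columns)               # exact, case-sensitive
}

pheno_columns <- c("FID", "IID", "outsideLOD", "outcome1", "sex", "age")
check_covariates("sex+age", pheno_columns)    # character(0): all present
check_covariates("Sex + age", pheno_columns)  # "Sex": R is case sensitive
```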
dist: the distribution of the parametric survival model; one of "weibull", "exponential", "gaussian", "logistic", "lognormal", and "loglogistic". Default is "gaussian". For more information, see the function psm of package rms.
outputheader: the column-header convention of the output file; one of "QCGWAS", "GWAMA", "PLINK", "META", and "GenABEL". Default is "QCGWAS".
gzip_output: whether the output file is gzip-compressed. Default is TRUE.
lod_GWAS returns an invisible NULL. The actual output consists of the association results (saved as [output_file].txt) and the log file generated by lod_QC (saved as [output_file].txt.log).
lod_GWAS requires two genotype files and one phenotype file. The files can be either space- or tab-delimited. The package also accepts files compressed in gzip format (extension .gz).
Genotype Files
lodGWAS uses the PLINK dosage format for the genotype data. This means that two files are needed: one with the genotypes themselves (genotype dosage file), and one with the locations of the genetic variants (map file).
Genotype Dosage File
The genotype dosage file should contain a header line. The header line (first line) should be:
SNP A1 A2 FID1 IID1 FID2 IID2 ... FIDn IIDn
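The header layout above (three fixed columns plus one FID/IID pair per individual) and the accepted file conventions can be checked with base R. This sketch writes a tiny gzip-compressed, tab-delimited dosage file with a made-up name and made-up values, reads it back, and splits each line on spaces or tabs; it does not reproduce how lodGWAS itself reads files.

```r
# Illustration only: build the header for n individuals, write a tiny
# gzip-compressed tab-delimited file, and read it back.
n_individuals <- 3
ids    <- paste0(c("FID", "IID"), rep(seq_len(n_individuals), each = 2))
header <- c("SNP", "A1", "A2", ids)  # SNP A1 A2 FID1 IID1 ... FID3 IID3

path <- file.path(tempdir(), "example_geno.dose.gz")  # made-up file name
con <- gzfile(path, "w")
writeLines(c(paste(header, collapse = "\t"),
             paste("rs0001", "A", "C", 0.08, 0.72, 1.99, sep = "\t")), con)
close(con)

con <- gzfile(path, "r")
lines <- readLines(con)
close(con)
fields <- strsplit(lines, "[ \t]+")  # works for space- or tab-delimited lines

n_header <- length(fields[[1]])
stopifnot(n_header == 3 + 2 * n_individuals)  # header: 3 + 2n columns
```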
The first three columns (SNP, A1, A2) must appear before the dosage data. The following columns are the family identifier (FID) and the individual identifier (IID) of individuals 1 to n. Thus, the header line should have exactly 3 + (2 x n_individuals) columns.
The subsequent lines contain the actual genetic data, with each row corresponding to a genetic variant. The PLINK dosage data can be in any of three formats: dosage, two-probabilities, or three-probabilities (see below). lodGWAS accepts all three formats and automatically recognizes whether there are one (dosage), two (two-probabilities), or three (three-probabilities) columns per individual. For any other format, it reports that it cannot recognize the format and does not run.
Dosage format
A dosage is provided in one column per individual. Each dosage is a number between 0 and 2. A dosage of 0, 1, or 2 means that the individual is homozygous for the A2 allele, heterozygous, or homozygous for the A1 allele, respectively. When the genetic dataset is expanded using imputation, non-integer values are also possible; these are defined as the weighted sum of genotype probabilities, i.e. 0 x Prob(A2/A2) + 1 x Prob(A1/A2) + 2 x Prob(A1/A1).
The number of columns of the (non-header) lines in a genotype file in dosage format should be exactly 3 + n_individuals.
Example of the dosage format:
SNP A1 A2 FID1 IID1 FID2 IID2 FID3 IID3
rs0001 A C 0.08 0.72 1.99
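The weighted-sum definition of an imputed dosage can be written as a one-line function. The function name expected_dosage and the probabilities used below are illustrative, not part of lodGWAS.

```r
# Expected dosage = 0 x Prob(A2/A2) + 1 x Prob(A1/A2) + 2 x Prob(A1/A1).
# Hypothetical helper for illustration; probabilities are made up.
expected_dosage <- function(p_a1a1, p_a1a2, p_a2a2) {
  0 * p_a2a2 + 1 * p_a1a2 + 2 * p_a1a1
}

expected_dosage(p_a1a1 = 0.97, p_a1a2 = 0.02, p_a2a2 = 0.01)  # 1.96
expected_dosage(1, 0, 0)  # 2: homozygous for A1
expected_dosage(0, 0, 1)  # 0: homozygous for A2
```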
Two-probabilities format
Two numbers per individual, representing the probabilities of the A1/A1 and A1/A2 genotypes, respectively. The probability of A2/A2 equals 1 minus the sum of Prob(A1/A1) and Prob(A1/A2). Each probability is a number between 0 and 1. The number of columns of the (non-header) lines in a genotype file in two-probabilities format should be exactly 3 + (2 x n_individuals).
Example of the two-probabilities format:
SNP A1 A2 FID1 IID1 FID2 IID2
rs0001 A C 0.97 0.02 0.88 0.10
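Using the example row above, the implied Prob(A2/A2) and the corresponding dosage per individual follow directly from the two stored probabilities. This is a small worked illustration in base R, not lodGWAS code.

```r
# Probabilities from the example row (rs0001, two individuals):
# per individual, Prob(A1/A1) followed by Prob(A1/A2).
row <- c(0.97, 0.02, 0.88, 0.10)
probs <- matrix(row, ncol = 2, byrow = TRUE,
                dimnames = list(c("ind1", "ind2"), c("A1A1", "A1A2")))

p_a2a2 <- 1 - rowSums(probs)                     # 0.01 and 0.02
dosage <- 2 * probs[, "A1A1"] + probs[, "A1A2"]  # 1.96 and 1.86
```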
Three-probabilities format
Three numbers per individual, representing the probabilities of the A1/A1, A1/A2, and A2/A2 genotypes, respectively. Each probability is a number between 0 and 1, and the three probabilities per genetic variant per individual should add up to 1. The number of columns of the (non-header) lines in a genotype file in three-probabilities format should be exactly 3 + (3 x n_individuals).
Example of the three-probabilities format:
SNP A1 A2 FID1 IID1 FID2 IID2
rs0001 A C 0.97 0.02 0.01 0.88 0.10 0.02
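The automatic format recognition described for the dosage file amounts to dividing the data columns by the number of individuals. The helpers below (detect_dosage_format and check_three_probabilities) are hypothetical sketches of that rule; the tolerance of 1e-6 for the sum-to-1 check is an assumption, and lodGWAS may apply different checks.

```r
# Hypothetical: infer the genotype format from the number of fields in a
# non-header line and the number of individuals in the header.
detect_dosage_format <- function(n_fields, n_individuals) {
  per_individual <- (n_fields - 3) / n_individuals
  switch(as.character(per_individual),
         "1" = "dosage",
         "2" = "two-probabilities",
         "3" = "three-probabilities",
         stop("cannot recognize the genotype format"))
}

# Hypothetical: each individual's three probabilities should sum to 1.
check_three_probabilities <- function(probs, tol = 1e-6) {
  all(abs(rowSums(probs) - 1) < tol)
}

detect_dosage_format(6, 3)  # "dosage" (example row of the dosage format)
detect_dosage_format(9, 2)  # "three-probabilities"
row <- c(0.97, 0.02, 0.01, 0.88, 0.10, 0.02)
check_three_probabilities(matrix(row, ncol = 3, byrow = TRUE))  # TRUE
```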
Genotype Map File
The genotype map file contains the locations of the genetic variants, with each row of the file corresponding to a variant. It must contain four columns: chromosome, variant identifier, genetic distance (not used by lod_GWAS, so the actual value doesn't matter), and base-pair position.
Phenotype File
The identifier and censoring-indicator columns of the phenotype file must be named FID, IID, and outsideLOD (note that R is case sensitive). The other columns (phenotype and cov1 to covN) can have any arbitrary name.
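The naming requirement can be verified on the phenotype data before running an analysis. This is a hypothetical check, not part of lodGWAS, on a made-up two-row data frame; setdiff compares names exactly, so a column called "fid" would be reported as missing FID.

```r
# Hypothetical check on a made-up phenotype data frame: the required
# columns must be present under exactly these (case-sensitive) names.
pheno <- data.frame(FID = c("f1", "f2"), IID = c("i1", "i2"),
                    outsideLOD = c(1L, 2L), outcome1 = c(0.8, 0.1),
                    sex = c(1, 2))
required <- c("FID", "IID", "outsideLOD")
missing_cols <- setdiff(required, names(pheno))  # character(0) if all present
```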
outsideLOD indicates whether the phenotype value is within or beyond the LOD range. It must be coded as 0 if the phenotype is above the upper LOD; 1 if the phenotype is within the detection interval; and 2 if the phenotype is below the lower LOD. Values other than 0, 1, or 2 are not accepted.
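The outsideLOD coding, together with the ND handling described in the text that follows (NDs take the LOD value, true missing values stay NA in both columns), can be sketched as a small conversion function. Everything here is an assumption for illustration: the helper name code_lod and the convention that lab results arrive as strings with a "<" or ">" prefix for NDs. Per-individual LODs (e.g. from different assays) are supported by passing vectors; measured values that fall outside the LODs are not flagged by this sketch.

```r
# Hypothetical helper: convert lab results such as "0.53", "<0.1" or ">2"
# into the phenotype and outsideLOD columns described in this section.
code_lod <- function(result, lower, upper) {
  n <- length(result)
  lower <- rep_len(lower, n)   # allows per-individual (per-assay) LODs
  upper <- rep_len(upper, n)
  below <- !is.na(result) & startsWith(result, "<")
  above <- !is.na(result) & startsWith(result, ">")
  phenotype <- suppressWarnings(as.numeric(result))  # NDs become NA here
  phenotype[below] <- lower[below]   # ND below lower LOD -> lower LOD value
  phenotype[above] <- upper[above]   # ND above upper LOD -> upper LOD value
  outsideLOD <- ifelse(above, 0L, ifelse(below, 2L, 1L))
  outsideLOD[is.na(phenotype)] <- NA_integer_  # true missing: NA in both
  data.frame(phenotype, outsideLOD)
}

out <- code_lod(c("0.53", "<0.1", ">2", NA), lower = 0.1, upper = 2)
# phenotype: 0.53 0.10 2.00 NA; outsideLOD: 1 2 0 NA
```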
Missing phenotype values should be coded as NA in both the phenotype and outsideLOD columns.
Censored phenotype values are NDs (non-detects), i.e. measurements that fall beyond the LOD of the assay. NDs are not truly missing values, since they do provide information about the distribution of the phenotype. Any ND below the lower LOD should be set to the value of the lower LOD (and its outsideLOD value set to 2). Any ND above the upper LOD should be set to the value of the upper LOD (and its outsideLOD value set to 0). NDs should NOT be coded as missing (NA). lodGWAS can handle multiple lower and upper LOD levels (e.g. resulting from different assays used to measure the biomarker) in a single file. In that case, the phenotype of an ND should be set to the lower or upper LOD level of the assay used for that individual.
The phenotype column can contain either raw or transformed values. Take care that NDs (whose phenotype value equals the LOD) are also transformed appropriately.
The output columns (for outputheader = "QCGWAS") are as follows:
# For this example, the three sample files in the extdata folder of
# the lodGWAS package are copied to your current R working directory.
## Not run:
library(lodGWAS)

sample_dir <- system.file("extdata", package = "lodGWAS")
file.copy(from = file.path(sample_dir, c("Sample_geno.dose", "Sample_geno.map", "Sample_pheno.txt")),
          to = getwd(), overwrite = FALSE, recursive = FALSE)

lod_GWAS(phenofile = "Sample_pheno.txt", pheno_name = "outcome1",
         basic_model = "sex",
         mapfile = "Sample_geno.map", genofile = "Sample_geno.dose",
         outputfile = "Sample_output.txt", gzip_output = FALSE,
         lower_limit = 0.1, upper_limit = 2)
## End(Not run)