Usage
xSNP2GeneScores(data, include.LD = NA, LD.customised = NULL, LD.r2 = 0.8,
significance.threshold = 5e-05, score.cap = 10, distance.max = 50000,
decay.kernel = c("slow", "linear", "rapid", "constant"),
decay.exponent = 2, GR.SNP = c("dbSNP_GWAS", "dbSNP_Common"),
GR.Gene = c("UCSC_knownGene", "UCSC_knownCanonical"),
scoring.scheme = c("max", "sum", "sequential"), verbose = T,
RData.location = "http://galahad.well.ox.ac.uk/bigdata")
Arguments
data
a named input vector containing the significance level for
nodes (dbSNP). For this named vector, the element names are dbSNP IDs
(or genomic positions in a format such as 'chr16:28525386'), and the
element values are the significance levels (measured as p-value or fdr).
Alternatively, it can be a matrix or data frame with two columns: 1st
column for dbSNP, 2nd column for the significance level
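The two accepted input forms can be sketched as follows. This is a
minimal illustration only: the rs identifiers and p-values are
placeholders made up for the example, and the column names 'snp' and
'pvalue' are assumptions (the description above only fixes the column
order).

```r
# Form 1: named numeric vector (names are SNP identifiers, values are p-values).
# The identifiers and values below are placeholders, not real results.
data_vec <- c(rs0000001 = 1e-08, rs0000002 = 3e-06, "chr16:28525386" = 2e-04)

# Form 2: two-column data frame (1st column SNP, 2nd column significance level).
# Column names are arbitrary here; only the column order is described above.
data_df <- data.frame(
  snp = names(data_vec),
  pvalue = unname(data_vec),
  stringsAsFactors = FALSE
)
```

Either object would then be passed as the 'data' argument.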
include.LD
additional SNPs in LD with Lead SNPs are also
included. By default, it is 'NA' to disable this option. Otherwise, LD
SNPs will be included based on one or more of 26 populations and 5
super populations from 1000 Genomes Project data (phase 3). The
population can be one of 5 super populations ("AFR", "AMR", "EAS",
"EUR", "SAS"), or one of 26 populations ("ACB", "ASW", "BEB", "CDX",
"CEU", "CHB", "CHS", "CLM", "ESN", "FIN", "GBR", "GIH", "GWD", "IBS",
"ITU", "JPT", "KHV", "LWK", "MSL", "MXL", "PEL", "PJL", "PUR", "STU",
"TSI", "YRI"). Explanations for population codes can be found at
http://www.1000genomes.org/faq/which-populations-are-part-your-study
LD.customised
a user-input matrix or data frame with 3 columns:
1st column for Lead SNPs, 2nd column for LD SNPs, and 3rd for LD r2
value. It is designed to allow the user to analyse their precalculated
LD info. This customisation (if provided) takes priority over
built-in LD SNPs
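A customised LD table in the three-column layout described above might
look like the sketch below. The SNP identifiers and r2 values are
placeholders, and the column names are assumptions. The final filtering
step only illustrates what an r2 cutoff of 0.8 means. This section does
not say whether the function applies 'LD.r2' to customised input.

```r
# Hypothetical precalculated LD table: Lead SNP, LD SNP, r2 (placeholder values)
ld_custom <- data.frame(
  lead = c("rs0000001", "rs0000001", "rs0000002"),
  ld   = c("rs0000011", "rs0000012", "rs0000021"),
  r2   = c(0.95, 0.72, 0.88),
  stringsAsFactors = FALSE
)

# Illustration of an r2 >= 0.8 cutoff applied to the table
ld_kept <- ld_custom[ld_custom$r2 >= 0.8, ]
```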
LD.r2
the LD r2 value. By default, it is 0.8, meaning that SNPs
in LD (r2>=0.8) with input SNPs will be considered as LD SNPs. It can
be any value from 0.8 to 1
significance.threshold
the given significance threshold. By
default, it is set to 5e-05 (see Usage). If NULL, there is no
constraint on the significance level when transforming the significance
level of SNPs into scores. If given, those SNPs below this are
considered significant and thus scored positively, whereas those above
this are considered insignificant and thus receive no score
score.cap
the maximum score allowed; scores above it are capped at this
value. By default, it is set to 10. If NULL, no capping is applied
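The combined effect of 'significance.threshold' and 'score.cap' can be
sketched as below. This is not the package's implementation: the helper
name 'score_snps' is hypothetical, and the use of -log10(p) as the base
score is an assumption made for illustration, since the exact transform
is not given in this section. Only the thresholding and capping
behaviour follows the descriptions above.

```r
# Hypothetical helper. ASSUMPTION: -log10(p) is used as the base score;
# the package may use a different transform.
score_snps <- function(pval, significance.threshold = 5e-05, score.cap = 10) {
  scores <- -log10(pval)
  if (!is.null(significance.threshold)) {
    # SNPs above the threshold are treated as insignificant and get no score
    scores[pval > significance.threshold] <- 0
  }
  if (!is.null(score.cap)) {
    # Cap scores at the maximum allowed value
    scores <- pmin(scores, score.cap)
  }
  scores
}

# Placeholder p-values: one very strong, one moderate, one not significant
example_scores <- score_snps(c(a = 1e-12, b = 1e-06, c = 1e-03))
```

Under these assumptions the first SNP is capped at 10, the second keeps
its uncapped score, and the third receives no score.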
distance.max
the maximum distance between genes and SNPs. Only
those genes no farther than this distance from a SNP will be considered
as seed genes. This parameter will influence the distance-component
weights calculated for nearby SNPs per gene
decay.kernel
a character specifying a decay kernel function. It
can be one of 'slow' for slow decay, 'linear' for linear decay, and
'rapid' for rapid decay. If no distance weight is used, please select
'constant'
decay.exponent
a numeric specifying a decay exponent. By
default, it is set to 2
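To make the roles of 'distance.max', 'decay.kernel' and
'decay.exponent' concrete, the sketch below shows one plausible set of
kernels. The helper name 'decay_weight' and the formulas are
assumptions chosen for illustration (this section names the kernels
but does not define them), so the package's actual weights may differ.

```r
# Hypothetical helper. ASSUMPTION: the kernel formulas below are illustrative
# and are not taken from the package documentation.
decay_weight <- function(dist, distance.max = 50000,
                         decay.kernel = c("slow", "linear", "rapid", "constant"),
                         decay.exponent = 2) {
  decay.kernel <- match.arg(decay.kernel)
  # Relative distance in [0, 1]
  x <- pmin(dist / distance.max, 1)
  switch(decay.kernel,
    slow     = 1 - x^decay.exponent,     # weight falls off slowly near the gene
    linear   = 1 - x,                    # weight falls off proportionally
    rapid    = (1 - x)^decay.exponent,   # weight falls off quickly near the gene
    constant = rep(1, length(x))         # no distance weighting
  )
}

# Weight of a SNP located 25 kb from a gene under each kernel
kernels <- c("slow", "linear", "rapid", "constant")
w <- sapply(kernels, function(k) decay_weight(25000, decay.kernel = k))
```

With these assumed formulas, a SNP halfway to 'distance.max' keeps more
of its weight under 'slow' than under 'linear', and less under 'rapid'.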
GR.SNP
the genomic regions of SNPs. By default, it is
'dbSNP_GWAS', that is, SNPs from dbSNP (version 146) restricted to GWAS
SNPs and their LD SNPs (hg19). It can be 'dbSNP_Common', that is,
Common SNPs from dbSNP (version 146) plus GWAS SNPs and their LD SNPs
(hg19). Alternatively, the user can specify the customised input. To do
so, first save your RData file (containing a GR object) onto your
local computer, and make sure the names of the GR object refer to
dbSNP IDs. Then, set "GR.SNP" to your RData file name (with or
without extension), and specify the path to your RData file in
"RData.location". Note: you can also load your customised GR object
directly
GR.Gene
the genomic regions of genes. By default, it is
'UCSC_knownGene', that is, UCSC known genes (together with genomic
locations) based on human genome assembly hg19. It can be
'UCSC_knownCanonical', that is, UCSC known canonical genes (together
with genomic locations) based on human genome assembly hg19.
Alternatively, the user can specify the customised input. To do so,
first save your RData file (containing a GR object) onto your local
computer, and make sure the names of the GR object refer to Gene
Symbols. Then, set "GR.Gene" to your RData file name (with or
without extension), and specify the path to your RData file in
"RData.location". Note: you can also load your customised GR object
directly
scoring.scheme
the method used to calculate seed gene scores
under a set of SNPs. It can be one of "sum" for adding up, "max" for
the maximum, and "sequential" for the sequential weighting. The
sequential weighting is done via: $\sum_{i=1}{\frac{R_{i}}{i}}$,
where $R_{i}$ is the $i^{th}$ rank (in decreasing order)
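The three schemes can be sketched as below. The helper name
'combine_scores' is hypothetical, and $R_{i}$ is read here as the
$i^{th}$ largest per-SNP score for a gene, which is an interpretation
of the formula above and not a confirmed detail of the package.

```r
# Hypothetical helper combining the per-SNP scores of one gene.
# ASSUMPTION: R_i is taken to be the i-th largest score.
combine_scores <- function(x, scoring.scheme = c("max", "sum", "sequential")) {
  scoring.scheme <- match.arg(scoring.scheme)
  # Scores ranked in decreasing order
  r <- sort(x, decreasing = TRUE)
  switch(scoring.scheme,
    max        = max(x),
    sum        = sum(x),
    sequential = sum(r / seq_along(r))  # R_1/1 + R_2/2 + R_3/3 + ...
  )
}

# Placeholder scores for three SNPs near one gene
snp_scores <- c(6, 2, 9)
gene_max <- combine_scores(snp_scores, "max")
gene_sum <- combine_scores(snp_scores, "sum")
gene_seq <- combine_scores(snp_scores, "sequential")
```

For these placeholder scores the sequential scheme evaluates
9/1 + 6/2 + 2/3, which lies between the "max" and "sum" results.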
verbose
logical to indicate whether messages will be
displayed on the screen. By default, it is set to true for display
RData.location
a character string specifying the location of built-in
RData files. See xRDataLoader for details