Usage
xSubneterSNPs(data, include.LD = NA, LD.customised = NULL, LD.r2 = 0.8,
significance.threshold = 5e-05, score.cap = 10, distance.max = 2e+05,
decay.kernel = c("slow", "linear", "rapid", "constant"),
decay.exponent = 2, GR.SNP = c("dbSNP_GWAS", "dbSNP_Common"),
GR.Gene = c("UCSC_knownGene", "UCSC_knownCanonical"),
scoring.scheme = c("max", "sum", "sequential"),
network = c("STRING_highest", "STRING_high", "STRING_medium",
"PCommonsUN_high", "PCommonsUN_medium", "PCommonsDN_high",
"PCommonsDN_medium", "PCommonsDN_Reactome", "PCommonsDN_KEGG",
"PCommonsDN_HumanCyc", "PCommonsDN_PID", "PCommonsDN_PANTHER",
"PCommonsDN_ReconX", "PCommonsDN_TRANSFAC", "PCommonsDN_PhosphoSite",
"PCommonsDN_CTD"), network.customised = NULL, seed.genes = T,
subnet.significance = 5e-05, subnet.size = NULL, verbose = T,
RData.location = "http://galahad.well.ox.ac.uk/bigdata")
Arguments
data
a named input vector containing the significance level for
nodes (dbSNP). For this named vector, the element names are dbSNP IDs
(or genomic positions in a format such as 'chr16:28525386') and the
element values are the significance levels (measured as p-value or
fdr). Alternatively, it can be a matrix or data frame with two columns:
1st column for dbSNP, 2nd column for the significance level
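The two accepted input forms can be sketched as follows. The SNP identifiers and p-values are made-up placeholders for illustration, not real association results.

```r
# Hypothetical p-values keyed by dbSNP ID or 'chr:position' (placeholders only)
pvals <- c(rs0000001 = 1e-8, rs0000002 = 3e-6, "chr16:28525386" = 2e-5)

# Equivalent two-column data frame: 1st column SNP, 2nd column significance
df <- data.frame(SNP = names(pvals), Pval = unname(pvals),
                 stringsAsFactors = FALSE)

# Rebuild the named vector from the data frame form
pvals_from_df <- setNames(df$Pval, df$SNP)
```

Either `pvals` or `df` would then be passed as the `data` argument.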
include.LD
additional SNPs in LD with Lead SNPs are also
included. By default, it is 'NA' to disable this option. Otherwise, LD
SNPs will be included based on one or more of 26 populations and 5
super populations from 1000 Genomes Project data (phase 3). The
population can be one of 5 super populations ("AFR", "AMR", "EAS",
"EUR", "SAS"), or one of 26 populations ("ACB", "ASW", "BEB", "CDX",
"CEU", "CHB", "CHS", "CLM", "ESN", "FIN", "GBR", "GIH", "GWD", "IBS",
"ITU", "JPT", "KHV", "LWK", "MSL", "MXL", "PEL", "PJL", "PUR", "STU",
"TSI", "YRI"). Explanations for population codes can be found at
http://www.1000genomes.org/faq/which-populations-are-part-your-study
LD.customised
a user-input matrix or data frame with 3 columns:
1st column for Lead SNPs, 2nd column for LD SNPs, and 3rd for LD r2
value. It is designed to allow the user to analyse their precalculated
LD info. This customisation (if provided) takes priority over
built-in LD SNPs
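A minimal sketch of such a three-column table is given below. The column names, SNP IDs and r2 values are hypothetical placeholders. Whether the function applies the "LD.r2" cut-off to a customised table is not stated here, so the filter is shown only to illustrate the r2>=0.8 criterion.

```r
# Hypothetical precalculated LD table (IDs and r2 values are placeholders)
LD.custom <- data.frame(
  Lead = c("rsLead1", "rsLead1", "rsLead2"),  # 1st column: Lead SNPs
  LD   = c("rsLD1", "rsLD2", "rsLD3"),        # 2nd column: LD SNPs
  R2   = c(0.95, 0.72, 0.88),                 # 3rd column: LD r2 value
  stringsAsFactors = FALSE
)

# Illustrative filter mirroring the default cut-off LD.r2 = 0.8
LD.r2 <- 0.8
LD.kept <- LD.custom[LD.custom$R2 >= LD.r2, ]
```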
LD.r2
the LD r2 value. By default, it is 0.8, meaning that SNPs
in LD (r2>=0.8) with input SNPs will be considered as LD SNPs. It can
be any value from 0.8 to 1
significance.threshold
the given significance threshold. By
default, it is set to 5e-05 (see Usage). If NULL, there is no
constraint on the significance level when transforming the significance
level of SNPs into scores. If given, SNPs below this threshold are
considered significant and thus scored positively, while those above it
are considered insignificant and thus receive no score
score.cap
the cap on the maximum score. By default, it is set
to 10. If NULL, no capping is applied
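The interplay of "significance.threshold" and "score.cap" can be sketched as below. This page does not state the transform itself, so the log-odds formula and the helper name `score_snps` are assumptions made purely for illustration. Only the thresholding and capping behaviour follows the descriptions above.

```r
# Hypothetical helper; the log-odds transform is an assumption, not from this page
score_snps <- function(pval, significance.threshold = 5e-05, score.cap = 10) {
  scores <- log10((1 - pval) / pval)
  if (!is.null(significance.threshold)) {
    thr <- significance.threshold
    # Shift so that a SNP exactly at the threshold scores zero
    scores <- scores - log10((1 - thr) / thr)
  }
  scores[scores < 0] <- 0               # negative (insignificant) scores become zero
  if (!is.null(score.cap)) {
    scores <- pmin(scores, score.cap)   # cap the maximum score
  }
  scores
}

# Placeholder p-values: very strong, moderately significant, not significant
pvals <- c(snpA = 1e-20, snpB = 1e-6, snpC = 1e-3)
s <- score_snps(pvals)
```

Under these assumptions `snpA` is capped at 10, `snpB` receives a small positive score, and `snpC` receives no score.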
distance.max
the maximum distance between genes and SNPs. Only
those genes no further than this distance from SNPs will be considered
as seed genes. This parameter will influence the distance-component
weights calculated for nearby SNPs per gene
decay.kernel
a character specifying a decay kernel function. It
can be one of 'slow' for slow decay, 'linear' for linear decay, and
'rapid' for rapid decay. If no distance weight is used, please select
'constant'
decay.exponent
an integer specifying a decay exponent. By
default, it is set to 2
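How "decay.kernel", "decay.exponent" and "distance.max" could combine into a distance weight is sketched below. The kernel formulas and the helper name `decay_weight` are assumptions for illustration, since this page names the kernels but does not define them.

```r
# Hypothetical helper; kernel shapes are assumed, not taken from this page
decay_weight <- function(dist, distance.max = 2e+05,
                         decay.kernel = c("slow", "linear", "rapid", "constant"),
                         decay.exponent = 2) {
  decay.kernel <- match.arg(decay.kernel)
  x <- pmin(dist / distance.max, 1)   # scaled distance in [0, 1]
  switch(decay.kernel,
         slow     = 1 - x^decay.exponent,    # stays high near the gene, drops late
         linear   = 1 - x,                   # falls off in proportion to distance
         rapid    = (1 - x)^decay.exponent,  # drops quickly with distance
         constant = rep(1, length(x)))       # no distance weighting
}

# Compare the kernels at zero, half and full distance.max
d <- c(0, 1e+05, 2e+05)
weights <- sapply(c("slow", "linear", "rapid", "constant"),
                  function(k) decay_weight(d, decay.kernel = k))
```

Each column of `weights` holds one kernel. Under these assumed shapes all kernels give weight 1 at distance 0, and all except 'constant' reach 0 at "distance.max".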
GR.SNP
the genomic regions of SNPs. By default, it is
'dbSNP_GWAS', that is, SNPs from dbSNP (version 146) restricted to GWAS
SNPs and their LD SNPs (hg19). It can be 'dbSNP_Common', that is,
Common SNPs from dbSNP (version 146) plus GWAS SNPs and their LD SNPs
(hg19). Alternatively, the user can specify a customised input. To do
so, first save your RData file (containing a GR object) onto your
local computer, and make sure the GR object content names refer to
dbSNP IDs. Then, set "GR.SNP" to your RData file name (with or
without extension), and specify the path to your RData file in
"RData.location". Note: you can also load your customised GR object
directly
GR.Gene
the genomic regions of genes. By default, it is
'UCSC_knownGene', that is, UCSC known genes (together with genomic
locations) based on human genome assembly hg19. It can be
'UCSC_knownCanonical', that is, UCSC known canonical genes (together
with genomic locations) based on human genome assembly hg19.
Alternatively, the user can specify a customised input. To do so,
first save your RData file (containing a GR object) onto your local
computer, and make sure the GR object content names refer to Gene
Symbols. Then, set "GR.Gene" to your RData file name (with or
without extension), and specify the path to your RData file in
"RData.location". Note: you can also load your customised GR object
directly
scoring.scheme
the method used to calculate seed gene scores
under a set of SNPs. It can be one of "sum" for adding up, "max" for
the maximum, and "sequential" for the sequential weighting. The
sequential weighting is done via: $\sum_{i=1}{\frac{R_{i}}{i}}$,
where $R_{i}$ is the $i^{th}$ ranked score (in decreasing order)
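The three schemes can be sketched for the SNP scores of a single gene as follows. The helper name `combine_scores` and the example scores are hypothetical. The sequential branch implements the formula above by sorting scores in decreasing order and dividing each by its rank.

```r
# Hypothetical helper combining per-SNP scores for one gene
combine_scores <- function(scores,
                           scoring.scheme = c("max", "sum", "sequential")) {
  scoring.scheme <- match.arg(scoring.scheme)
  switch(scoring.scheme,
         max = max(scores),
         sum = sum(scores),
         sequential = {
           ranked <- sort(scores, decreasing = TRUE)  # R_1 >= R_2 >= ...
           sum(ranked / seq_along(ranked))            # sum of R_i / i
         })
}

snp_scores <- c(3, 6, 2)                     # placeholder scores
combine_scores(snp_scores, "max")            # 6
combine_scores(snp_scores, "sum")            # 11
combine_scores(snp_scores, "sequential")     # 6/1 + 3/2 + 2/3
```

The sequential scheme lies between "max" and "sum": the top-ranked SNP counts in full, while each further SNP contributes progressively less.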
network
the built-in network. Currently two sources of network
information are supported: the STRING database (version 10) and the
Pathway Commons database (version 7). STRING is a meta-integration of
undirected interactions from the functional aspect, while Pathway
Commons mainly contains both undirected and directed interactions from
the physical/pathway aspect. Both have scores to control the confidence
of interactions, so the user can choose interactions of different
quality. In STRING, "STRING_highest" indicates interactions
with highest confidence (confidence scores>=900), "STRING_high" for
interactions with high confidence (confidence scores>=700), and
"STRING_medium" for interactions with medium confidence (confidence
scores>=400). For undirected/physical interactions from Pathway
Commons, "PCommonsUN_high" indicates undirected interactions with high
confidence (supported with the PubMed references plus at least 2
different sources), and "PCommonsUN_medium" for undirected interactions
with medium confidence (supported with the PubMed references). For
directed (pathway-merged) interactions from Pathway Commons,
"PCommonsDN_high" indicates directed interactions with high confidence
(supported with the PubMed references plus at least 2 different
sources), and "PCommonsDN_medium" for directed interactions with medium
confidence (supported with the PubMed references). In addition to the
pooled version of pathways from all data sources, the user can also
choose the pathway-merged network from individual sources, that is,
"PCommonsDN_Reactome" for those from Reactome, "PCommonsDN_KEGG" for
those from KEGG, "PCommonsDN_HumanCyc" for those from HumanCyc,
"PCommonsDN_PID" for those from PID, "PCommonsDN_PANTHER" for those
from PANTHER, "PCommonsDN_ReconX" for those from ReconX,
"PCommonsDN_TRANSFAC" for those from TRANSFAC, "PCommonsDN_PhosphoSite"
for those from PhosphoSite, and "PCommonsDN_CTD" for those from CTD
network.customised
an object of class "igraph". By default, it
is NULL. It is designed to allow the user to analyse their customised
network data that are not listed in the above argument 'network'. This
customisation (if provided) takes priority over the built-in
network
seed.genes
logical to indicate whether the identified network is
restricted to seed genes (i.e. nearby genes that are located within
the defined distance window centred on lead or LD SNPs). By default, it
is set to true
subnet.significance
the given significance threshold. By
default, it is set to 5e-05 (see Usage). If NULL, there is no
constraint on nodes/genes. If given, those nodes/genes with p-values
below this are considered significant and thus scored positively, while
those with p-values above this threshold are considered
insignificant and thus scored negatively
subnet.size
the desired number of nodes constrained to the
resulting subnet. If it is not NULL, a wide range of significance
thresholds will be scanned to find the optimal significance threshold
leading to the desired number of nodes in the resulting subnet.
Notably, the given significance threshold will be overwritten by this
option
verbose
logical to indicate whether the messages will be
displayed on the screen. By default, it is set to true for display
RData.location
a character string specifying the location of built-in
RData files. See xRDataLoader for details