Probing immune system genetics via gene expression. VoCAL is a deconvolution-based method that uses transcriptome data to infer the quantities of immune-cell types, and then uses these quantitative traits to uncover the underlying DNA loci (iQTLs), assuming homozygosity (as in the case of recombinant inbred strains).
vocal(...,reference_data,expression_data,genotyping_data,normalize_data,
T.i=5,T.e=10,eqtl_association_scores=NULL)
...: one or more single-column data frames, each representing a preselected marker set that is likely to discriminate well between the immune-cell types given in the reference data. The number of data frames defines the number of association scores that are combined to generate the final iQTL association score.
reference_data: a data frame representing immune-cell expression profiles. Each row represents the expression of a gene, and each column represents a different immune-cell type. colnames contains the name of each immune-cell type and rownames contains the gene symbols. The name of each immune-cell type and the symbol of each gene should be unique. Any gene with missing expression values must be excluded.
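The requirements above can be checked before calling vocal. The following is a minimal sketch on a toy data frame; the object name `ref`, the gene symbols, and the values are invented for illustration and are not part of the package.

```r
# Toy reference profiles: rows are genes, columns are immune-cell types.
# All names and values are hypothetical.
ref <- data.frame(
  Bcell = c(8.1, 2.3, NA),
  Tcell = c(1.9, 7.4, 5.0),
  row.names = c("Cd19", "Cd3e", "Gata3")
)

# Exclude any gene with a missing expression value.
ref <- ref[complete.cases(ref), , drop = FALSE]

# Gene symbols and immune-cell type names must be unique.
stopifnot(!anyDuplicated(rownames(ref)), !anyDuplicated(colnames(ref)))
```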
expression_data: a data frame representing RNA-seq or microarray gene-expression profiles of a given complex tissue across a population of genetically distinct (genotyped) individuals. Each row represents the expression of a gene, and each column represents a genetically distinct individual. colnames contains the name of each individual, as written in the genotyping_data, and rownames contains the gene symbols. The name of each individual sample and the symbol of each gene should be unique. Any gene with missing expression values should be excluded.
genotyping_data: a data frame where each row represents a different locus, and each column represents a genetically distinct individual. The genotype should be taken from homozygous individuals only. Where the genotype is unknown, NA should be used. The first six columns contain the following information: (1) The sequential identifier of the locus; (2) The name of each locus Chr; (3) Chromosome position; (4) Start genome position; (5) End genome position; (6) Position in cM.
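As an illustration of this layout, the sketch below builds a toy genotype table and checks that every expression sample has a genotype column. The annotation column names other than UID and Chr, the 0/1 genotype coding, and all strain names are assumptions made for this example only.

```r
# Toy genotype table: six annotation columns, then one column per individual.
# Column names Locus, Start, End, cM and the 0/1 coding are hypothetical.
geno <- data.frame(
  UID = 1:2, Locus = c("rs1", "rs2"), Chr = c(1, 1),
  Start = c(100, 200), End = c(100, 200), cM = c(0.5, 1.0),
  BXD1 = c(0, 1), BXD2 = c(1, NA), BXD3 = c(0, 0)
)

# Toy expression table with a subset of the genotyped individuals.
expr <- data.frame(
  BXD1 = c(5.2, 3.1), BXD3 = c(4.8, 3.5),
  row.names = c("GeneA", "GeneB")
)

# Individuals are everything after the first six annotation columns.
individuals <- colnames(geno)[-(1:6)]

# Every expression sample must be named as in the genotype table.
stopifnot(length(setdiff(colnames(expr), individuals)) == 0)

# Number of unknown (NA) genotypes at each locus.
unknown_per_locus <- rowSums(is.na(geno[, individuals, drop = FALSE]))
```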
normalize_data: normalization type. The data will be normalized by one of the following: (1) "All" - subtraction of the mean expression of all strains; (2) "None" - the data is already normalized, so nothing is done; (3) the name of an individual included in the colnames of expression_data.
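A rough sketch of what these options might amount to on a toy matrix is given below. The "All" case follows the description above. The reference-individual case is an assumption: the text does not say how the named individual is used, and subtracting its profile is only one plausible reading. Strain and gene names are invented.

```r
# Toy expression matrix: rows are genes, columns are strains (hypothetical values).
expr <- as.matrix(data.frame(
  B6 = c(5, 2), BXD1 = c(7, 2), BXD2 = c(6, 5),
  row.names = c("GeneA", "GeneB")
))

# "All": subtract each gene's mean expression across all strains.
norm_all <- sweep(expr, 1, rowMeans(expr))

# Named individual (ASSUMPTION): subtract that individual's profile per gene.
norm_ref <- sweep(expr, 1, expr[, "B6"])
```

Under "None" the matrix would be passed through unchanged.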
T.i: numeric. Significance cutoff on the iQTL association score (-log10(P value)) for the refinement step of the VoCAL algorithm. Default is 5.
T.e: numeric. Significance cutoff on the eQTL association score (-log10(P value)) for the refinement step of the VoCAL algorithm. Default is 10.
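Because both cutoffs are expressed on the -log10(P value) scale, a cutoff of 5 corresponds to a P value of 1e-5. The short sketch below shows the conversion on invented P values; whether VoCAL compares with >= or > is not stated here, so the comparison used is an assumption.

```r
# Hypothetical P values converted to association scores.
p_values <- c(1e-3, 1e-6, 1e-12)
scores <- -log10(p_values)

# The cutoff on the score scale and its equivalent on the P-value scale.
T.i <- 5
p_cutoff <- 10^(-T.i)

# ASSUMPTION: a score at or above the cutoff counts as significant.
passes <- scores >= T.i
```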
eqtl_association_scores: (optional) a data frame where each entry represents an association score for a gene, in a specific locus, given the genotype of all the individuals that appear in the expression_data data frame. This eQTL analysis should be performed over the normalized expression_data. colnames contains the UID (as written in the genotyping_data) and rownames contains the gene symbols (as written in the expression_data). The symbol of each gene should be unique. These scores should be given as -log10(P value). Default is NULL, meaning that the eQTL analysis will be performed.
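For readers who want to supply their own scores, the sketch below produces a table of the shape described above (genes in rows, locus UIDs in columns, entries in -log10(P value)). It uses a per-gene, per-locus linear regression as a stand-in test; this is an assumption for illustration and not necessarily the eQTL analysis that vocal runs when the argument is NULL. All data are invented.

```r
# Toy genotypes: rows are loci (named by UID), columns are six individuals.
geno <- rbind(c(0, 0, 0, 1, 1, 1),
              c(0, 1, 0, 1, 0, 1))
rownames(geno) <- c("1", "2")

# Toy normalized expression: rows are genes, columns are the same individuals.
expr <- rbind(GeneA = c(1.0, 1.2, 0.9, 3.1, 2.9, 3.0),
              GeneB = c(2.0, 2.1, 1.9, 2.2, 1.8, 2.0))

# For each locus, regress every gene on the genotype and keep -log10(P value).
eqtl_scores <- sapply(rownames(geno), function(uid) {
  apply(expr, 1, function(y) {
    fit <- summary(lm(y ~ geno[uid, ]))
    -log10(fit$coefficients[2, 4])
  })
})

# Data frame with gene symbols as rownames and UIDs as colnames.
eqtl_scores <- as.data.frame(eqtl_scores, check.names = FALSE)
```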
a list of two matrices:
a matrix that contains the output iQTL association scores after applying the iterative filtration procedure. It holds the genome-wide association results for each immune trait over a range of DNA loci: rownames provides the identifier of each locus and colnames contains the immune-cell type names. Each entry provides the -log10(P value) of an iQTL association score.
the names of all the markers removed from the different marker sets provided
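A score matrix of the shape described above can be scanned for loci that exceed a chosen cutoff. The sketch below works on an invented matrix; the object names, the cutoff, and the values are hypothetical, and the real element names of the returned list are not assumed here.

```r
# Toy iQTL score matrix: rows are locus identifiers, columns are cell types.
scores <- matrix(c(1.2, 6.3, 0.4,
                   2.1, 0.8, 7.5),
                 nrow = 3,
                 dimnames = list(c("1", "2", "3"), c("Bcell", "Tcell")))

# Hypothetical cutoff on the -log10(P value) scale.
cutoff <- 5

# Row/column positions of every entry at or above the cutoff.
hits <- which(scores >= cutoff, arr.ind = TRUE)

# Tabulate the locus, cell type, and score of each hit.
top <- data.frame(
  locus = rownames(scores)[hits[, "row"]],
  cell_type = colnames(scores)[hits[, "col"]],
  score = scores[hits],
  stringsAsFactors = FALSE
)
```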
Steuerman Y and Gat-Viks I. Exploiting Gene-Expression Deconvolution to Probe the Genetics of the Immune System (2015), Submitted.
# NOT RUN {
data(commons)
data(vocalEx)
# }
# NOT RUN {
results <- vocal(DCQ_mar, reference_data=immgen_dat, expression_data=lung_dat,
genotyping_data=gBXD, normalize_data="B6", eqtl_association_scores=eQTL_res)
# }