inferGenotypeBayesian
infers a subject's genotype by applying a Bayesian framework
with a Dirichlet prior for the multinomial distribution. Up to four distinct alleles are
allowed in an individual’s genotype. Four likelihood distributions were generated by
empirically fitting three high-coverage genotypes from three individuals
(Laserson and Vigneault et al., 2014). A posterior probability is calculated for the
four most common alleles. The certainty of the highest-probability model is
calculated using a Bayes factor (the likelihood of the most likely model divided by that of the second-most likely model).
The larger the Bayes factor (K), the greater the certainty in the model.
inferGenotypeBayesian(
data,
germline_db = NA,
novel = NA,
v_call = "v_call",
seq = "sequence_alignment",
find_unmutated = TRUE,
priors = c(0.6, 0.4, 0.4, 0.35, 0.25, 0.25, 0.25, 0.25, 0.25)
)
A data.frame of alleles denoting the genotype of the subject, with the log10
of the likelihood of each model and the log10 of the Bayes factor. The output
contains the following columns:
gene
: The gene name without allele.
alleles
: Comma-separated list of alleles for the given gene.
counts
: Comma-separated list of observed sequence counts for each corresponding allele in the alleles list.
total
: The total count of observed sequences for the given gene.
note
: Any comments on the inference.
kh
: log10 likelihood that the gene is homozygous.
kd
: log10 likelihood that the gene is heterozygous.
kt
: log10 likelihood that the gene is trizygous.
kq
: log10 likelihood that the gene is quadrozygous.
k_diff
: log10 ratio of the highest to second-highest zygosity likelihoods.
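As a minimal sketch of how the zygosity columns relate to each other, the snippet below takes hypothetical log10 likelihoods for one gene (the values are made up for illustration and are not package output) and recomputes the gap between the two most likely models. It assumes k_diff is simply that difference, as the column description suggests.

```r
# Hypothetical log10 likelihoods for a single gene (illustrative values only)
loglik <- c(kh = -12.4, kd = -3.1, kt = -5.8, kq = -9.0)

# Rank the four zygosity models from most to least likely
ranked <- sort(loglik, decreasing = TRUE)

# The log10 of a ratio is the difference of the log10 values
best_model <- names(ranked)[1]
k_diff <- unname(ranked[1] - ranked[2])

best_model  # the column name of the most likely model
k_diff      # log10 Bayes factor between the top two models
```

Here the heterozygous model (kd) ranks first, and a k_diff near 2.7 would mean it is roughly 10^2.7 times more likely than the runner-up.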
data
: A data.frame containing V allele calls from a single subject. If find_unmutated is TRUE, then the sample IMGT-gapped V(D)J sequence should be provided in the column given by seq (sequence_alignment by default).
germline_db
: Named vector of sequences containing the germline sequences named in the V allele calls. Only required if find_unmutated is TRUE.
novel
: An optional data.frame of the type returned by findNovelAlleles containing germline sequences that will be utilized if find_unmutated is TRUE. See Details.
v_call
: Name of the column in data with V allele calls. Default is "v_call".
seq
: Name of the column in data with the aligned, IMGT-numbered, V(D)J nucleotide sequence. Default is "sequence_alignment".
find_unmutated
: If TRUE, use germline_db to find which samples are unmutated. Not needed if the allele calls only represent unmutated samples.
priors
: A numeric vector of priors for the multinomial distribution.
The priors vector must contain nine values that define
the priors for the heterozygous (two allele),
trizygous (three allele), and quadrozygous (four allele)
distributions. The first two values of priors define
the prior for the heterozygous case, the next three values are for
the trizygous case, and the final four values are for the
quadrozygous case. Each set of priors should sum to one.
Note that each distribution prior is actually defined internally
by a set of four numbers, with the unspecified final values
assigned to 0; e.g., the heterozygous case is
c(priors[1], priors[2], 0, 0). The prior for the
homozygous distribution is fixed at c(1, 0, 0, 0).
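The layout of the priors vector can be sketched as follows. This is only an illustration of the split described above, using the default values from the usage section; the prior_sets list and its element names are hypothetical and are not objects exposed by the package.

```r
# Default priors from the function signature
priors <- c(0.6, 0.4, 0.4, 0.35, 0.25, 0.25, 0.25, 0.25, 0.25)
stopifnot(length(priors) == 9)

# Pad each set with zeros so every model has four values
# (prior_sets is a hypothetical name used for illustration)
prior_sets <- list(
  homozygous   = c(1, 0, 0, 0),          # fixed, not taken from priors
  heterozygous = c(priors[1:2], 0, 0),   # values 1-2
  trizygous    = c(priors[3:5], 0),      # values 3-5
  quadrozygous = priors[6:9]             # values 6-9
)

# Each set should sum to one (allowing for floating point error)
sums <- vapply(prior_sets, sum, numeric(1))
stopifnot(all(abs(sums - 1) < 1e-8))
```

Checking the sums before calling the function is a cheap way to catch a mistyped custom priors vector.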
Allele calls representing cases where multiple alleles have been
assigned to a single sample sequence are rare among unmutated
sequences but may result if nucleotides for certain positions are
not available. Calls containing multiple alleles are treated as
belonging to all groups. If novel is provided, all
sequences that are assigned to the same starting allele as any
novel germline allele will have the novel germline allele appended
to their assignment prior to searching for unmutated sequences.
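The handling of multi-allele calls and novel alleles described above might look roughly like this in base R. This is a sketch, not the package's internal code: the allele calls, the novel allele name, and all variable names are hypothetical.

```r
# Hypothetical V allele calls; the second sequence has two alleles assigned
v_calls <- c("IGHV1-2*02", "IGHV1-69*01,IGHV1-69*06", "IGHV3-23*01")

# Hypothetical novel allele and the starting allele it was derived from
novel_start <- "IGHV1-2*02"
novel_name  <- "IGHV1-2*02_G207T"

# Split comma-separated calls so each sequence maps to a vector of alleles
call_list <- strsplit(v_calls, ",", fixed = TRUE)

# Append the novel allele to every call that includes its starting allele
call_list <- lapply(call_list, function(x) {
  if (novel_start %in% x) c(x, novel_name) else x
})

# Collapse back to comma-separated strings
expanded <- vapply(call_list, paste, character(1), collapse = ",")
expanded
```

After this step the first sequence is a candidate for both the original and the novel allele, and the search for unmutated sequences decides which germline it actually matches.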
Laserson U and Vigneault F, et al. High-resolution antibody dynamics of vaccine-induced immune responses. PNAS. 2014 111(13):4928-33.
plotGenotype for a colorful visualization and genotypeFasta to convert the genotype to nucleotide sequences. See inferGenotype to infer a subject-specific genotype using a frequency method.
# Infer IGHV genotype, using only unmutated sequences, including novel alleles
inferGenotypeBayesian(AIRRDb, germline_db=SampleGermlineIGHV, novel=SampleNovel,
find_unmutated=TRUE, v_call="v_call", seq="sequence_alignment")