The createFullHaplotype function infers haplotypes based on an anchor gene.
createFullHaplotype(
clip_db,
toHap_col = c("v_call", "d_call"),
hapBy_col = "j_call",
hapBy = "IGHJ6",
toHap_GERM = NULL,
relative_freq_priors = TRUE,
kThreshDel = 3,
rmPseudo = TRUE,
deleted_genes = c(),
nonReliable_Vgenes = c(),
min_minor_fraction = 0.3,
single_gene = TRUE,
chain = c("IGH", "IGK", "IGL", "TRB")
)
A data.frame in which each row is the haplotype inference summary of a gene from the column selected in toHap_col.
The output contains the following columns:
subject
: the subject name.
gene
: the gene name.
Anchor gene allele 1: the haplotype inference for chromosome one. The column name is the anchor gene with the first allele.
Anchor gene allele 2: the haplotype inference for chromosome two. The column name is the anchor gene with the second allele.
alleles
: allele calls for the gene.
proirs_row
: priors based on relative allele usage of the anchor gene.
proirs_col
: priors based on relative allele usage of the inferred gene.
counts1
: the appearance count of the first allele from alleles on each chromosome; the counts are separated by a comma.
k1
: the Bayes factor value for the inference of the first allele (from alleles).
counts2
: the appearance count of the second allele from alleles on each chromosome; the counts are separated by a comma.
k2
: the Bayes factor value for the inference of the second allele (from alleles).
counts3
: the appearance count of the third allele from alleles on each chromosome; the counts are separated by a comma.
k3
: the Bayes factor value for the inference of the third allele (from alleles).
counts4
: the appearance count of the fourth allele from alleles on each chromosome; the counts are separated by a comma.
k4
: the Bayes factor value for the inference of the fourth allele (from alleles).
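The countsN columns hold the per-chromosome counts as a single comma-separated string, so they usually need splitting before any numeric work. The following is a minimal base R sketch. The table is made up for illustration (gene names and counts are not package output), and split_counts is a hypothetical helper, not part of the package.

```r
# Toy table mimicking the shape of the countsN columns described above.
# All values are invented for illustration.
haplo_toy <- data.frame(
  gene    = c("IGHV1-2", "IGHV3-30"),
  counts1 = c("25,1", "0,40"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split "a,b" strings into a two-column integer matrix,
# one column per chromosome.
split_counts <- function(x) {
  parts <- strsplit(x, ",", fixed = TRUE)
  m <- t(vapply(parts, function(p) as.integer(p), integer(2)))
  colnames(m) <- c("chrom1", "chrom2")
  m
}

counts1 <- split_counts(haplo_toy$counts1)
rownames(counts1) <- haplo_toy$gene
print(counts1)
```

The same helper could be applied to counts2, counts3, and counts4, assuming they follow the same two-value layout.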
clip_db
: a data.frame in AIRR format. See details.
toHap_col
: a vector of column names for which a haplotype should be inferred. Default is v_call and d_call.
hapBy_col
: column name of the anchor gene. Default is j_call.
hapBy
: a string of the anchor gene name. Default is IGHJ6.
toHap_GERM
: a vector of named nucleotide germline sequences matching the allele calls in the toHap_col columns in clip_db.
relative_freq_priors
: if TRUE, the priors for Bayesian inference are estimated from the relative frequencies in clip_db. Otherwise, priors are set to c(0.5,0.5). Default is TRUE.
kThreshDel
: the minimum lK (log10 of the Bayes factor) to call a deletion. Default is 3.
rmPseudo
: if TRUE, non-functional and pseudo genes are removed. Default is TRUE.
deleted_genes
: double chromosome deletion summary table. A data.frame created by deletionsByBinom.
nonReliable_Vgenes
: a list of known non-reliable gene assignments. A list created by nonReliableVGenes.
min_minor_fraction
: the minimum minor allele fraction to be used as an anchor gene. Default is 0.3.
single_gene
: whether to consider only genes from single assignments. If TRUE, calls where genes appear with others are discarded. If FALSE, the calls are separated and counted for all genes that appeared. Default is TRUE.
chain
: the IG/TR chain: IGH, IGK, IGL, TRB. Default is IGH.
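To make the single_gene option concrete, here is a rough base R sketch of the two counting behaviours it describes, applied to an invented vector of V calls. This is not the package's implementation. The helper count_genes and the example calls are assumptions for illustration, and a call is treated as a single assignment when all of its comma-separated alleles map to the same gene.

```r
# Invented V calls; the second one lists alleles of two different genes.
v_calls <- c("IGHV1-2*02", "IGHV1-2*02,IGHV1-3*01", "IGHV1-3*01", "IGHV1-2*04")

# Hypothetical helper illustrating the two counting modes.
count_genes <- function(calls, single_gene = TRUE) {
  # Split each call into alleles, then strip the "*allele" suffix to get genes.
  genes_per_call <- lapply(strsplit(calls, ",", fixed = TRUE),
                           function(a) unique(sub("\\*.*$", "", a)))
  if (single_gene) {
    # Keep only calls that resolve to exactly one gene.
    genes_per_call <- genes_per_call[lengths(genes_per_call) == 1]
  }
  # Count every remaining gene, once per call in which it appears.
  table(unlist(genes_per_call))
}

single <- count_genes(v_calls, single_gene = TRUE)
multi  <- count_genes(v_calls, single_gene = FALSE)
print(single)
print(multi)
```

With single_gene = TRUE the ambiguous second call is dropped; with FALSE it contributes one count to each of the two genes it names.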
The function accepts a data.frame in AIRR format (https://changeo.readthedocs.io/en/stable/standard.html) containing the following columns:
'subject'
: The subject name
'v_call'
: V allele call(s) (in an IMGT format)
'd_call'
: D allele call(s) (in an IMGT format, only for heavy chains)
'j_call'
: J allele call(s) (in an IMGT format)
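Since min_minor_fraction sets a floor on the minor allele fraction of the anchor gene, it can be useful to look at that fraction in the j_call column before running the inference. The sketch below uses invented rows and a hypothetical helper, minor_fraction. It assumes the fraction is the less frequent allele's share among the two most common alleles of the anchor gene, counting unambiguous calls only; the package may compute it differently.

```r
# Invented AIRR-style rows for one subject, using the columns listed above.
toy_db <- data.frame(
  subject = "S1",
  v_call  = c("IGHV1-2*02", "IGHV1-2*04", "IGHV3-30*01", "IGHV3-30*01", "IGHV1-2*02"),
  j_call  = c("IGHJ6*02", "IGHJ6*03", "IGHJ6*02", "IGHJ4*02", "IGHJ6*02"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: share of anchor-gene calls carried by the less
# frequent of the gene's two most common alleles.
minor_fraction <- function(calls, anchor = "IGHJ6") {
  # Keep unambiguous calls of the anchor gene only.
  keep <- startsWith(calls, paste0(anchor, "*")) & !grepl(",", calls, fixed = TRUE)
  counts <- sort(table(calls[keep]), decreasing = TRUE)
  if (length(counts) < 2) return(0)  # fewer than two alleles observed
  top2 <- as.numeric(counts[1:2])
  top2[2] / sum(top2)
}

frac <- minor_fraction(toy_db$j_call)
print(frac)
```

In this toy data the fraction is 0.25 (one IGHJ6*03 call against three IGHJ6*02 calls), which falls below the default min_minor_fraction of 0.3.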
# Load the package, example data, and germlines
library(rabhit)
data(samples_db, HVGERM, HDGERM)
# Select a single individual
clip_db <- samples_db[samples_db$subject == 'I5', ]
# Infer the haplotype
haplo_db <- createFullHaplotype(clip_db, toHap_col = c('v_call', 'd_call'),
                                hapBy_col = 'j_call', hapBy = 'IGHJ6',
                                toHap_GERM = c(HVGERM, HDGERM))