createFullHaplotype: Anchor gene haplotype inference

Description

The createFullHaplotype function infers haplotypes based on an anchor gene.

Usage

createFullHaplotype(
  clip_db,
  toHap_col = c("v_call", "d_call"),
  hapBy_col = "j_call",
  hapBy = "IGHJ6",
  toHap_GERM = NULL,
  relative_freq_priors = TRUE,
  kThreshDel = 3,
  rmPseudo = TRUE,
  deleted_genes = c(),
  nonReliable_Vgenes = c(),
  min_minor_fraction = 0.3,
  single_gene = TRUE,
  chain = c("IGH", "IGK", "IGL", "TRB")
)

Value

A data.frame, in which each row is the haplotype inference summary of a gene from the column selected in toHap_col.

The output contains the following columns:

subject: the subject name.
gene: the gene name.
Anchor gene allele 1: the haplotype inference for chromosome one. The column name is the anchor gene with the first allele.
Anchor gene allele 2: the haplotype inference for chromosome two. The column name is the anchor gene with the second allele.
alleles: allele calls for the gene.
proirs_row: priors based on relative allele usage of the anchor gene.
proirs_col: priors based on relative allele usage of the inferred gene.
counts1: the appearance count on each chromosome of the first allele from alleles. The counts are separated by a comma.
k1: the Bayes factor value for the first allele (from alleles) inference.
counts2: the appearance count on each chromosome of the second allele from alleles. The counts are separated by a comma.
k2: the Bayes factor value for the second allele (from alleles) inference.
counts3: the appearance count on each chromosome of the third allele from alleles. The counts are separated by a comma.
k3: the Bayes factor value for the third allele (from alleles) inference.
counts4: the appearance count on each chromosome of the fourth allele from alleles. The counts are separated by a comma.
k4: the Bayes factor value for the fourth allele (from alleles) inference.
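
Because the counts columns store two numbers in one comma-separated string, it can help to split them before further analysis. The following is a minimal sketch in base R. The data frame haplo_example and the helper split_counts are hypothetical and exist only for illustration, and the values are invented, not real output of createFullHaplotype.

```r
# Hypothetical rows shaped like the output described above; all values are invented.
haplo_example <- data.frame(
  gene    = c("IGHV1-2", "IGHV3-11"),
  alleles = c("02,04", "01"),
  counts1 = c("15,0", "8,9"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split "a,b" strings into an integer matrix,
# one row per gene and one column per chromosome.
split_counts <- function(x) {
  parts <- strsplit(x, ",", fixed = TRUE)
  do.call(rbind, lapply(parts, as.integer))
}

counts1 <- split_counts(haplo_example$counts1)
colnames(counts1) <- c("chrom1", "chrom2")
print(counts1)
```

The same helper could be applied to counts2, counts3, and counts4, assuming those columns follow the same two-value layout.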

Arguments

clip_db: a data.frame in AIRR format. See details.
toHap_col: a vector of column names for which a haplotype should be inferred. Default is v_call and d_call.
hapBy_col: column name of the anchor gene. Default is j_call.
hapBy: a string of the anchor gene name. Default is IGHJ6.
toHap_GERM: a vector of named nucleotide germline sequences matching the allele calls in toHap_col columns in clip_db.
relative_freq_priors: if TRUE, the priors for Bayesian inference are estimated from the relative frequencies in clip_db. Otherwise, priors are set to c(0.5,0.5). Default is TRUE.
kThreshDel: the minimum lK (log10 of the Bayes factor) to call a deletion. Default is 3.
rmPseudo: if TRUE, non-functional genes and pseudogenes are removed. Default is TRUE.
deleted_genes: double chromosome deletion summary table. A data.frame created by deletionsByBinom.
nonReliable_Vgenes: a list of known non-reliable gene assignments. A list created by nonReliableVGenes.
min_minor_fraction: the minimum minor allele fraction to be used as an anchor gene. Default is 0.3.
single_gene: whether to consider only calls assigned to a single gene. If TRUE, calls in which a gene appears together with other genes are discarded. If FALSE, such calls are separated and counted for every gene that appears. Default is TRUE.
chain: the IG/TR chain: IGH, IGK, IGL, or TRB. Default is IGH.
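
The difference between the two single_gene settings can be illustrated on a toy vector of calls. This is a sketch of the behavior as described in the argument text, written in base R. It is not the package's internal code, and the variable names and call strings are made up for the example.

```r
# Invented allele calls; the second entry is assigned to two different genes.
calls <- c("IGHV1-2*02", "IGHV1-2*02,IGHV1-3*01", "IGHV3-11*01")

# Gene names per call: split on commas, then drop the allele suffix after "*".
genes_per_call <- lapply(
  strsplit(calls, ",", fixed = TRUE),
  function(a) unique(sub("\\*.*$", "", a))
)

# Analogue of single_gene = TRUE: keep only calls that name exactly one gene.
single_calls <- calls[lengths(genes_per_call) == 1]

# Analogue of single_gene = FALSE: count every gene that appears in any call.
all_gene_counts <- table(unlist(genes_per_call))

print(single_calls)
print(all_gene_counts)
```

Under the first setting the multi-gene call is dropped entirely. Under the second it contributes one count to each gene it names.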

Details

The function accepts a data.frame in AIRR format (https://changeo.readthedocs.io/en/stable/standard.html) containing the following columns:

'subject': The subject name
'v_call': V allele call(s) (in an IMGT format)
'd_call': D allele call(s) (in an IMGT format, only for heavy chains)
'j_call': J allele call(s) (in an IMGT format)
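
Before calling the function, it may be worth confirming that the input has the columns listed above. The check below is a minimal base R sketch. The helper check_clip_db and the one-row toy_db are hypothetical and are not part of the package. The sketch assumes heavy-chain data, so d_call is treated as required.

```r
# Columns listed in the Details section (d_call applies to heavy chains only).
required_cols <- c("subject", "v_call", "d_call", "j_call")

# Hypothetical helper: stop with a message naming any missing column.
check_clip_db <- function(db, required = required_cols) {
  missing_cols <- setdiff(required, names(db))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Invented single-row input used only to exercise the check.
toy_db <- data.frame(
  subject = "S1",
  v_call  = "IGHV1-2*02",
  d_call  = "IGHD3-10*01",
  j_call  = "IGHJ6*02",
  stringsAsFactors = FALSE
)

check_clip_db(toy_db)
```

For light-chain data, the required argument could be passed without d_call.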

Examples

# Load example data and germlines
data(samples_db, HVGERM, HDGERM)

# Selecting a single individual
clip_db = samples_db[samples_db$subject=='I5', ]

# Inferring haplotype
haplo_db = createFullHaplotype(clip_db, toHap_col=c('v_call','d_call'),
                               hapBy_col='j_call', hapBy='IGHJ6',
                               toHap_GERM=c(HVGERM, HDGERM))
