piglet (version 1.0.7)

inferGenotypeAllele: Allele based genotype inference

Description

inferGenotypeAllele infers an individual's genotype using the allele-based method. The method uses allele-specific thresholds to determine whether an allele is present in the genotype. More specifically, a confidence level (Z score) for the presence of each allele is calculated from the allele frequency, the repertoire depth, and the allele-specific threshold. The user can select the confidence level used for the genotype inference.

Usage

inferGenotypeAllele(
  data,
  allele_threshold_table = NULL,
  call = "v_call",
  asc_annotation = FALSE,
  single_assignment = FALSE,
  translate_to_asc = FALSE,
  germline_db = NA,
  find_unmutated = FALSE,
  seq = "sequence_alignment",
  default_allele_threshold = 1e-04,
  quiet = TRUE
)

Value

A data.frame with the inferred V genotype. The table contains the following columns:

  • allele: The alleles in the allele_threshold_table.

  • counts: The number of reads for each allele.

  • depth: The total number of reads in the genotype (Sum of counts).

  • threshold: The population-driven allele threshold for genotype presence.

  • z_score: The confidence level for the presence of the allele in the genotype.

  • asc_allele: If translate_to_asc is TRUE, the ASC allele value from allele_threshold_table.
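
As a rough illustration of how the returned table can be read, the sketch below builds a toy data.frame with the documented column names and keeps the alleles at or above a chosen confidence level. Every value in it is invented for illustration and is not real output of inferGenotypeAllele; the cutoff name min_z is hypothetical, and depth is simply taken here as the sum of the counts column.

```r
# Toy genotype table using the documented column names.
# All values are made up for illustration; they are not package output.
genotype_toy <- data.frame(
  allele    = c("IGHV1-2*02", "IGHV1-2*04", "IGHV1-2*06"),
  counts    = c(480L, 515L, 5L),
  threshold = c(1e-04, 1e-04, 1e-04),
  z_score   = c(12.3, 13.1, -0.8),
  stringsAsFactors = FALSE
)

# depth is documented as the sum of counts; computed here over the toy rows.
genotype_toy$depth <- sum(genotype_toy$counts)

# Keep alleles whose confidence level meets the chosen cutoff.
# min_z is a hypothetical name for the user-selected confidence level.
min_z <- 0
called <- genotype_toy[genotype_toy$z_score >= min_z, ]
print(called)
```

Raising min_z makes the call more conservative, since fewer alleles pass the filter.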

Arguments

data

data.frame in AIRR format, containing allele calls from a single subject and the sample IMGT-gapped V(D)J sequences under seq.

allele_threshold_table

A data.frame of the alleles and their thresholds.

call

name of the V, D, or J allele call column, i.e., v_call, d_call, or j_call. Default is v_call.

asc_annotation

Logical (FALSE by default). Whether the allele calls are annotated with the allele similarity clusters.

single_assignment

if TRUE, the method only considers sequences with a single assignment for the genotype inference.

translate_to_asc

For V allele calls, collapse identical alleles for the genotype inference. Default is FALSE.

germline_db

named vector of sequences containing the germline sequences named in V allele calls and the alleleClusterTable. Only required if find_unmutated is TRUE.

find_unmutated

if TRUE, use germline_db to find which samples are unmutated. Not needed if V allele calls only represent unmutated samples.

seq

name of the column in data with the aligned, IMGT-numbered, V(D)J nucleotide sequence. Default is sequence_alignment.

default_allele_threshold

The default allele threshold for the genotype inference, in case the allele threshold is not in the allele_threshold_table. Default is 1e-04.

quiet

Logical (TRUE by default). If TRUE, informative messages are suppressed.

Details

In naive repertoires, allele calls with more than one assignment are rare. Hence, if the data represent the naive repertoire of a subject, it is recommended to use the find_unmutated = TRUE option to remove mutated sequences. For non-naive populations, allele calls with multiple assignments are treated as belonging to all groups.
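
The difference between the two ways of handling multiple assignments can be sketched in base R. This is only one simple reading of the behaviour described above, assuming the AIRR convention of comma-separated allele calls; the package's internal handling may differ, and the data frame calls_toy and its values are hypothetical.

```r
# Toy allele calls; the second row carries a multiple assignment.
# The data are invented for illustration.
calls_toy <- data.frame(
  sequence_id = c("s1", "s2", "s3"),
  v_call      = c("IGHV1-69*01", "IGHV1-69*01,IGHV1-69*06", "IGHV3-23*01"),
  stringsAsFactors = FALSE
)

# Single-assignment view: drop rows whose call lists more than one allele.
is_single   <- !grepl(",", calls_toy$v_call, fixed = TRUE)
single_only <- calls_toy[is_single, ]

# "Belonging to all groups" view: count a multiply assigned read
# once for every allele it lists.
expanded      <- unlist(strsplit(calls_toy$v_call, ",", fixed = TRUE))
allele_counts <- table(expanded)

print(single_only)
print(allele_counts)
```

In the second view a read with an ambiguous call contributes to the count of each candidate allele, which is why restricting to single assignments gives a stricter count.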

See Also

inferAlleleClusters infers the allele clusters based on a supplied V reference set and sets the default allele threshold to 1e-04. See recentAlleleClusters to obtain the latest version of the IGHV allele clusters and the naive-population-based allele thresholds.

Examples


library(piglet)

# Load the TIgGER AIRR-seq B cell example data
data <- tigger::AIRRDb

# Load the allele threshold table
data(allele_threshold_table)

# Load the germline reference sequences
data(HVGERM)

# Infer the genotype, keeping only unmutated sequences
genotype <- inferGenotypeAllele(
  data = data,
  allele_threshold_table = allele_threshold_table,
  germline_db = HVGERM,
  find_unmutated = TRUE
)

# Keep the alleles with z_score >= 0
head(genotype[genotype$z_score >= 0, ])
