piglet (version 1.0.7)

inferGenotypeAllele_asc: Allele similarity cluster-based genotype inference (testing function)

Description

inferGenotypeAllele_asc infers an individual's genotype using the allele-based method. The method uses allele-specific thresholds to determine whether an allele is present in the genotype: the absolute frequency of each allele is calculated and compared against that allele's threshold.
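The thresholding step described above can be sketched in base R. This is a minimal illustration, not piglet's implementation: it assumes that an allele's absolute fraction is its read count divided by the total number of reads in the repertoire and that an allele is called present when its fraction is at least its threshold. The allele calls and thresholds are hypothetical toy values.

```r
# Hypothetical toy data: one allele call per read, 100 reads in total.
v_calls <- c(rep("IGHV1-2*02", 60), rep("IGHV1-2*04", 39), "IGHV1-2*05")

# Hypothetical allele-specific thresholds on the absolute fraction.
thresholds <- c("IGHV1-2*02" = 0.05, "IGHV1-2*04" = 0.05, "IGHV1-2*05" = 0.05)

# Absolute fraction (assumed): reads per allele / all reads in the repertoire.
counts <- table(v_calls)
absolute_fraction <- as.numeric(counts) / length(v_calls)
names(absolute_fraction) <- names(counts)

# An allele enters the genotype when its fraction meets its own threshold.
present <- absolute_fraction >= thresholds[names(absolute_fraction)]
genotyped_alleles <- names(absolute_fraction)[present]
print(genotyped_alleles)
```

Here IGHV1-2*05, supported by a single read out of 100 (fraction 0.01), falls below its threshold and is left out of the genotype.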

Usage

inferGenotypeAllele_asc(
  data,
  alleleClusterTable,
  v_call = "v_call",
  single_assignment = FALSE,
  germline_db = NA,
  find_unmutated = FALSE,
  seq = "sequence_alignment",
  confidence_level = NULL,
  default_allele_threshold = 1e-04
)

Value

A data.frame with the inferred V genotype. The table contains the following columns:

gene: the allele cluster
alleles: the alleles present in the repertoire
imgt_alleles: the IMGT nomenclature of the alleles
counts: the number of reads for each allele
absolute_fraction: the absolute fraction of each allele
absolute_threshold: the population-driven allele thresholds for genotype presence
genotyped_alleles: the alleles that entered the genotype
genotype_imgt_alleles: the IMGT nomenclature of the alleles in the genotype

Arguments

data

A data.frame in AIRR format containing V allele calls from a single subject and the sample's IMGT-gapped V(D)J sequences in the column named by seq.

alleleClusterTable

A data.frame of the allele similarity cluster thresholds.

v_call

Name of the V allele call column. Default is "v_call".

single_assignment

If TRUE, the method only considers sequences with a single allele assignment for the genotype inference.

germline_db

A named vector of germline sequences whose names match the V allele calls and the alleleClusterTable. Only required if find_unmutated is TRUE.

find_unmutated

If TRUE, use germline_db to find which sequences are unmutated. Not needed if the V allele calls only represent unmutated sequences.

seq

Name of the column in data with the aligned, IMGT-numbered V(D)J nucleotide sequence. Default is "sequence_alignment".

confidence_level

The confidence level at which to filter the inferred genotype alleles. Default is NULL, meaning that alleles are filtered only by the allele threshold.

default_allele_threshold

The default allele threshold for the genotype inference, used when an allele's threshold is not in the alleleClusterTable. Default is 1e-04.
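The fallback to default_allele_threshold behaves like a lookup with a default value. In this sketch, allele_thresholds is a hypothetical named vector standing in for the thresholds held in alleleClusterTable; the real table's column names and piglet's own lookup code are not shown here.

```r
# Hypothetical per-allele thresholds, standing in for alleleClusterTable.
allele_thresholds <- c("IGHV1-2*02" = 5e-04, "IGHV1-2*04" = 1e-03)
default_allele_threshold <- 1e-04

# Alleles observed in the repertoire; IGHV1-2*07 has no table entry.
alleles <- c("IGHV1-2*02", "IGHV1-2*04", "IGHV1-2*07")

# Look up each allele's threshold; a missing entry yields NA ...
thr <- unname(allele_thresholds[alleles])
# ... which is replaced by the default threshold.
thr[is.na(thr)] <- default_allele_threshold
print(thr)
```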

Details

In naive repertoires, allele calls with more than one assignment are rare. Hence, if the data represent the naive repertoire of a subject, it is recommended to use the find_unmutated = TRUE option to remove mutated sequences. For non-naive populations, allele calls with multiple assignments are treated as belonging to all of the assigned groups.
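The two ways of handling multiple assignments can be contrasted on toy data. This sketch assumes, as in the AIRR format, that multiple calls are stored as a comma-separated list in v_call. It reads "treated as belonging to all groups" as counting the read once for every listed allele; that is an interpretation for illustration, not piglet's code, and the calls are hypothetical.

```r
# Hypothetical AIRR-style V calls; the second read has two assignments.
v_call <- c("IGHV1-2*02", "IGHV1-2*02,IGHV1-2*04", "IGHV1-2*04", "IGHV3-23*01")

# Like single_assignment = TRUE: drop reads with more than one call.
single_only <- v_call[!grepl(",", v_call, fixed = TRUE)]
single_counts <- table(single_only)

# Multiple assignments kept: the read counts toward every listed allele.
all_alleles <- unlist(strsplit(v_call, ",", fixed = TRUE))
all_counts <- table(all_alleles)

print(single_counts)
print(all_counts)
```

With the multi-assigned read kept, IGHV1-2*02 and IGHV1-2*04 each gain one extra count, which is why ambiguous calls in mutated repertoires can inflate allele frequencies.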

See Also

inferAlleleClusters infers the allele clusters from a supplied V reference set and sets the default allele threshold to 1e-04. See recentAlleleClusters to obtain the latest version of the IGHV allele clusters and the naive population-based allele thresholds.

Examples

library(piglet)

# load the TIgGER AIRR-seq B cell example data
data <- tigger::AIRRDb

# preferably obtain the latest ASC cluster table
# asc_archive <- recentAlleleClusters(doi="10.5281/zenodo.7429773", get_file = TRUE)

# allele_cluster_table <- extractASCTable(archive_file = asc_archive)

# example allele similarity cluster table
data(allele_cluster_table)

data(HVGERM)

# reforming the germline set
asc_germline <- germlineASC(allele_cluster_table, germline = HVGERM)

# assigning the ASC alleles
asc_data <- assignAlleleClusters(data, allele_cluster_table)

# inferring the genotype
asc_genotype <- inferGenotypeAllele_asc(
  data = asc_data,
  alleleClusterTable = allele_cluster_table,
  germline_db = asc_germline,
  find_unmutated = TRUE
)