Alakazam

Alakazam is part of the Immcantation analysis framework for Adaptive Immune Receptor Repertoire sequencing (AIRR-seq). It provides a set of tools to investigate lymphocyte receptor clonal lineages, diversity, gene usage, and other repertoire-level properties, with a focus on high-throughput immunoglobulin (Ig) sequencing.

Alakazam serves five main purposes:

  1. Providing core functionality for other R packages in the Immcantation framework. This includes common tasks such as file I/O, basic DNA sequence manipulation, and interacting with V(D)J segment and gene annotations.
  2. Providing an R interface for interacting with the output of the pRESTO and Change-O tool suites.
  3. Performing clonal abundance and diversity analysis on lymphocyte repertoires.
  4. Performing lineage reconstruction on clonal populations of Ig sequences and analyzing the topology of the resultant lineage trees.
  5. Performing physicochemical property analyses of lymphocyte receptor sequences.
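
As a rough illustration of a few of these purposes, the sketch below runs some of the functions listed further down this page against the bundled `ExampleDb` data. It assumes alakazam is installed and that the argument names match a recent release (`groups`, `gene`, and `mode` are taken from the package documentation as I understand it, and the column names `sample_id`, `v_call`, and `junction` are assumed to follow the AIRR format of `ExampleDb`). Treat it as a starting point rather than a reference workflow.

```r
# Illustrative sketch only; assumes the alakazam package is installed.
library(alakazam)

# Load the example AIRR-format database shipped with the package.
data(ExampleDb)

# Clonal abundance: tabulate clone sizes within each sample.
clones <- countClones(ExampleDb, groups = "sample_id")
head(clones)

# Gene usage: count V gene usage per sample, collapsing alleles to genes.
genes <- countGenes(ExampleDb, gene = "v_call", groups = "sample_id",
                    mode = "gene")
head(genes)

# Physicochemical properties: translate a few junction sequences to
# amino acids, then score their hydrophobicity (GRAVY index).
junction_aa <- translateDNA(ExampleDb$junction[1:5])
gravy(junction_aa)

# Core sequence utilities: distance between two short DNA sequences.
seqDist("ATGGC", "ATGGG")
```

Each call returns an ordinary R object (data frames for the tabulation functions, numeric vectors for the property scores), so the results can be passed on to the plotting and testing functions in the package or to other tools.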

Contact

If you need help or have any questions, please contact the Immcantation Group.

If you have discovered a bug or have a feature request, you can open an issue using the issue tracker.

To receive alerts about Immcantation releases, news, events, and tutorials, join the Immcantation News Google Group. Membership settings can be adjusted to change the frequency of email updates.

Install

install.packages('alakazam')

Monthly Downloads

1,267

Version

1.4.0

License

AGPL-3

Maintainer

Susanna Marquez

Last Published

September 25th, 2025

Functions in alakazam (1.4.0)

MRCATest-class

S4 class defining edge significance

SingleDb

Single sequence AIRR database

bulk

Calculates the average bulkiness of amino acid sequences

checkColumns

Check data.frame for valid columns and issue message if invalid

collapseDuplicates

Remove duplicate DNA sequences and combine annotations

calcDiversity

Calculate the diversity index

countClones

Tabulates clone sizes

charge

Calculates the net charge of amino acid sequences

combineIgphyml

Combine IgPhyML object parameters into a dataframe

countGenes

Tabulates V(D)J allele, gene or family usage within each locus

buildPhylipLineage

Infer an Ig lineage using PHYLIP

calcCoverage

Calculate sample coverage

getPositionQuality

Get a data.frame with sequencing qualities per position

estimateAbundance

Estimates the complete clonal relative abundance distribution

countPatterns

Count sequence patterns

getMRCA

Retrieve the first non-root node of a lineage tree

getAAMatrix

Build an AA distance matrix

cpuCount

Available CPU cores

getPathLengths

Calculate path lengths from the tree root

extractVRegion

Extracts FWRs and CDRs from IMGT-gapped sequences

getSegment

Get Ig segment allele, gene and family names

getDNAMatrix

Build a DNA distance matrix

graphToPhylo

Convert a tree in igraph graph format to ape phylo format

maskSeqEnds

Masks ragged leading and trailing edges of aligned DNA sequences

makeTempDir

Create a temporary folder

isValidAASeq

Validate amino acid sequences

maskPositionsByQuality

Mask sequence positions with low quality

groupGenes

Group sequences by gene assignment

junctionAlignment

Calculate junction region alignment properties

gravy

Calculates the hydrophobicity of amino acid sequences

makeChangeoClone

Generate a ChangeoClone object for lineage construction

phyloToGraph

Convert a tree in ape phylo format to igraph graph format

plotAbundanceCurve

Plot a clonal abundance distribution

pairwiseEqual

Calculate pairwise equivalence between sequences

permuteLabels

Permute the node labels of a tree

plotDiversityTest

Plot the results of diversity testing

pairwiseDist

Calculate pairwise distances between sequences

plotDiversityCurve

Plot the results of alphaDiversity

padSeqEnds

Pads ragged ends of aligned DNA sequences

nonsquareDist

Calculate pairwise distances between sequences

gridPlot

Plot multiple ggplot objects

maskSeqGaps

Masks gap characters in DNA sequences

rarefyDiversity

Generate a clonal diversity index curve

progressBar

Standard progress bar

readFastqDb

Load sequencing quality scores from a FASTQ file

polar

Calculates the average polarity of amino acid sequences

seqDist

Calculate distance between two sequences

readIgphyml

Read in output from IgPhyML

readChangeoDb

Read a Change-O tab-delimited database file

plotEdgeTest

Plot the results of an edge permutation test

plotMRCATest

Plot the results of a founder permutation test

plotSubtrees

Plots subtree statistics for multiple trees

tableEdges

Tabulate the number of edges between annotations within a lineage tree

translateStrings

Translate a vector of strings

testMRCA

Tests for MRCA annotation enrichment in lineage trees

testEdges

Tests for parent-child annotation enrichment in lineage trees

sortGenes

Sort V(D)J genes

testDiversity

Pairwise test of the diversity index

seqEqual

Test DNA sequences for equality

stoufferMeta

Weighted meta-analysis of p-values via Stouffer's method

summarizeSubtrees

Generate subtree summary statistics for a tree

translateDNA

Translate nucleotide sequences to amino acids

writeChangeoDb

Write a Change-O tab-delimited database file

AbundanceCurve-class

S4 class defining a clonal abundance curve

ChangeoClone-class

S4 class defining a clone

ABBREV_AA

Amino acid abbreviation translations

EdgeTest-class

S4 class defining edge significance

DiversityCurve-class

S4 class defining a diversity curve

baseTheme

Standard ggplot settings

alphaDiversity

Calculate clonal alpha diversity

aliphatic

Calculates the aliphatic index of amino acid sequences

aminoAcidProperties

Calculates amino acid chemical properties for sequence data

alakazam

The Alakazam package

DEFAULT_COLORS

Default colors

IMGT_REGIONS

IMGT V-segment regions

ExampleDbChangeo

Example Change-O database

alakazam-package

alakazam: Immunoglobulin Clonal Lineage and Diversity Analysis

ExampleDb

Example AIRR database

ExampleTrees

Example Ig lineage trees

Example10x

Small example 10x Genomics Ig V(D)J sequences from CD19+ B cells isolated from PBMCs of a healthy human donor. Down-sampled from data provided by 10x Genomics under a Creative Commons Attribution license, and processed with their Cell Ranger pipeline.

IUPAC_CODES

IUPAC ambiguous characters