Note: a newer version (0.9.1) of this package is available.

immunarch --- An R Package for Painless Bioinformatics Analysis of T-cell and B-cell Immune Repertoire Data

Why immunarch?

  • Work with any type of data: single-cell, bulk, data tables, databases --- you name it.
  • Community at the heart: thrive in the community of almost 30,000 researchers and medical scientists worldwide, including researchers from Pfizer, Novartis, Regeneron, Stanford, UCSF and MIT.
  • One plot --- one line: write a whole PhD thesis in 8 lines of code or reproduce almost any publication in 5-10 lines of immunarch code.
  • Be on the bleeding edge of science: we regularly update immunarch with the latest methods. Let us know what you need!
  • Automatic format detection and parsing for all popular immunosequencing formats: from MiXCR and ImmunoSEQ to 10XGenomics and ArcherDX.

And the most important: immunarch is not just a tool --- it is an ecosystem.

From Berkeley with devotion

immunarch is brought to you by ImmunoMind --- a UC Berkeley SkyDeck startup.

We have been helping researchers to extract insights from sequencing data of T-cell and antibody repertoires since the inception of the RepSeq domain. Our bioinformatics tools are trusted by top universities including Stanford, UCSF, MIT, King's College London and big pharma companies including Pfizer and Novartis.

Stay connected!

Feel free to follow us on Twitter.


Introduction

immunarch is an R package designed to analyse T-cell receptor (TCR) and B-cell receptor (BCR) repertoires, aimed at medical scientists and bioinformaticians. The mission of immunarch is to make immune sequencing data analysis as effortless as possible and help you focus on research instead of coding. Follow us on Twitter for news and updates.

Contact

Create a ticket with a bug or question on GitHub Issues to help the community help you and enrich it with your experience. If you need to send us sensitive data, feel free to contact us via support@immunomind.io.

Installation

Latest release on CRAN

In order to install immunarch execute the following command:

install.packages("immunarch")

That's it, you can start using immunarch now! See the Quick Start section below to dive into immune repertoire data analysis. If you run into any trouble with installation, take a look at the Installation Troubleshooting section.

Note: there are quite a lot of dependencies to install with the package because it installs all the widely-used packages for data analysis and visualisation. You get both the AIRR data analysis framework and the full Data Science package ecosystem with only one command, making immunarch the entry-point for single-cell & immune repertoire Data Science.

Latest release on GitHub

If the above command doesn't work for any reason, try installing immunarch directly from its repository:

install.packages("devtools") # skip this if you already installed devtools
devtools::install_github("immunomind/immunarch")

Latest pre-release on GitHub

Since releasing on CRAN is limited to one release every one to two months, you can install the latest pre-release version with bleeding edge features and optimisations directly from the code repository. To install the latest pre-release version, you need to execute only two commands:

install.packages("devtools") # skip this if you already installed devtools
devtools::install_github("immunomind/immunarch", ref="develop")

You can find the list of releases of immunarch here: https://github.com/immunomind/immunarch/releases

Features

  1. Fast and easy manipulation of immune repertoire data:

    • The package automatically detects the format of your files---no more guessing what format a file is in, just pass the files to the package;

    • Supports all popular TCR and BCR analysis and post-analysis formats, including single-cell data: ImmunoSEQ, IMGT, MiTCR, MiXCR, MiGEC, MigMap, VDJtools, tcR, AIRR, 10XGenomics, ArcherDX. More coming in the future;

    • Works on any data source you are comfortable with: R data frames, data tables from data.table, databases like MonetDB, Apache Spark data frames via sparklyr;

    • Tutorial is available here.

  2. Immune repertoire analysis made simple:

    • Most methods are incorporated in a couple of main functions with clear naming---no more remembering tens and tens of functions with obscure names. For details see link;

    • Repertoire overlap analysis (common indices including overlap coefficient, Jaccard index and Morisita's overlap index). Tutorial is available here;

    • Gene usage estimation (correlation, Jensen-Shannon Divergence, clustering). Tutorial is available here;

    • Diversity evaluation (ecological diversity index, Gini index, inverse Simpson index, rarefaction analysis). Tutorial is available here;

    • Tracking of clonotypes across time points, widely used in vaccination and cancer immunology domains. Tutorial is available here;

    • Kmer distribution measures and statistics. Tutorial is available here;

    • Coming in the next releases: CDR3 amino acid physical and chemical properties assessment, mutation networks.

  3. Publication-ready plots with a built-in tool for visualisation manipulation:

    • Rich visualisation procedures with ggplot2;

    • Built-in tool FixVis makes your plots publication-ready: easily change font sizes, text angles, titles, legends and much more with a clear-cut GUI;

    • Tutorial is available here.
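
To make the overlap measures named above more concrete, here is a minimal base R sketch of the Jaccard index and the overlap coefficient on two toy repertoires. The sequences and the helper names (shared_count, jaccard_index, overlap_coefficient) are made up for illustration; this is not immunarch's own implementation, which works on full repertoire data frames.

```r
# Toy repertoires: CDR3 amino acid sequences (made-up values for illustration).
rep_a <- c("CASSLGQYF", "CASSPGTEAFF", "CASSQETQYF", "CASRDRNTEAFF")
rep_b <- c("CASSLGQYF", "CASSQETQYF", "CASSYSNQPQHF")

# Hypothetical helper: number of distinct clonotypes present in both repertoires.
shared_count <- function(x, y) length(intersect(unique(x), unique(y)))

# Jaccard index: size of the intersection divided by size of the union.
jaccard_index <- function(x, y) {
  shared_count(x, y) / length(union(unique(x), unique(y)))
}

# Overlap coefficient: size of the intersection divided by the smaller set size.
overlap_coefficient <- function(x, y) {
  shared_count(x, y) / min(length(unique(x)), length(unique(y)))
}

n_shared <- shared_count(rep_a, rep_b)
jac <- jaccard_index(rep_a, rep_b)
ovc <- overlap_coefficient(rep_a, rep_b)
print(c(shared = n_shared, jaccard = jac, overlap = ovc))
```

Both indices range from 0 (no shared clonotypes) to 1; the overlap coefficient is less sensitive to a large difference in repertoire sizes because it normalises by the smaller repertoire.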

Quick start

The gist of a typical TCR or BCR data analysis workflow can be reduced to the following few lines of code.

Use immunarch data

1) Load the package and the data

library(immunarch)  # Load the package into R
data(immdata)  # Load the test dataset

2) Calculate and visualise basic statistics

repExplore(immdata$data, "lens") %>% vis()  # Visualise the length distribution of CDR3
repClonality(immdata$data, "homeo") %>% vis()  # Visualise the relative abundance of clonotypes

3) Explore and compare T-cell and B-cell repertoires

repOverlap(immdata$data) %>% vis()  # Build the heatmap of public clonotypes shared between repertoires
geneUsage(immdata$data[[1]]) %>% vis()  # Visualise the V-gene distribution for the first repertoire
repDiversity(immdata$data) %>% vis(.by = "Status", .meta = immdata$meta)  # Visualise the Chao1 diversity of repertoires, grouped by the patient status
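
For readers who want to see what a diversity estimate measures, here is a minimal base R sketch of the classic Chao1 estimator and the inverse Simpson index on a toy vector of clone counts. The counts and the function names (chao1_estimate, inverse_simpson) are assumptions made for illustration; the numbers immunarch reports may differ because its implementation can use a different variant of the estimator.

```r
# Toy clone counts for one repertoire (made-up values for illustration).
clone_counts <- c(10, 5, 2, 2, 1, 1, 1, 1)

# Classic Chao1: S_obs + f1^2 / (2 * f2), where f1 and f2 are the numbers of
# clonotypes observed exactly once and exactly twice.
chao1_estimate <- function(counts) {
  s_obs <- sum(counts > 0)
  f1 <- sum(counts == 1)
  f2 <- sum(counts == 2)
  if (f2 == 0) {
    # Bias-corrected form, used here only to avoid division by zero.
    return(s_obs + f1 * (f1 - 1) / 2)
  }
  s_obs + f1^2 / (2 * f2)
}

# Inverse Simpson index: 1 / sum(p_i^2), with p_i the clonotype proportions.
inverse_simpson <- function(counts) {
  p <- counts / sum(counts)
  1 / sum(p^2)
}

chao1 <- chao1_estimate(clone_counts)
inv_simpson <- inverse_simpson(clone_counts)
print(c(chao1 = chao1, inverse_simpson = inv_simpson))
```

Chao1 estimates the total number of clonotypes, including unseen ones, from the rare counts, while the inverse Simpson index reflects how evenly the reads are spread across clonotypes.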

Use your own data

library(immunarch)  # Load the package into R
immdata <- repLoad("path/to/your/data")  # Replace it with the path to your data. Immunarch automatically detects the file format.

Advanced methods

For advanced methods such as clonotype annotation, clonotype tracking, kmer analysis and public repertoire analysis see "Tutorials".
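
As a rough illustration of what kmer analysis counts, the following base R sketch splits toy CDR3 sequences into overlapping kmers and tabulates them. The sequences and the helper name toy_kmers are hypothetical; the package's own kmer functions operate on whole repertoires and are described in the tutorials.

```r
# Hypothetical helper: split one sequence into all overlapping substrings of length k.
toy_kmers <- function(sequence, k) {
  n <- nchar(sequence)
  if (n < k) {
    return(character(0))  # sequence too short to contain any kmer
  }
  starts <- seq_len(n - k + 1)
  substring(sequence, starts, starts + k - 1)
}

# Toy CDR3 amino acid sequences (made-up values for illustration).
cdr3 <- c("CASSLG", "CASSPG")

# Count how often each 3-mer occurs across all sequences.
kmer_counts <- table(unlist(lapply(cdr3, toy_kmers, k = 3)))
print(sort(kmer_counts, decreasing = TRUE))
```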

Bugs and Issues

The mission of immunarch is to make bulk and single-cell immune repertoire analysis painless. All bug reports, documentation improvements, enhancements and ideas are appreciated. Just let us know via GitHub (preferably) or support@immunomind.io (in case of private data).

Bug reports must:

  1. Include a short, self-contained R snippet reproducing the problem.
  2. Add a minimal data sample for us to reproduce the problem. In case of sensitive data you can send it to support@immunomind.io instead of GitHub issues.
  3. Explain why the current behavior is wrong/not desired and what you expect instead.
  4. If the issue is about visualisations, please attach a picture to the issue. Otherwise we will not be able to reproduce the bug and fix it.

Help the community

Have an aspiration to help the community build the ecosystem of scRNAseq & AIRR analysis tools? Found a bug? A typo? Would like to improve the documentation, add a method or optimise an algorithm?

We are always open to contributions. There are two ways to contribute:

  1. Create an issue here and describe what you would like to improve or discuss.

  2. Create an issue or find one here, fork the repository and make a pull request with the bugfix or improvement.

Citation

ImmunoMind Team. (2019). immunarch: An R Package for Painless Bioinformatics Analysis of T-Cell and B-Cell Immune Repertoires. Zenodo. http://doi.org/10.5281/zenodo.3367200

BibTeX:

@misc{immunomind_team_2019_3367200,
  author       = {{ImmunoMind Team}},
  title        = {{immunarch: An R Package for Painless Bioinformatics Analysis 
                    of T-Cell and B-Cell Immune Repertoires}},
  month        = aug,
  year         = 2019,
  doi          = {10.5281/zenodo.3367200},
  url          = {https://doi.org/10.5281/zenodo.3367200}
}

For EndNote citation import the immunarch-citation.xml file.

Preprint on bioRxiv is coming soon.

License

The package is freely distributed under the AGPL v3 license. You can read more about it here.

For commercial or server use, please contact ImmunoMind via support@immunomind.io about solutions for biomarker data science of single-cell immune repertoires.

Version: 0.6.4
License: AGPL-3
Maintainer: Vadim Nazarov
Last Published: March 18th, 2024
Monthly Downloads: 779

Functions in immunarch (0.6.4)

  • fixVis: Manipulate ggplot plots and make publication-ready plots
  • immunr_data_format: Specification of the data format used by immunarch dataframes
  • immdata: Immune repertoire dataset
  • geneUsage: Main function for estimation of V-gene and J-gene statistics
  • pubRep: Create a repertoire of public clonotypes
  • getKmers: Calculate the kmer statistics of immune repertoires
  • pubRepApply: Apply transformations to public repertoires
  • gene_stats: WIP
  • repLoad: Load immune repertoire files into the R workspace
  • aa_table: Amino acid / codon table
  • group_from_metadata: Get a character vector of samples' groups from the input metadata file
  • repOverlap: Main function for public clonotype statistics calculations
  • geneUsageAnalysis: Post-analysis of V-gene and J-gene statistics: PCA, clustering, etc.
  • has_class: Check for the specific class
  • entropy: Information measures
  • switch_type: Return a column's name
  • matrixdiagcopy: Copy the upper matrix triangle to the lower one
  • filter_barcodes: Filter clonotypes using barcodes from single-cell metadata
  • repDiversity: Main function for immune repertoire diversity estimation
  • inc_overlap: Incremental counting of repertoire similarity
  • gene_segments: Gene segments table
  • pubRepStatistics: Statistics of the number of public clonotypes for each possible combination of repertoires
  • pubRepFilter: Filter out clonotypes from public repertoires
  • repSave: Save immune repertoires to the disk
  • immunr_hclust: Clustering of objects or distance matrices
  • immunr_pca: Dimensionality reduction
  • top: Get the N most abundant clonotypes
  • public_matrix: Get a matrix with public clonotype frequencies
  • vis_heatmap: Visualisation of matrices and data frames using ggplot2-based heatmaps
  • split_to_kmers: Analysis of immune repertoire kmer statistics: sequence profiles, etc.
  • vis.immunr_kmer_table: Most frequent kmers visualisation
  • spectratype: Immune repertoire spectratyping
  • vis_heatmap2: Visualisation of matrices using pheatmap-based heatmaps
  • vis.immunr_mds: PCA / MDS / tSNE visualisation (mainly overlap / gene usage)
  • repExplore: Main function for exploratory data analysis: compute the distribution of lengths, clones, etc.
  • vis.immunr_dynamics: Visualise clonotype dynamics
  • repClonality: Clonality analysis of immune repertoires
  • vis.immunr_exp_vol: Visualise results of the exploratory analysis
  • vis.immunr_ov_matrix: Repertoire overlap and gene usage visualisations
  • vis_hist: Visualisation of distributions using histograms
  • vis_immunr_kmer_profile_main: Visualise kmer profiles
  • vis.immunr_public_repertoire: Public repertoire visualisation
  • repSample: Downsampling and resampling of immune repertoires
  • vis: One function to visualise them all
  • vis.immunr_gene_usage: Histograms and boxplots (general case / gene usage)
  • repOverlapAnalysis: Post-analysis of public clonotype statistics: PCA, clustering, etc.
  • vis.immunr_hclust: Visualisation of hierarchical clustering
  • trackClonotypes: Track clonotypes across time and data points
  • set_pb: Set and update progress bars
  • vis.immunr_kmeans: Visualisation of K-means and DBSCAN clustering
  • vis.immunr_chao1: Visualise diversity
  • vis.immunr_inc_overlap: Visualise incremental overlaps
  • vis_public_clonotypes: Visualisation of public clonotypes
  • vis.immunr_public_statistics: Visualise sharing of clonotypes among samples
  • vis_box: Flexible box-plots for visualisation of distributions
  • vis_bar: Bar plots
  • vis.immunr_clonal_prop: Visualise results of the clonality analysis
  • vis_circos: Visualisation of matrices using circos plots
  • vis_textlogo: Sequence logo plots for amino acid profiles
  • vis_treemap: Visualisation of data frames and matrices using treemaps
  • vis_public_frequencies: Public repertoire visualisation
  • aa_properties: Tables with amino acid properties
  • coding: Filter out coding and non-coding clonotype sequences
  • dbAnnotate: Annotate clonotypes in immune repertoires using clonotype databases such as VDJDB and McPAS
  • add_class: Add a new class attribute
  • bunch_translate: Nucleotide to amino acid sequence translation
  • apply_symm: Apply function to each pair of data frames from a list
  • check_distribution: Check and normalise distributions
  • dbLoad: Load clonotype databases such as VDJDB and McPAS into the R workspace
  • .quant_column_choice: Get a column's name using the input alias