getSharedClones: Compute a list of clonotypes that are shared between seurat clusters

Description

This function returns the clonotypes that are shared between clusters, where clusters are defined by the levels of the active cell identities or by a custom identity column given in alt_ident. The result is a list whose names are the shared clonotypes and whose values are numeric vectors giving the indices of the clusters each clonotype is found in. Each index is the position of the cluster in the default levels of the factored identities.

If run_id is supplied, the function attempts to get the shared clonotypes from the corresponding APackOfTheClones run generated by RunAPOTC(). Otherwise, it uses the filtering / subsetting parameters to generate the shared clones.

Usage

getSharedClones(
  seurat_obj,
  reduction_base = "umap",
  clonecall = "strict",
  ...,
  extra_filter = NULL,
  alt_ident = NULL,
  run_id = NULL,
  top = NULL,
  top_per_cl = NULL,
  intop = NULL,
  intop_per_cl = NULL,
  publicity = c(2L, Inf)
)

Value

A named list where each name is a clonotype and each element is a numeric vector indicating which seurat cluster(s) it is in, in no particular order. If no shared clones are present, the output is an empty list.
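To make the shape of the return value concrete, here is a small hand-built list in the same form. The clonotype strings, cluster indices, and identity levels are invented for illustration and are not output from the package; cluster_levels stands in for what levels(Idents(seurat_obj)) might look like.

```r
# Hypothetical result in the documented shape: names are clonotypes,
# values are the indices of the clusters each clonotype appears in.
shared <- list(
  "CASSLGQGNTEAFF" = c(1, 3),
  "CASSPDRGYEQYF"  = c(2, 3, 5),
  "CAWSVGGNQPQHF"  = c(4, 1)
)

# Assumed identity levels; the values in `shared` index into this vector.
cluster_levels <- c("0", "1", "2", "3", "4")
first_clone_clusters <- cluster_levels[shared[["CASSLGQGNTEAFF"]]]

# Clonotypes that are present in the cluster at index 3
in_cluster_3 <- names(shared)[
  vapply(shared, function(idx) 3 %in% idx, logical(1))
]

# When nothing is shared the result is an empty list, so length() is 0
no_shared <- list()
```

Because the indices come in no particular order, membership tests such as %in% are safer than comparing vectors position by position.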

Arguments

seurat_obj

A Seurat object with one or more dimension reductions that has already been integrated with a TCR/BCR library via scRepertoire::combineExpression.

reduction_base

character. The seurat reduction to base the clonal expansion plotting on. Defaults to 'umap' but can be any reduction present within the reductions slot of the input seurat object, including custom ones. If 'pca', the cluster coordinates will be based on PC1 and PC2. However, generally APackOfTheClones is used for displaying UMAP and occasionally t-SNE versions to intuitively highlight clonal expansion.

clonecall

character. The column name in the seurat object metadata to use. See scRepertoire documentation for more information about this parameter that is central to both packages.

...

additional "subsetting" keyword arguments, each naming a column in the seurat object metadata and giving the values that cells should be filtered to. E.g., seurat_clusters = c(1, 9, 10) will filter the cells to those in the seurat_clusters column with any of the values 1, 9, and 10. Unfortunately, column names in the seurat object metadata cannot conflict with the keyword arguments. MAJOR NOTE: if any subsetting keyword argument is a prefix of any preceding argument name (e.g. a column named reduction is a prefix of the reduction_base argument), R will interpret it as the same argument unless both arguments are named. Additionally, this means any subsequent arguments must be named.
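The prefix issue described above comes from R's partial argument matching, which applies to formal arguments that precede ... in a signature. The following sketch uses a hypothetical function, toy, that only mirrors the argument order of getSharedClones and reports where each input ends up; it is not part of the package.

```r
# `toy` is a stand-in with the same argument order as getSharedClones.
toy <- function(obj, reduction_base = "umap", clonecall = "strict", ...,
                extra_filter = NULL) {
  list(reduction_base = reduction_base, dots = list(...))
}

# `reduction` is a prefix of `reduction_base`, so R partially matches it
# to that argument and nothing reaches `...`.
captured <- toy(1, reduction = "pca")

# Naming `reduction_base` in full leaves no formal for `reduction` to
# partially match, so it is passed through `...` as a subsetting argument.
passed <- toy(1, reduction_base = "umap", reduction = "pca")
```

In the first call the intended metadata filter is silently consumed as the reduction name, which is why naming both arguments explicitly is the safe pattern.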

extra_filter

character. An additional string that should be formatted exactly like a statement one would pass into dplyr::filter, and that does additional filtering of cells in the seurat object - on top of the other keyword arguments - based on the metadata. This means that it will be logically AND'ed with any keyword argument filters. This is a more flexible alternative / addition to the filtering keyword arguments. For example, if one wanted to filter by the length of the amino acid sequence of TCRs, one could pass in something like extra_filter = "nchar(CTaa) - 1 > 10". When the filter involves character values, make sure to enclose them in single quotes.
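As a rough illustration of how a keyword filter and an extra_filter string combine, the sketch below evaluates both against a toy metadata table using base R only. The data values are invented, and evaluating the string with eval(parse()) is an assumption made for this example, not a description of how the package processes the filter internally.

```r
# Toy metadata; column names mirror those used in the text above.
meta <- data.frame(
  seurat_clusters = c(1, 9, 10, 2, 9),
  CTaa = c("CASSLGQGNTEAFF_CAVRDSNYQLIW", "CASSF_CAV",
           "CASSPDRGYEQYF_CALSE", "CASSLGQGNTEAFF_CAVRD", "CAS_CA"),
  stringsAsFactors = FALSE
)

# Keyword-style filter: seurat_clusters = c(1, 9, 10)
keyword_keep <- meta$seurat_clusters %in% c(1, 9, 10)

# extra_filter string, evaluated with the metadata columns in scope
extra_filter <- "nchar(CTaa) - 1 > 10"
extra_keep <- eval(parse(text = extra_filter), envir = meta)

# The two conditions are AND'ed: a cell must pass both to be kept
filtered <- meta[keyword_keep & extra_keep, ]

# Character values inside the string use single quotes
quoted_filter <- "CTaa == 'CAS_CA'"
quoted_keep <- eval(parse(text = quoted_filter), envir = meta)
```

Here the fourth row passes the length condition but is dropped by the cluster filter, which shows the AND behaviour.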

alt_ident

character. By default, cluster identity is assumed to be whatever is in Idents(seurat_obj), and clones will be grouped by the active ident. However, alt_ident could be set as the name of some column in the meta data of the seurat object to be grouped by. This column is meant to have been a product of Seurat::StashIdent or manually added.

run_id

character. This will be the ID associated with the data of a run, and will be used by other important functions like APOTCPlot() and AdjustAPOTC(). Defaults to NULL, in which case the ID will be generated in the following format:

reduction_base;clonecall;keyword_arguments;extra_filter

where keyword_arguments and extra_filter are each an underscore character if there was no input for the ... and extra_filter parameters, respectively.

top

integer or numeric in (0, 1) - if not null, filters the output clones so that only the shared clonotypes whose sizes are within the top count (for integer input) or top proportion (for numeric in (0, 1) input) of shared clones are kept. When several clonotypes tie in size, which of them are included is not guaranteed, but the choice is deterministic as long as the other arguments are identical.

top_per_cl

integer or numeric in (0, 1) - if not null, filters the output clones so that for each seurat cluster, only the clonotypes with the top_per_cl frequency/count are preserved when aggregating shared clones, in the same way as the above. Note that if inputted in conjunction with top, it will get the intersection of the clonotypes filtered each way. When several clonotypes tie in size, which of them are included is not guaranteed, but the choice is deterministic as long as the other arguments are identical.

intop

integer or numeric in (0, 1) - if not null, filters the raw clone sizes before computing the shared clonotypes so that only the clonotypes that have their overall size in the top intop largest sizes (if it is integer, else the intop proportion) are kept. To emphasize, this argument does not necessarily return the top shared clones and will likely return slightly fewer, because it filters the raw clone sizes, and it is very likely that not all of those clones end up being shared.

intop_per_cl

integer or numeric in (0, 1) - if not null, filters the raw clustered clone sizes before computing shared clones, so that for every seurat cluster, only the top intop_per_cl count / proportion (for numeric in (0, 1) input) of clones are kept.
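The difference between filtering after sharing is computed (top) and before (intop) can be sketched on toy data. Everything below is hypothetical: the clone names and sizes are invented, shared_clusters is a helper written for this example, and ties and proportion inputs are deliberately left out. It is meant to illustrate the ordering of the two steps, not the package's implementation.

```r
# Clone sizes per cluster (list position = cluster index)
sizes <- list(
  c(A = 10, B = 4, C = 1),
  c(A = 2, D = 8),
  c(B = 3, D = 1, E = 8)
)

# Helper: clonotypes found in at least two clusters, with cluster indices
shared_clusters <- function(sizes) {
  clones <- unique(unlist(lapply(sizes, names)))
  found <- lapply(clones, function(cl) {
    which(vapply(sizes, function(s) cl %in% names(s), logical(1)))
  })
  names(found) <- clones
  found[lengths(found) >= 2]
}

# Overall size of each clonotype across clusters
all_sizes <- unlist(sizes)
overall <- vapply(split(all_sizes, names(all_sizes)), sum, numeric(1))

# top-style: compute shared clones first, then keep the 3 largest of them
shared <- shared_clusters(sizes)
ranked_shared <- names(sort(overall[names(shared)], decreasing = TRUE))
top_style <- shared[ranked_shared[seq_len(min(3, length(shared)))]]

# intop-style: keep the 3 largest clones overall first, then compute sharing
keep <- names(sort(overall, decreasing = TRUE))[seq_len(3)]
sizes_intop <- lapply(sizes, function(s) s[names(s) %in% keep])
intop_style <- shared_clusters(sizes_intop)
```

With these numbers the top-style filter keeps three shared clones, while the intop-style filter keeps only two, because one of the three largest clones overall (E) occurs in a single cluster.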

publicity

numeric pair. A simple filter range of c(lowerbound, upperbound) to retain only shared clones with their "publicity" - number of clusters they are present in - within this range.
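A publicity range can be pictured as a filter on the number of clusters per clonotype. The sketch below applies such a range to a hand-built list in the documented output shape; the data are invented, and treating both bounds as inclusive is an assumption made for this example.

```r
# Hypothetical shared-clone list: names are clonotypes, values are clusters
shared <- list(
  clone_a = c(1, 2),
  clone_b = c(1, 2, 3),
  clone_c = c(2, 4, 5, 6)
)

# Keep clonotypes whose cluster count lies within [lower, upper]
filter_publicity <- function(shared, publicity) {
  n <- lengths(shared)
  shared[n >= publicity[1] & n <= publicity[2]]
}

narrow <- filter_publicity(shared, c(2L, 3L))   # drops clone_c
default <- filter_publicity(shared, c(2L, Inf)) # keeps everything shared
```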

Examples


data("combined_pbmc")

getSharedClones(combined_pbmc)

getSharedClones(
    combined_pbmc,
    orig.ident = c("P17B", "P18B"), # a named subsetting parameter
    clonecall = "aa"
)

# extract shared clones from a past RunAPOTC run
combined_pbmc <- RunAPOTC(
    combined_pbmc, run_id = "foo", verbose = FALSE
)

getSharedClones(
    combined_pbmc, run_id = "foo", top = 5
)

# doing a run and then getting the clones works too
combined_pbmc <- RunAPOTC(combined_pbmc, run_id = "run1", verbose = FALSE)
getSharedClones(combined_pbmc, run_id = "run1")
