easybio (version 1.2.2)

matchCellMarker2: Annotate Clusters by Matching Markers with the CellMarker2.0 Database

Description

This function takes cluster-specific markers, typically from Seurat::FindAllMarkers, and annotates each cluster with potential cell types by matching these markers against a reference database. It first filters the markers by the specified thresholds, selects the top n genes per cluster ranked by avg_log2FC, and then compares them to the reference database to find the most likely cell type annotations.
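
The filter-then-select step can be pictured with a small base R sketch. This is not the package's implementation: the toy table, the object names (toy_markers, kept, top_n), and the use of strict comparisons against the thresholds are all assumptions made for illustration.

```r
# Toy marker table with the four columns that matchCellMarker2 requires.
# All values are invented for illustration.
toy_markers <- data.frame(
  cluster    = c("0", "0", "0", "0", "1", "1", "1"),
  gene       = c("CD3D", "CD3E", "IL7R", "MALAT1", "CD79A", "MS4A1", "ACTB"),
  avg_log2FC = c(2.1, 1.8, 0.9, 0.2, 3.0, 2.5, -0.4),
  p_val_adj  = c(1e-50, 1e-40, 1e-10, 0.2, 1e-60, 1e-30, 1e-5),
  stringsAsFactors = FALSE
)

n <- 2
avg_log2FC_threshold <- 0
p_val_adj_threshold <- 0.05

# Step 1: keep markers that pass both thresholds
# (strict inequalities are an assumption of this sketch).
kept <- toy_markers[
  toy_markers$avg_log2FC > avg_log2FC_threshold &
    toy_markers$p_val_adj < p_val_adj_threshold,
]

# Step 2: rank by avg_log2FC within each cluster and keep the top n genes.
kept <- kept[order(kept$cluster, -kept$avg_log2FC), ]
top_n <- do.call(rbind, lapply(split(kept, kept$cluster), head, n))

print(top_n)
```

Here MALAT1 is dropped by the adjusted p-value cutoff, ACTB by the fold-change cutoff, and IL7R passes both filters but falls outside the top 2 for its cluster.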

Usage

matchCellMarker2(
  marker,
  n,
  avg_log2FC_threshold = 0,
  p_val_adj_threshold = 0.05,
  spc,
  tissueClass = available_tissue_class(spc),
  tissueType = available_tissue_type(spc),
  ref = NULL
)

Value

A data.table where each row represents a potential cell type match for a cluster. The table is keyed by cluster and includes columns for cluster, cell_name, uniqueN (number of unique matching markers), N (total matches), ordered_symbol (matching genes, ordered by frequency), and orderN (their frequencies).
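
To make the difference between uniqueN and N concrete, here is a hedged base R sketch. It assumes that N counts every matching reference row (so a marker listed several times for the same cell type is counted each time) while uniqueN counts distinct genes; the toy tables and helper names are hypothetical, and the package's own summary may differ in detail.

```r
# Top markers per cluster (already filtered); values are invented.
top_markers <- data.frame(
  cluster = c("0", "0", "0", "1", "1"),
  gene    = c("CD3D", "CD3E", "LYZ", "CD79A", "MS4A1"),
  stringsAsFactors = FALSE
)

# Toy long-format reference; CD3D is listed twice for "T cell",
# mimicking a marker supported by two independent records.
toy_ref <- data.frame(
  cell_name = c("T cell", "T cell", "T cell", "B cell", "B cell", "Monocyte"),
  marker    = c("CD3D", "CD3D", "CD3E", "CD79A", "MS4A1", "LYZ"),
  stringsAsFactors = FALSE
)

# Match marker genes against the reference.
hits <- merge(top_markers, toy_ref, by.x = "gene", by.y = "marker")

# Summarise each (cluster, cell_name) pair.
groups <- split(hits, list(hits$cluster, hits$cell_name), drop = TRUE)
toy_summary <- do.call(rbind, lapply(groups, function(g) {
  freq <- sort(table(g$gene), decreasing = TRUE)  # gene frequencies, highest first
  data.frame(
    cluster        = g$cluster[1],
    cell_name      = g$cell_name[1],
    uniqueN        = length(freq),   # distinct matching genes
    N              = nrow(g),        # all matching reference rows
    ordered_symbol = paste(names(freq), collapse = ", "),
    orderN         = paste(as.integer(freq), collapse = ", "),
    stringsAsFactors = FALSE
  )
}))

print(toy_summary)
```

Under these assumptions, cluster 0 matched against "T cell" has uniqueN = 2 (CD3D, CD3E) but N = 3, because CD3D appears twice in the toy reference.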

The returned object also contains important attributes for downstream analysis:

ref

The reference data (either from cellMarker2 or the custom ref) used for the annotation.

is_custom_ref

A logical flag indicating if a custom ref was used.

filter_args

A list containing the filtering parameters used during the annotation, which is essential for the check_marker function.

Arguments

marker

A data.frame or data.table of markers, usually the output of Seurat::FindAllMarkers. It must contain columns for cluster, gene, avg_log2FC, and p_val_adj.

n

An integer specifying the number of top marker genes to use from each cluster for matching. Genes are ranked by avg_log2FC after filtering.

avg_log2FC_threshold

A numeric value setting the minimum average log2 fold change for a marker to be considered. Defaults to 0.

p_val_adj_threshold

A numeric value setting the maximum adjusted p-value for a marker to be considered. Defaults to 0.05.

spc

A character string specifying the species, either "Human" or "Mouse". This is used to filter the cellMarker2 database. This parameter is ignored if a custom ref is provided.

tissueClass

A character vector of tissue classes to include from the cellMarker2 database. Defaults to all available tissue classes for the specified species. This parameter is ignored if a custom ref is provided. See available_tissue_class().

tissueType

A character vector of tissue types to include from the cellMarker2 database. Defaults to all available tissue types for the specified species. This parameter is ignored if a custom ref is provided. See available_tissue_type().

ref

An optional long-format data.frame, used as the reference for marker matching, that must contain 'cell_name' and 'marker' columns. If NULL (the default), the function uses the built-in cellMarker2 dataset. When a custom ref is provided, the spc, tissueClass, and tissueType parameters are ignored for the matching process itself, but their original values are saved for provenance.
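
As a rough illustration of the long shape expected for ref (one row per cell type and marker pair), the sketch below builds such a table with base R from a named list. The object names are hypothetical; the Examples section uses the package helper list2dt() for the same purpose.

```r
# Named list: one element per cell type, holding its marker genes.
custom_ref_list <- list(
  "T-cell"  = c("CD3D", "CD3E"),
  "B-cell"  = c("CD79A", "MS4A1"),
  "Myeloid" = "LYZ"
)

# Long format: repeat each cell type name once per marker it owns.
custom_ref_df <- data.frame(
  cell_name = rep(names(custom_ref_list), lengths(custom_ref_list)),
  marker    = unlist(custom_ref_list, use.names = FALSE),
  stringsAsFactors = FALSE
)

print(custom_ref_df)
```

The result has five rows and the two required columns, cell_name and marker.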

See Also

check_marker, plotPossibleCell, available_tissue_class, available_tissue_type

Examples

if (FALSE) {
library(easybio)
data(pbmc.markers)

# Basic usage: Annotate clusters using the top 50 markers per cluster
matched_cells <- matchCellMarker2(pbmc.markers, n = 50, spc = "Human")
print(matched_cells)

# To see the top annotation for each cluster
top_matches <- matched_cells[, .SD[1], by = cluster]
print(top_matches)

# Advanced usage: Stricter filtering and focus on specific tissues
matched_cells_strict <- matchCellMarker2(
  pbmc.markers,
  n = 30,
  spc = "Human",
  avg_log2FC_threshold = 0.5,
  p_val_adj_threshold = 0.01,
  tissueType = c("Blood", "Bone marrow")
)
print(matched_cells_strict)

# --- Example with a custom reference ---
# Create a custom reference as a named list.
custom_ref_list <- list(
  "T-cell" = c("CD3D", "CD3E"),
  "B-cell" = c("CD79A", "MS4A1"),
  "Myeloid" = "LYZ"
)

# Convert the list to a long data.frame compatible with the 'ref' parameter.
custom_ref_df <- list2dt(custom_ref_list, col_names = c("cell_name", "marker"))

# Run annotation using the custom reference.
# When 'ref' is provided, the internal cellMarker2 database and its filters
# ('spc', 'tissueClass', 'tissueType') are ignored for matching.
matched_custom <- matchCellMarker2(
  pbmc.markers,
  n = 50,
  ref = custom_ref_df
)
print(matched_custom)
}