Perform (differential) cell-type-to-cell-type communication analysis on a Seurat object, using an internal database of ligand-receptor interactions (LRIs). The function infers biologically relevant cell-cell interactions (CCIs) and how they change between two conditions of interest. Over-representation analysis (ORA) is performed automatically to identify the dominant differential signals at the level of genes, cell types, GO terms and KEGG pathways.
run_interaction_analysis(
  seurat_object,
  LRI_species,
  seurat_celltype_id,
  seurat_condition_id,
  iterations = 1000,
  scdiffcom_object_name = "scDiffCom_object",
  seurat_assay = "RNA",
  seurat_layer = "data",
  seurat_slot = deprecated(),
  log_scale = FALSE,
  score_type = "geometric_mean",
  threshold_min_cells = 5,
  threshold_pct = 0.1,
  threshold_quantile_score = 0.2,
  threshold_p_value_specificity = 0.05,
  threshold_p_value_de = 0.05,
  threshold_logfc = log(1.5),
  return_distributions = FALSE,
  seed = 42,
  verbose = TRUE,
  custom_LRI_tables = NULL
)
Returns an S4 object of class scDiffCom-class.
seurat_object: Seurat object that must contain normalized
data and relevant meta.data columns (see below). Gene names must be
MGI (mouse) or HGNC (human) approved symbols.
LRI_species: Either "mouse", "human", "rat" or "custom".
Indicates which LRI database to use and corresponds to the species of the seurat_object.
Use "custom" (at your own risk) to supply your own LRI table via custom_LRI_tables.
seurat_celltype_id: Name of the meta.data column in
seurat_object that contains cell-type annotations
(e.g. "CELL_TYPE").
seurat_condition_id: List that contains information regarding the two conditions on which to perform differential analysis. Must contain the following three named items:
column_name: name of the meta.data column in
seurat_object that indicates the condition on each cell (e.g. "AGE")
cond1_name: name of the first condition (e.g. "YOUNG")
cond2_name: name of the second condition (e.g. "OLD")
Can also be set to NULL to only perform a detection analysis
(see Details).
iterations: Number of permutations used in the statistical
analysis. The default (1000) is a good compromise for an exploratory
analysis and yields reasonably accurate p-values in a short time.
Otherwise, we recommend using 10000 iterations and running the
analysis in parallel (see Details). Can also be set to 0 for
debugging and quickly returning partial results without
statistical significance.
scdiffcom_object_name: Name of the scDiffCom S4 object that will
be returned ("scDiffCom_object" by default).
seurat_assay: Assay of seurat_object from which to extract data.
See Details for an explanation on how data are extracted based on the three
parameters seurat_assay, seurat_layer and log_scale.
seurat_layer: Layer of seurat_object from which to extract data.
See Details for an explanation on how data are extracted based on the three
parameters seurat_assay, seurat_layer and log_scale.
seurat_slot: Deprecated. seurat_slot is no longer supported; use seurat_layer instead.
log_scale: When FALSE (the default, recommended), data are
treated as normalized but not log1p-transformed. See Details for an
explanation on how data are extracted based on the three
parameters seurat_assay, seurat_layer and log_scale.
score_type: Metric used to compute cell-cell interaction (CCI) scores.
Can be either "geometric_mean" (default) or "arithmetic_mean".
It is strongly recommended to use the geometric mean, especially when
performing differential analysis. The arithmetic mean may be used when
only performing a detection analysis or when the results are to be compared
with those of another package.
threshold_min_cells: Minimal number of cells (of a given cell type
and condition) required to express a gene for this gene to be considered
expressed in the corresponding cell type. Consequently, cell types with
fewer cells than this threshold are removed from the analysis.
Set to 5 by default.
threshold_pct: Minimal fraction of cells (of a given cell type
and condition) required to express a gene for this gene to be considered
expressed in the corresponding cell type. Set to 0.1 by default.
threshold_quantile_score: Threshold value used in conjunction with
threshold_p_value_specificity to establish whether a CCI is considered
"detected". The default (0.2) means that CCIs whose score falls within the
lowest 20% of scores are not considered detected. Can be modified
without re-performing the permutation analysis (see Details).
threshold_p_value_specificity: Threshold value used in conjunction
with threshold_quantile_score to establish if a CCI is considered
"detected". CCIs with a (BH-adjusted) specificity p-value above the
threshold (0.05 by default) are not considered detected. Can be
modified without the need to re-perform the permutation analysis
(see Details).
threshold_p_value_de: Threshold value used in conjunction
with threshold_logfc to establish how CCIs are differentially
expressed between cond1_name and cond2_name. CCIs with a
(BH-adjusted) differential p-value above the threshold (0.05 by
default) are not considered to change significantly. Can be modified
without the need to re-perform the permutation analysis (see Details).
threshold_logfc: Threshold value used in conjunction with
threshold_p_value_de to establish how CCIs are differentially
expressed between cond1_name and cond2_name. CCIs with an
absolute logFC below the threshold (log(1.5) by default) are
considered "FLAT". Can be modified without the need to
re-perform the permutation analysis (see Details).
return_distributions: FALSE by default. If TRUE, the
distributions obtained from the permutation test are returned alongside
the other results. May be used for testing or benchmarking purposes. Can
only be enabled when iterations is less than 1000 in order
to avoid out-of-memory issues.
seed: Set a random seed (42 by default) to obtain reproducible
results.
verbose: If TRUE (default), print progress messages.
custom_LRI_tables: A list containing an LRI table and, optionally,
user-supplied annotation tables. Overrides
LRI_species and the corresponding internal LRI table. Use at
your own risk! Must contain at least the following named item:
LRI: a data.table of LRIs
The data.table of LRIs must be in the same format as the internal LRI_tables, namely with the columns "LRI", "LIGAND_1", "LIGAND_2", "RECEPTOR_1", "RECEPTOR_2", "RECEPTOR_3". Other named data.tables can be supplied for over-representation analysis (ORA) purposes.
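To make the threshold_min_cells, threshold_pct and score_type descriptions concrete, here is a minimal base-R sketch for a single ligand-receptor pair. It assumes one ligand and one receptor (no multi-subunit complexes) and that a cell type's expression level is the mean of its normalized, non-log values over all of its cells. It is not scDiffCom's internal code, and is_expressed() and cci_score() are hypothetical helpers.

```r
# Hypothetical sketch, not scDiffCom's internal implementation.
# Normalized (non-log) expression of one ligand in the emitter cell type and
# one receptor in the receiver cell type, for a single condition.
ligand_expr   <- c(0, 2.1, 3.4, 0, 1.8, 2.6, 0, 4.0)  # 8 emitter cells
receptor_expr <- c(1.2, 0, 0.9, 2.2, 0, 1.5)          # 6 receiver cells

# A gene is treated as expressed in a cell type when enough cells express it,
# both as an absolute number of cells and as a fraction of the cell type.
is_expressed <- function(x, threshold_min_cells = 5, threshold_pct = 0.1) {
  n_pos <- sum(x > 0)
  n_pos >= threshold_min_cells && n_pos / length(x) >= threshold_pct
}

# CCI score computed from the mean ligand and mean receptor expression.
cci_score <- function(ligand, receptor,
                      score_type = c("geometric_mean", "arithmetic_mean")) {
  score_type <- match.arg(score_type)
  l <- mean(ligand)
  r <- mean(receptor)
  switch(score_type,
         geometric_mean  = sqrt(l * r),
         arithmetic_mean = (l + r) / 2)
}

is_expressed(ligand_expr)    # TRUE: 5 of 8 cells express the ligand
is_expressed(receptor_expr)  # FALSE: only 4 cells, below threshold_min_cells
cci_score(ligand_expr, receptor_expr)                     # geometric mean
cci_score(ligand_expr, receptor_expr, "arithmetic_mean")  # arithmetic mean
```

In this toy example the receptor fails threshold_min_cells, so the pair would not count as expressed. Note also that, with the geometric mean, log(score_cond2 / score_cond1) equals the average of the ligand and receptor log fold changes, so a change in either partner counts equally, whereas with the arithmetic mean the more highly expressed partner dominates. This is one property consistent with the recommendation above, not necessarily the authors' stated reason.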
The primary use of this function (and of the package) is to perform
differential intercellular communication analysis. However, it is also
possible to only perform a detection analysis (by setting
seurat_condition_id to NULL), e.g. to infer cell-cell interactions
from a dataset whose cells carry no condition labels.
By convention, when performing differential analysis, LOGFC values are computed as
log(score(cond2_name)/score(cond1_name)). In other words,
"UP"-regulated CCIs have a larger score in cond2_name.
Parallel computing. If possible, it is recommended to
run this function in parallel in order to speed up the analysis on large
datasets and/or to obtain more accurate p-values by using a higher
number of iterations. This is as simple as loading the future
package and setting an appropriate plan (see also our vignette).
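A minimal sketch of that setup, assuming the future package is installed and that the permutation step follows whatever plan is active, as the paragraph above states. Calling future::plan() with the namespace prefix is equivalent to loading the package first. The worker count and the object name my_seurat are placeholders, and the analysis call is commented out because it needs real data.

```r
# Sketch: run the permutations in parallel via the future framework.
parallel_ready <- requireNamespace("future", quietly = TRUE)
if (parallel_ready) {
  # plan() returns the previous plan invisibly, so it can be restored later.
  old_plan <- future::plan(future::multisession, workers = 2)
  # res <- run_interaction_analysis(
  #   seurat_object = my_seurat,            # hypothetical Seurat object
  #   LRI_species = "mouse",
  #   seurat_celltype_id = "cell_type",
  #   seurat_condition_id = list(column_name = "age_group",
  #                              cond1_name = "YOUNG", cond2_name = "OLD"),
  #   iterations = 10000                    # more permutations, more accuracy
  # )
  future::plan(old_plan)  # restore the previous plan
}
```

With multisession, each worker is a separate background R session; restoring the previous plan returns the session to its earlier (by default sequential) behavior.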
Data extraction. The expression matrix (normalized UMI or read counts) is extracted from
the assay seurat_assay and the layer seurat_layer. By default,
it is assumed that seurat_object contains log1p-transformed
normalized data in the layer "data" of its assay "RNA". If log_scale
is FALSE (as recommended), the data are expm1()-transformed
in order to recover normalized values on a linear (non-log) scale.
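A small base-R sketch of this round trip, assuming the "data" layer was produced by Seurat's default LogNormalize method (counts scaled to 10,000 per cell, then log1p()). The toy matrix and gene names are made up for illustration.

```r
# Toy raw counts: 2 genes x 3 cells (hypothetical values).
counts <- matrix(c(0, 3, 1, 5, 2, 0), nrow = 2,
                 dimnames = list(c("GeneA", "GeneB"), paste0("cell", 1:3)))

# LogNormalize-style normalization: scale each cell to 10,000 counts, then log1p().
normalized <- sweep(counts, 2, colSums(counts), "/") * 1e4
log_data   <- log1p(normalized)  # what a log-normalized "data" layer holds

# With log_scale = FALSE, values are brought back to the linear scale.
linear_data <- expm1(log_data)
all.equal(linear_data, normalized)  # TRUE (up to floating-point tolerance)
```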
Modifying filtering parameters (differential analysis only). As long as
the slot cci_table_raw of
the returned scDiffCom object is not erased, filtering parameters can be
modified to recompute the slots cci_table_detected and
ora_table, without re-performing the time-consuming permutation
analysis. This may be useful, for example, to quickly assess how the
results change with different LOGFC thresholds. In practice,
this can be done by calling the functions FilterCCI or
RunORA (see also our vignette).
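No new permutations are needed because filtering only re-reads stored scores and p-values. The base-R sketch below mimics the detection step under stated assumptions: the score quantile is taken over all CCIs in the table, and the columns SCORE and P_SPEC (and the helper detect_cci()) are hypothetical names, not the actual layout of cci_table_raw. With the package itself, use FilterCCI.

```r
# Hypothetical stored results: one row per CCI with its score and BH-adjusted
# specificity p-value, i.e. quantities produced once by the permutation step.
cci_raw <- data.frame(
  CCI    = paste0("cci", 1:5),
  SCORE  = c(0.1, 0.5, 0.9, 1.3, 2.0),
  P_SPEC = c(0.01, 0.01, 0.20, 0.01, 0.04),
  stringsAsFactors = FALSE
)

# Detection filter: drop the lowest-scoring quantile and non-specific CCIs.
detect_cci <- function(tbl, threshold_quantile_score = 0.2,
                       threshold_p_value_specificity = 0.05) {
  cutoff <- quantile(tbl$SCORE, probs = threshold_quantile_score, names = FALSE)
  tbl[tbl$SCORE > cutoff & tbl$P_SPEC <= threshold_p_value_specificity, ]
}

nrow(detect_cci(cci_raw))                                  # 3
nrow(detect_cci(cci_raw, threshold_quantile_score = 0.5))  # 2
```

Raising threshold_quantile_score from 0.2 to 0.5 removes cci2 without recomputing any scores or p-values.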
if (FALSE) {
  # Differential analysis between YOUNG and OLD cells of a mouse dataset
  run_interaction_analysis(
    seurat_object = seurat_sample_tms_liver,
    LRI_species = "mouse",
    seurat_celltype_id = "cell_type",
    seurat_condition_id = list(
      column_name = "age_group",
      cond1_name = "YOUNG",
      cond2_name = "OLD"
    )
  )
}