Perform (differential) cell type to cell type communication analysis from a Seurat object, using an internal database of ligand-receptor interactions (LRIs). It infers biologically relevant cell-cell interactions (CCIs) and how they change between two conditions of interest. Over-representation analysis is automatically performed to determine dominant differential signals at the level of the genes, cell types, GO Terms and KEGG Pathways.
run_interaction_analysis(
  seurat_object,
  LRI_species,
  seurat_celltype_id,
  seurat_condition_id,
  iterations = 1000,
  scdiffcom_object_name = "scDiffCom_object",
  seurat_assay = "RNA",
  seurat_slot = "data",
  log_scale = FALSE,
  score_type = "geometric_mean",
  threshold_min_cells = 5,
  threshold_pct = 0.1,
  threshold_quantile_score = 0.2,
  threshold_p_value_specificity = 0.05,
  threshold_p_value_de = 0.05,
  threshold_logfc = log(1.5),
  return_distributions = FALSE,
  seed = 42,
  verbose = TRUE
)
An S4 object of class scDiffCom-class.
Seurat object that must contain normalized
data and relevant meta.data
columns (see below). Gene names must be
MGI (mouse) or HGNC (human) approved symbols.
Either "mouse", "human" or "rat". Indicates which LRI database to use and corresponds to the species of the seurat_object.
Name of the meta.data column in seurat_object that contains cell-type annotations (e.g. "CELL_TYPE").
List that contains information regarding the two conditions on which to perform differential analysis. Must contain the following three named items:
column_name: name of the meta.data column in seurat_object that indicates the condition of each cell (e.g. "AGE")
cond1_name: name of the first condition (e.g. "YOUNG")
cond2_name: name of the second condition (e.g. "OLD")
Can also be set to NULL to only perform a detection analysis (see Details).
Number of permutations used for the statistical analysis. The default (1000) is a good compromise for an exploratory analysis, giving reasonably accurate p-values in a short time. Otherwise, we recommend using 10000 iterations and running the analysis in parallel (see Details). Can also be set to 0 for debugging, to quickly return partial results without statistical significance.
Name of the scDiffCom S4 object that will be returned ("scDiffCom_object" by default).
Assay of seurat_object from which to extract data. See Details for an explanation of how data are extracted based on the three parameters seurat_assay, seurat_slot and log_scale.
Slot of seurat_object from which to extract data. See Details for an explanation of how data are extracted based on the three parameters seurat_assay, seurat_slot and log_scale.
When FALSE (the default, recommended), data are treated as normalized but not log1p-transformed. See Details for an explanation of how data are extracted based on the three parameters seurat_assay, seurat_slot and log_scale.
Metric used to compute cell-cell interaction (CCI) scores. Can be either "geometric_mean" (default) or "arithmetic_mean". It is strongly recommended to use the geometric mean, especially when performing differential analysis. The arithmetic mean might be used when only performing a detection analysis or when the results are to be compared with those of another package.
Minimal number of cells (of a given cell type and condition) that must express a gene for this gene to be considered expressed in the corresponding cell type. As a consequence, cell types with fewer cells than this threshold are removed from the analysis. Set to 5 by default.
Minimal fraction of cells (of a given cell type and condition) that must express a gene for this gene to be considered expressed in the corresponding cell type. Set to 0.1 by default.
Threshold value used in conjunction with threshold_p_value_specificity to establish if a CCI is considered "detected". The default (0.2) indicates that CCIs with a score among the 20% lowest scores are not considered detected. Can be modified without the need to re-perform the permutation analysis (see Details).
Threshold value used in conjunction with threshold_quantile_score to establish if a CCI is considered "detected". CCIs with a (BH-adjusted) specificity p-value above the threshold (0.05 by default) are not considered detected. Can be modified without the need to re-perform the permutation analysis (see Details).
Threshold value used in conjunction with threshold_logfc to establish how CCIs are differentially expressed between cond1_name and cond2_name. CCIs with a (BH-adjusted) differential p-value above the threshold (0.05 by default) are not considered to change significantly. Can be modified without the need to re-perform the permutation analysis (see Details).
Threshold value used in conjunction with threshold_p_value_de to establish how CCIs are differentially expressed between cond1_name and cond2_name. CCIs with an absolute logFC below the threshold (log(1.5) by default) are considered "FLAT". Can be modified without the need to re-perform the permutation analysis (see Details).
FALSE by default. If TRUE, the distributions obtained from the permutation test are returned alongside the other results. May be used for testing or benchmarking purposes. Can only be enabled when iterations is less than 1000, in order to avoid out-of-memory issues.
Set a random seed (42 by default) to obtain reproducible results.
If TRUE (default), print progress messages.
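The `threshold_min_cells`, `threshold_pct` and `score_type` arguments described above can be illustrated with a small base-R sketch. It is not the package's internal code: the helper names `is_expressed` and `cci_score` are hypothetical, "expressing" is assumed to mean a value strictly above zero, and the score is assumed to combine the mean ligand expression and the mean receptor expression of a simple one-ligand/one-receptor interaction.

```r
# Illustrative sketch only; helper names and exact rules are assumptions.

# `expr`: hypothetical normalized expression values of one gene across the
# cells of one cell type in one condition.
is_expressed <- function(expr, threshold_min_cells = 5, threshold_pct = 0.1) {
  n_expressing <- sum(expr > 0)
  n_expressing >= threshold_min_cells &&
    n_expressing / length(expr) >= threshold_pct
}

# Assumed scoring rule combining mean ligand and mean receptor expression.
cci_score <- function(ligand_mean, receptor_mean,
                      score_type = c("geometric_mean", "arithmetic_mean")) {
  score_type <- match.arg(score_type)
  if (score_type == "geometric_mean") {
    sqrt(ligand_mean * receptor_mean)
  } else {
    (ligand_mean + receptor_mean) / 2
  }
}

ligand_expr   <- c(0, 0, 2, 4, 0, 1, 3, 0, 2, 4)  # 6 of 10 cells express
receptor_expr <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 5)  # 1 of 10 cells expresses

ligand_ok   <- is_expressed(ligand_expr)    # passes both thresholds
receptor_ok <- is_expressed(receptor_expr)  # fails: fewer than 5 cells

score_geo <- cci_score(mean(ligand_expr), mean(receptor_expr))
score_ari <- cci_score(mean(ligand_expr), mean(receptor_expr),
                       score_type = "arithmetic_mean")
```

In this toy case the geometric mean is pulled down by the weakly expressed receptor, whereas the arithmetic mean stays comparatively high, which may help explain why the geometric mean is the recommended choice.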
The primary use of this function (and of the package) is to perform differential intercellular communication analysis. However, it is also possible to only perform a detection analysis (by setting seurat_condition_id to NULL), e.g. if one wants to infer cell-cell interactions from a dataset whose cells are not assigned to conditions.
By convention, when performing differential analysis, LOGFC are computed as log(score(cond2_name)/score(cond1_name)). In other words, "UP"-regulated CCIs have a larger score in cond2_name.
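The sign convention can be checked on toy numbers. The sketch below is a simplified illustration in base R, not the package's classification logic: the data frame, its column names and the "NOT_SIGNIFICANT" label are hypothetical, and only the two thresholds `threshold_logfc` and `threshold_p_value_de` are taken into account.

```r
# Hypothetical CCI scores and BH-adjusted differential p-values.
cci <- data.frame(
  cci_id         = c("A", "B", "C", "D"),
  score_cond1    = c(1.0, 2.0, 1.0, 1.0),
  score_cond2    = c(2.0, 1.0, 1.2, 3.0),
  p_value_de_adj = c(0.01, 0.01, 0.01, 0.50)
)

threshold_logfc      <- log(1.5)
threshold_p_value_de <- 0.05

# Convention from the text: cond2 in the numerator, cond1 in the denominator.
cci$logfc <- log(cci$score_cond2 / cci$score_cond1)

significant <- cci$p_value_de_adj <= threshold_p_value_de

# Simplified labelling; "NOT_SIGNIFICANT" is a placeholder label.
cci$regulation <- ifelse(
  !significant, "NOT_SIGNIFICANT",
  ifelse(cci$logfc >= threshold_logfc, "UP",
         ifelse(cci$logfc <= -threshold_logfc, "DOWN", "FLAT"))
)

print(cci)
```

Interaction "A" doubles its score in the second condition and is therefore labelled "UP", while "B" halves it and is labelled "DOWN".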
Parallel computing. If possible, it is recommended to run this function in parallel in order to speed up the analysis for large datasets and/or to obtain more accurate p-values by setting a higher number of iterations. This is as simple as loading the future package and setting an appropriate plan (see also our vignette).
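A minimal sketch of such a setup is given here. The choice of the `multisession` backend and of two workers is an assumption made for illustration; the appropriate plan depends on the machine and operating system. The snippet is guarded so that it does nothing when the future package is not installed.

```r
# Guarded so the snippet runs even when 'future' is not installed.
has_future <- requireNamespace("future", quietly = TRUE)

if (has_future) {
  # Assumed configuration: two background R sessions.
  future::plan(future::multisession, workers = 2)

  # run_interaction_analysis(...) would be called here, possibly with a
  # larger `iterations` value such as 10000.

  # Restore sequential processing once the analysis is done.
  future::plan(future::sequential)
}
```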
Data extraction. The UMI or read counts matrix is extracted from the assay seurat_assay and the slot seurat_slot. By default, it is assumed that seurat_object contains log1p-transformed normalized data in the slot "data" of its assay "RNA". If log_scale is FALSE (as recommended), the data are transformed with expm1() in order to recover normalized values that are not in log scale.
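The effect of the back-transformation can be seen on a few toy values. This base-R sketch uses a hypothetical numeric vector rather than a real Seurat object; it only shows that `expm1()` inverts `log1p()`.

```r
# Hypothetical normalized (non-log) expression values for one gene.
normalized <- c(0, 0.5, 2, 10)

# What a log1p-transformed "data" slot would store for these values.
log_data <- log1p(normalized)

# With log_scale = FALSE, values are mapped back to the original scale.
recovered <- expm1(log_data)

print(all.equal(recovered, normalized))
```

Zeros stay exactly zero under both transformations, so non-expressing cells remain non-expressing after the round trip.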
Modifying filtering parameters (differential analysis only). As long as the slot cci_table_raw of the returned scDiffCom object is not erased, filtering parameters can be modified to recompute the slots cci_table_detected and ora_table, without re-performing the time-consuming permutation analysis. This may be useful if one wants a fast way to analyze how the results behave as a function of, say, different LOGFC thresholds. In practice, this can be done by calling the functions FilterCCI or RunORA (see also our vignette).
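The idea of re-filtering cached results can be sketched conceptually in base R. This does not call FilterCCI or RunORA and does not reproduce their behaviour: the table `raw`, its column names and the helper `filter_detected` are hypothetical stand-ins, and the detection rule (score above a quantile cutoff and BH-adjusted p-value at or below a threshold) is a simplified reading of the argument descriptions.

```r
# Hypothetical stand-in for cached raw results (expensive to compute once).
raw <- data.frame(
  score = c(0.1, 0.4, 0.8, 1.5, 2.0, 3.0, 0.05, 0.9, 1.1, 2.5),
  p_value_specificity = c(0.9, 0.02, 0.001, 0.2, 0.004,
                          0.0005, 0.6, 0.03, 0.01, 0.002)
)

# Cheap filtering step that can be repeated with different thresholds.
filter_detected <- function(raw, quantile_score = 0.2, p_value = 0.05) {
  p_adj <- p.adjust(raw$p_value_specificity, method = "BH")
  score_cutoff <- quantile(raw$score, probs = quantile_score)
  raw[raw$score > score_cutoff & p_adj <= p_value, , drop = FALSE]
}

# Same cached table, two sets of thresholds, no permutation re-run.
n_default <- nrow(filter_detected(raw))
n_strict  <- nrow(filter_detected(raw, quantile_score = 0.5, p_value = 0.015))

print(c(default = n_default, strict = n_strict))
```

Because the costly step only produces the raw table, sweeping over thresholds amounts to repeating the inexpensive filtering call.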
if (FALSE) {
  run_interaction_analysis(
    seurat_object = seurat_sample_tms_liver,
    LRI_species = "mouse",
    seurat_celltype_id = "cell_type",
    seurat_condition_id = list(
      column_name = "age_group",
      cond1_name = "YOUNG",
      cond2_name = "OLD"
    )
  )
}