Computes and extracts the sample distance table for samples analysed using a familiarEnsemble object to form a familiarData object.
This table can be used to cluster samples, and is exported directly by extract_feature_expression.
extract_sample_similarity(
  object,
  data,
  cl = NULL,
  is_pre_processed = FALSE,
  sample_limit = waiver(),
  sample_cluster_method = waiver(),
  sample_linkage_method = waiver(),
  sample_similarity_metric = waiver(),
  verbose = FALSE,
  message_indent = 0L,
  ...
)
A data.table containing the pairwise distances between samples. Only the upper triangle of the complete distance matrix is stored (i.e. a sparse triangular representation). The remainder can be reconstructed: the diagonal is always 0.0 and the lower triangle mirrors the upper triangle.
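The following is a minimal sketch of how a full symmetric distance matrix might be rebuilt from such an upper-triangle table. The column names sample_1, sample_2 and value are assumptions made for illustration and are not taken from the package; a plain data.frame stands in for the returned data.table.

```r
# Hypothetical long-format table: one row per unordered sample pair
# (upper triangle only). Column names are illustrative assumptions.
pairs <- data.frame(
  sample_1 = c("a", "a", "b"),
  sample_2 = c("b", "c", "c"),
  value = c(0.2, 0.5, 0.3),
  stringsAsFactors = FALSE
)

samples <- sort(unique(c(pairs$sample_1, pairs$sample_2)))

# Start from a zero matrix so that the diagonal is 0.0.
dist_matrix <- matrix(
  0.0,
  nrow = length(samples),
  ncol = length(samples),
  dimnames = list(samples, samples)
)

# Fill the upper triangle, then mirror it into the lower triangle.
idx <- cbind(match(pairs$sample_1, samples), match(pairs$sample_2, samples))
dist_matrix[idx] <- pairs$value
dist_matrix[idx[, c(2, 1), drop = FALSE]] <- pairs$value

# Convert to a dist object for use with clustering functions.
sample_dist <- stats::as.dist(dist_matrix)
```

The resulting dist object can be passed to standard clustering functions such as stats::hclust.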
A familiarEnsemble object, which is an ensemble of one or more familiarModel objects.
A dataObject object, data.table or data.frame that constitutes the data that are assessed.
Cluster created using the parallel package. This cluster is then used to speed up computation through parallelisation.
Flag that indicates whether the data was already pre-processed externally, e.g. normalised and clustered. Only used if the data argument is a data.table or data.frame.
(optional) Set the upper limit of the number of samples that are used during evaluation steps. Cannot be less than 20.
This setting can be specified per data element by providing a parameter value in a named list with data elements, e.g. list("sample_similarity"=100, "permutation_vimp"=1000).
This parameter can be set for the following data elements: sample_similarity and ice_data.
The method used to perform clustering based on distance between samples. These are the same methods as for the cluster_method configuration parameter: hclust, agnes, diana and pam.
none cannot be used when extracting data for feature expressions.
If not provided explicitly, this parameter is read from settings used at creation of the underlying familiarModel objects.
The method used for agglomerative clustering in hclust and agnes. These are the same methods as for the cluster_linkage_method configuration parameter: average, single, complete, weighted, and ward.
If not provided explicitly, this parameter is read from settings used at creation of the underlying familiarModel objects.
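As an illustration of what a cluster method combined with a linkage method does, the sketch below runs agglomerative clustering with average linkage on a toy distance matrix using stats::hclust from base R. The distance values are made up, and the code calls hclust directly with its own method names; how familiar maps its option names (e.g. weighted, ward) onto the underlying functions is not shown here.

```r
# Toy symmetric distance matrix for four samples (made-up values).
samples <- c("s1", "s2", "s3", "s4")
dist_matrix <- matrix(
  c(0.0, 0.1, 0.8, 0.9,
    0.1, 0.0, 0.7, 0.8,
    0.8, 0.7, 0.0, 0.2,
    0.9, 0.8, 0.2, 0.0),
  nrow = 4,
  byrow = TRUE,
  dimnames = list(samples, samples)
)

# Agglomerative clustering with average linkage.
tree <- stats::hclust(stats::as.dist(dist_matrix), method = "average")

# Cut the dendrogram into two clusters; the result is a named integer vector.
clusters <- stats::cutree(tree, k = 2)
```

With these values, s1 and s2 end up in one cluster and s3 and s4 in the other, because the within-pair distances are much smaller than the between-pair distances.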
Metric to determine pairwise similarity between samples. Similarity is computed in the same manner as for clustering, but sample_similarity_metric has different options that are better suited to computing distance between samples instead of between features: gower, euclidean.
The underlying feature data is scaled to the \([0, 1]\) range (for numerical features) using the feature values across the samples. The normalisation parameters required can optionally be computed from feature data with the outer 5% (on both sides) of feature values trimmed or winsorised. To do so append _trim (trimming) or _winsor (winsorising) to the metric name. This reduces the effect of outliers somewhat.
If not provided explicitly, this parameter is read from settings used at creation of the underlying familiarModel objects.
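The scaling and distance computation described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: scale_unit is a hypothetical helper, the quantile type used for winsorising is R's default, and clamping scaled values to the \([0, 1]\) range is an assumption. For purely numerical features, the Gower distance is taken here as the mean absolute difference across scaled features.

```r
# Hypothetical helper (not part of familiar): rescale a numeric feature to
# [0, 1], optionally deriving the minimum and maximum from winsorised values.
scale_unit <- function(x, winsor_fraction = 0.0) {
  reference <- x
  if (winsor_fraction > 0.0) {
    limits <- stats::quantile(
      x,
      probs = c(winsor_fraction, 1.0 - winsor_fraction),
      names = FALSE
    )
    # Winsorise: pull values beyond the limits back to the limits.
    reference <- pmin(pmax(x, limits[1]), limits[2])
  }
  lower <- min(reference)
  upper <- max(reference)
  if (upper == lower) return(rep(0.0, length(x)))
  # Assumption: clamp so that outliers stay within [0, 1].
  pmin(pmax((x - lower) / (upper - lower), 0.0), 1.0)
}

# Toy feature data; the value 100 in x acts as an outlier.
features <- data.frame(
  x = c(1, 2, 3, 4, 100),
  y = c(10, 20, 30, 40, 50)
)

scaled <- as.matrix(
  as.data.frame(lapply(features, scale_unit, winsor_fraction = 0.05))
)

# Euclidean distance on the scaled features.
euclidean_dist <- stats::dist(scaled, method = "euclidean")

# Gower distance for numerical features only: Manhattan distance divided by
# the number of features, i.e. the mean absolute difference per feature.
gower_dist <- stats::dist(scaled, method = "manhattan") / ncol(scaled)
```

Winsorising before computing the minimum and maximum keeps a single extreme value from compressing the other samples into a narrow part of the scaled range.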
Flag to indicate whether feedback should be provided on the computation and extraction of various data elements.
Number of indentation steps for messages shown during computation and extraction of various data elements.
Unused arguments.