metaSNF: Meta clustering with Similarity Network Fusion

Brief Overview

metaSNF is an R package that combines the meta clustering paradigm described by Caruana et al. (2006) with the similarity network fusion (SNF) data integration procedure developed by Wang et al. (2014). The package provides tools for transforming multi-modal tabular data into cluster solutions, supporting decisions during the clustering process, and visualizing results along the way, with a strong emphasis on context-specific utility and principled validation of results.

Installation

You will need R version 4.1.0 or higher to install this package. metaSNF can be installed from CRAN:

install.packages("metasnf")

Development versions can be installed from GitHub:

# Latest development version
devtools::install_github("BRANCHlab/metasnf")

# Install a specific tagged version
devtools::install_github("BRANCHlab/metasnf@v2.1.1")
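
Because installation requires R 4.1.0 or higher, it can help to check the running R version before installing. The snippet below is a minimal sketch using only base R. The variable names `required_version` and `r_is_recent_enough` are illustrative and not part of metasnf, and nothing is installed automatically.

```r
# Minimum R version stated in the installation notes above
required_version <- "4.1.0"

# getRversion() returns a numeric_version, which compares correctly with strings
r_is_recent_enough <- getRversion() >= required_version

if (!r_is_recent_enough) {
    message(
        "R ", as.character(getRversion()), " is older than ",
        required_version, "; upgrade R before installing metasnf."
    )
} else if (!requireNamespace("metasnf", quietly = TRUE)) {
    # This branch only prints a reminder; it does not install anything
    message("R is recent enough; run install.packages(\"metasnf\") to install.")
}
```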

Quick Start

Minimal usage of the package looks like this:

# Load the package
library(metasnf)

# Setting up the data
dl <- data_list(
    list(abcd_cort_t, "cortical_thickness", "neuroimaging", "continuous"),
    list(abcd_cort_sa, "cortical_surface_area", "neuroimaging", "continuous"),
    list(abcd_subc_v, "subcortical_volume", "neuroimaging", "continuous"),
    list(abcd_income, "household_income", "demographics", "continuous"),
    list(abcd_pubertal, "pubertal_status", "demographics", "continuous"),
    uid = "patient"
)
#> ℹ 176 observations dropped due to incomplete data.

# Specifying 5 different sets of settings for SNF
set.seed(42)
sc <- snf_config(
    dl,
    n_solutions = 5,
    max_k = 40
)
#> ℹ No distance functions specified. Using defaults.
#> ℹ No clustering functions specified. Using defaults.

# This solutions data frame has a cluster solution for each of the 5 SNF runs
sol_df <- batch_snf(dl, sc)

sol_df
#> 5 cluster solutions of 100 observations:
#> solution nclust mc uid_NDAR_INV0567T2Y9 uid_NDAR_INV0IZ157F8 
#>        1      2  .                    1                    2 
#>        2      2  .                    2                    1 
#>        3     10  .                    1                    9 
#>        4      2  .                    2                    1 
#>        5      8  .                    1                    7 
#> 98 observations not shown.
#> Use `print(n = ...)` to change the number of rows printed.
#> Use `t()` to view compact cluster solution format.

t(sol_df)
#> 5 cluster solutions of 100 observations:
#>                  uid      s1    s2    s3    s4    s5 
#> uid_NDAR_INV0567T2Y9       1     2     1     2     1
#> uid_NDAR_INV0IZ157F8       2     1     9     1     7
#> uid_NDAR_INV0J4PYA5F       1     2     7     2     3
#> uid_NDAR_INV10OMKVLE       2     1     8     1     4
#> uid_NDAR_INV15FPCW4O       1     2     2     2     5
#> uid_NDAR_INV19NB4RJK       1     2     9     2     7
#> uid_NDAR_INV1HLGR738       1     2     9     2     7
#> uid_NDAR_INV1KR0EZFU       2     1     9     2     7
#> uid_NDAR_INV1L3Y9EOP       1     2     2     2     5
#> uid_NDAR_INV1TCP5GNM       1     2     8     2     4
#> 90 observations not shown.

Tutorial vignettes showing how the package can be used are available under the “articles” section of the documentation home page: https://branchlab.github.io/metasnf/index.html

Background

Why use meta clustering?

Clustering algorithms seek solutions where members of the same cluster are very similar to each other and members of distinct clusters are very dissimilar to each other. In sufficiently noisy datasets where many qualitatively distinct solutions with similar scores of clustering quality exist, it is not necessarily the case that the top solution selected by a clustering algorithm will also be the most useful one for the user’s context.

To address this issue, the original meta clustering procedure (Caruana et al. 2006) involved generating a large number of reasonable clustering solutions, clustering those solutions into groups of qualitatively similar ones, and having the user examine those “meta clusters” to find the solution that is most useful for their context.
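
The three steps of meta clustering can be sketched in base R on toy data. This is an illustration of the idea only, not the package's implementation: the toy data, the use of k-means to generate solutions, the hand-written `adjusted_rand_index()` helper, and the choice of two meta clusters are all assumptions made for the example. The adjusted Rand index is used as the similarity between solutions because the package's function index describes an ARI matrix for inter-solution similarities.

```r
# Adjusted Rand index (ARI) between two cluster label vectors
adjusted_rand_index <- function(x, y) {
    tab <- table(x, y)
    sum_cells <- sum(choose(tab, 2))
    sum_rows <- sum(choose(rowSums(tab), 2))
    sum_cols <- sum(choose(colSums(tab), 2))
    expected <- sum_rows * sum_cols / choose(length(x), 2)
    max_index <- (sum_rows + sum_cols) / 2
    (sum_cells - expected) / (max_index - expected)
}

set.seed(42)

# Toy data: two Gaussian blobs with 30 observations each (illustrative only)
toy <- rbind(
    matrix(rnorm(60, mean = 0), ncol = 2),
    matrix(rnorm(60, mean = 4), ncol = 2)
)

# Step 1: generate several plausible cluster solutions
ks <- c(2, 2, 3, 4, 5, 6)
solutions <- lapply(
    ks,
    function(k) kmeans(toy, centers = k, nstart = 5)$cluster
)

# Step 2: measure how similar every pair of solutions is
n_sol <- length(solutions)
aris <- diag(n_sol)
for (i in seq_len(n_sol - 1)) {
    for (j in (i + 1):n_sol) {
        aris[i, j] <- adjusted_rand_index(solutions[[i]], solutions[[j]])
        aris[j, i] <- aris[i, j]
    }
}

# Step 3: cluster the solutions themselves into meta clusters
meta_clusters <- cutree(
    hclust(as.dist(1 - aris), method = "average"),
    k = 2
)

# Indices of the solutions that fall in each meta cluster
split(seq_len(n_sol), meta_clusters)
```

A user would then inspect one representative solution per meta cluster rather than every solution individually.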

Why use SNF?

In the clinical data setting, we often have access to patient data across a wide range of domains, such as imaging, genetics, biomarkers, and demographics. When trying to extract subtypes from all this information, directly concatenating the data before cluster analysis can lose a substantial amount of the valuable signal contained in each individual domain. Empirically, SNF has been demonstrated to effectively integrate highly diverse patient data for the purposes of clinical subtyping.
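
To make the contrast with concatenation concrete, the sketch below fuses two data types with the separate SNFtool package instead of binding their columns together. It assumes SNFtool is installed and uses that package's bundled mock data (`Data1`, `Data2`). The hyperparameter values and the choice of two clusters are illustrative, and this is not necessarily how metasnf runs SNF internally.

```r
library(SNFtool)

# Two mock data types measured on the same set of observations
data(Data1)
data(Data2)

# SNF hyperparameters (illustrative values)
K <- 20       # number of nearest neighbours
alpha <- 0.5  # scaling parameter of the affinity kernel
n_iter <- 20  # number of fusion iterations

# Normalize each data type separately
norm_1 <- standardNormalization(Data1)
norm_2 <- standardNormalization(Data2)

# dist2() returns squared Euclidean distances, so take the square root
dist_1 <- dist2(norm_1, norm_1)^(1 / 2)
dist_2 <- dist2(norm_2, norm_2)^(1 / 2)

# One similarity (affinity) matrix per data type
affinity_1 <- affinityMatrix(dist_1, K, alpha)
affinity_2 <- affinityMatrix(dist_2, K, alpha)

# Fuse the per-type similarity matrices into a single network
fused <- SNF(list(affinity_1, affinity_2), K, n_iter)

# Cluster observations on the fused network
clusters <- spectralClustering(fused, K = 2)
table(clusters)
```

Each data type contributes through its own similarity matrix, so a domain with many features does not automatically dominate a domain with few.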

Documentation

Example workflows

Essential objects

Further customization of generated solutions

Additional functionality

Plotting

References

Caruana, Rich, Mohamed Elhawary, Nam Nguyen, and Casey Smith. 2006. “Meta Clustering.” In Sixth International Conference on Data Mining (ICDM’06), 107–18. https://doi.org/10.1109/ICDM.2006.103.

Wang, Bo, Aziz M. Mezlini, Feyyaz Demir, Marc Fiume, Zhuowen Tu, Michael Brudno, Benjamin Haibe-Kains, and Anna Goldenberg. 2014. “Similarity Network Fusion for Aggregating Data Types on a Genomic Scale.” Nature Methods 11 (3): 333–37. https://doi.org/10.1038/nmeth.2810.

Monthly Downloads

230

Version

2.1.2

License

GPL (>= 3)

Maintainer

Prashanth S Velayudhan

Last Published

April 28th, 2025

Functions in metasnf (2.1.2)

as.data.frame.data_list

Coerce a data_list class object into a data.frame class object
anxiety

Mock ABCD anxiety data
age_df

Mock age data
as.data.frame.settings_df

Coerce a settings_df class object into a data.frame class object
as.data.frame.snf_config

Coerce an snf_config class object into a data.frame class object
alluvial_cluster_plot

Alluvial plot of patients across cluster counts and important features
add_settings_df_rows

Add rows to a settings_df
as.list.clust_fns_list

Coerce a clust_fns_list class object into a list class object
as.list.sim_mats_list

Coerce a sim_mats_list class object into a list class object
adjusted_rand_index_heatmap

Heatmap of pairwise adjusted rand indices between solutions
as_sim_mats_list

Convert an object to a similarity matrix list
as.list.dist_fns_list

Coerce a dist_fns_list class object into a list class object
as.data.frame.t_solutions_df

Coerce a t_solutions_df class object into a data.frame class object
as_data_list

Convert an object to a data list
as.data.frame.weights_matrix

Coerce a weights_matrix class object into a data.frame class object
as.matrix.ari_matrix

Coerce an ari_matrix class object into a matrix class object
as.data.frame.ext_solutions_df

Coerce an ext_solutions_df class object into a data.frame class object
assoc_pval_heatmap

Heatmap of pairwise associations between features
arrange_dll

Sort data frames in a data list by their unique ID values
as_snf_config

Convert an object to an SNF config
as.list.snf_config

Coerce an snf_config class object into a list class object
as_weights_matrix

Convert an object to a weights matrix
calc_assoc_pval_matrix

Calculate p-values for all pairwise associations of features in a data list
auto_plot

Automatically plot features across clusters
calc_assoc_pval

Calculate p-values based on feature vectors and their types
assemble_data

Collapse a data frame and/or a data list into a single data frame
as.data.frame.solutions_df

Coerce a solutions_df class object into a data.frame class object
as.data.frame.t_ext_solutions_df

Coerce a t_ext_solutions_df class object into a data.frame class object
calc_aris

Construct an ARI matrix storing inter-solution similarities
cache_a_complete_example_sol_df

Cached example solutions data frame
as_settings_df

Convert an object to a settings data frame
as.list.data_list

Coerce a data_list class object into a list class object
as.matrix.weights_matrix

Coerce a weights_matrix class object into a matrix class object
cell_significance_fn

Place significance stars on ComplexHeatmap cells
char_to_fac

Convert character-type columns of a data frame to factor-type
calculate_coclustering

Calculate co-clustering data
batch_snf_subsamples

Run SNF clustering pipeline on a list of subsampled data lists
batch_snf

Run variations of SNF
calc_nmis

Calculate feature NMIs for a data list and a solutions data frame
cancer_diagnosis_df

Mock diagnosis data
cat_colours

Helper function for generating categorical colour palette
check_cfll_fn_args

Check if functions in a distance metrics list-like have valid arguments
check_cfll_named

Check if clustering functions list-like object has named algorithms
check_dfll_fn_names

Check if functions in a distance metrics list-like have names
check_dll_types

Error if data list-like structure has invalid feature types
check_dfll_fn_args

Check if functions in a distance metrics list-like have valid arguments
bar_plot

Bar plot separating a feature by cluster
batch_row_closure

Generate closure function to run batch_snf in an apply-friendly format
check_dfll_item_names

Check if items of a distance metrics list-like object have valid names
check_cfll_fns

Check if items of a clustering functions list-like object are functions
check_dll_uid

Check if UID columns in a nested list have valid structure for a data list
as_ari_matrix

Convert an object to an ARI matrix
check_dll_subitem_classes

Check if UID columns in a nested list have valid structure for a data list
check_dll_duplicate_components

Check if data list contains any duplicate names
check_dfll_subitems_are_fns

Check if items of a distance metrics list-like object are functions
check_hm_dependencies

Check for ComplexHeatmap and circlize dependencies
check_dfll_unique_names

Check if names in a distance metrics list-like object are unique
check_similarity_matrices

Check validity of similarity matrices
dplyr_row_slice.ext_solutions_df

Function to extend dplyr to extended solutions data frame objects
check_sdfl_colnames

Check if settings data frame inherits class data.frame
check_cfll_unique_names

Check if names in a clustering functions list-like object are unique
cache_a_complete_example_ext_sol_df

Cached example extended solutions data frame
check_compatible_sdf_cfl

Check if settings_df exceeds bounds of clust_fns_list
coclustering_coverage_check

Co-clustering coverage check
check_compatible_sdf_dfl

Check if settings_df exceeds bounds of dist_fns_list
collapse_dl

Convert a data list into a data frame
domains

Pull domains from a data list
extend_solutions

Extend a solutions data frame to include outcome evaluations
check_dataless_annotations

Helper function to stop annotation building when no data was provided
check_dll_subitem_names

Check valid item names for a data list-like list
check_compatible_sdf_wm

Check if settings_df and weights_matrix have same number of rows
cache_a_complete_example_lp_ext_sol_df

Cached example extended solutions data frame
cocluster_heatmap

Heatmap of observation co-clustering across resampled data
check_valid_k

Check if max K exceeds the number of observations
cocluster_density

Density plot of co-clustering stability across subsampled data
fav_colour

Mock ABCD "colour" data
data_list

Build a data_list class object
check_dll_four_subitems

Error if data list-like list doesn't have only 4-item nested lists
check_dll_inherits_list

Error if data list-like structure isn't a list
generate_annotations_list

Generate annotations list
gender_df

Mock gender data
dist_fns

Built-in distance functions
esm_manhattan_plot

Manhattan plot of feature-cluster association p-values
check_dll_duplicate_features

Check if data list contains any duplicate features
diagnosis_df

Mock diagnosis data
discretisation

Internal function for estimate_nclust_given_graph
discretisation_evec_data

Internal function for estimate_nclust_given_graph
dplyr_row_slice.solutions_df

Function to extend dplyr to solutions data frame objects
clust_fns

Built-in clustering algorithms
depress

Mock ABCD depression data
check_dll_empty_input

Error if empty input provided during data list initialization
check_sdfl_numeric

Check if settings data frame is numeric
check_sdfl_is_df

Check if settings data frame inherits class data.frame
clust_fns_list

Build a clustering algorithms list
get_pvals

Get p-values from an extended solutions data frame
estimate_nclust_given_graph

Estimate number of clusters for a similarity matrix
cort_sa

Mock ABCD cortical surface area data
get_complete_uids

Pull complete-data UIDs from a list of data frames
get_dl_uids

Extract UIDs from a data list
get_min_pval

Get minimum p-value
get_dist_matrix

Calculate distance matrices
dlapply

Apply-like function for data list objects
colour_scale

Return a colour ramp for a given vector
convert_uids

Convert unique identifiers of data list to "uid"
get_heatmap_order

Return the row or column ordering present in a heatmap
check_valid_sc

Check if SNF config has valid structure
merge.clust_fns_list

Merge clust_fns_list objects
mc_manhattan_plot

Manhattan plot of feature-meta cluster association p-values
income

Mock ABCD income data
chi_squared_pval

Chi-squared test p-value (generic)
features

Return character vector of features stored in an object
gselect

Helper function to pick columns from a data frame by grepl search
merge.data_list

Merge observations between two compatible data lists
dist_fns_list

Build a distance metrics list
linear_adjust

Linearly correct data list by features with unwanted signal
get_cluster_solutions

Extract cluster membership information from a sol_df
linear_model_pval

Linear model p-value (generic)
ensure_dll_df

Ensure the data item of each component is a data.frame class object
fisher_exact_pval

Fisher exact test p-value
cort_t

Mock ABCD cortical thickness data
merge.dist_fns_list

Merge dist_fns_list objects
merge.ext_solutions_df

Merge ext_solutions_df objects
get_clusters

Extract cluster membership vector from one solutions data frame row
dll_uid_first_col

Make the UID column of each data frame in a data list the first column
drop_inputs

Execute inclusion
generate_settings_matrix

Build a settings data frame
drop_cols

Helper function to remove columns from a data frame
get_representative_solutions

Extract representative solutions from a matrix of ARIs
dl_variable_summary

Variable-level summary of a data list
metasnf_deprecated

Helper function for deprecated function warnings
generate_clust_algs_list

Generate a clustering algorithms list
ext_solutions_df

Constructor for ext_solutions_df class object
expression_df

Modification of SNFtool mock data frame "Data1"
generate_distance_metrics_list

Generate a list of distance metrics
merge.t_solutions_df

Merge t_solutions_df objects
label_propagate

Label propagate cluster solutions to non-clustered observations
get_matrix_order

Return the hierarchical clustering order of a matrix
label_splits

Convert a vector of partition indices into meta cluster labels
label_meta_clusters

Assign meta cluster labels to rows of a solutions data frame or extended solutions data frame
get_mean_pval

Get mean p-value
gexclude

Helper function to drop columns from a data frame by grepl search
merge.weights_matrix

Merge weights_matrix objects
new_sim_mats_list

Constructor for similarity_matrix_list class object
mock_rep_solutions_df

Mock example of a rep_solutions_df metasnf object
mock_settings_df

Mock example of a settings_df metasnf object
is_data_list

Test if the object is a data list
jitter_plot

Jitter plot separating a feature by cluster
metasnf_warning

Helper function for raising warnings
methylation_df

Modification of SNFtool mock data frame "Data2"
new_snf_config

Constructor for snf_config class object
merge_df_list

Merge list of data frames into a single data frame
merge.sim_mats_list

Merge sim_mats_list objects
merge.snf_config

Merge method for SNF config objects
metasnf-package

metasnf: Meta Clustering with Similarity Network Fusion
get_cluster_df

Extract cluster membership information from one solutions data frame row
merge.solutions_df

Merge solutions_df objects
merge.settings_df

Merge settings_df objects
label_prop

Label propagation
plot.ari_matrix

Heatmap of pairwise adjusted rand indices between solutions
mock_snf_config

Mock example of a snf_config metasnf object
mock_clust_fns_list

Mock example of a clust_fns_list metasnf object
mock_ari_matrix

Mock example of an ari_matrix metasnf object
metasnf_error

Helper function for raising errors
new_ext_solutions_df

Constructor for ext_solutions_df class object
plot.data_list

Plot of feature values in a data list
mock_solutions_df

Mock example of a solutions_df metasnf object
new_ari_matrix

Constructor for ari_matrix class object
rbind.solutions_df

Row-binding of solutions data frame class objects
print.data_list

Print method for class data_list
print.sim_mats_list

Print method for class sim_mats_list
plot.solutions_df

Plot of cluster assignments in a solutions data frame
rbind.ext_solutions_df

Row-binding of extended solutions data frame class objects
print.snf_config

Print method for class snf_config
print.dist_fns_list

Print method for class dist_fns_list
mock_ext_solutions_df

Mock example of an ext_solutions_df metasnf object
mock_mc_solutions_df

Mock example of an mc_solutions_df metasnf object
prefix_dll_uid

Add "uid_" prefix to all UID values in uid column
siw_euclidean_distance

Squared (including weights) Euclidean distance
merge.t_ext_solutions_df

Merge t_ext_solutions_df objects
new_settings_df

Constructor for settings_df class object
plot.ext_solutions_df

Plot of cluster assignments in an extended solutions data frame
new_solutions_df

Constructor for solutions_df class object
mock_data_list

Mock example of a data_list metasnf object
mock_dist_fns_list

Mock example of a dist_fns_list metasnf object
str.clust_fns_list

Structure of a clust_fns_list object
str.data_list

Structure of a data_list object
new_weights_matrix

Constructor for weights_matrix class object
str.weights_matrix

Structure of a weights_matrix object
plot.snf_config

Heatmap for visualizing an SNF config
metasnf_alert

Helper function for raising alerts
print.t_solutions_df

Print method for class t_solutions_df
new_clust_fns_list

Constructor for clust_fns_list class object
not_shown_message

Helper function for creating the message about hidden features, observations, or solutions
numcol_to_numeric

Convert columns of a data frame to numeric type (if possible)
n_features

Extract number of features stored in an object
print.weights_matrix

Print method for class weights_matrix
metasnf_defunct

Helper function for defunct function errors
pick_cols

Helper function to pick columns from a data frame
mock_weights_matrix

Mock example of a weights_matrix metasnf object
n_observations

Extract number of observations stored in an object
reorder_dl_uids

Reorder the uids in a data list
snf_config

Define configuration for generating a set of SNF-based cluster solutions
print.ext_solutions_df

Print method for class ext_solutions_df
print.settings_df

Print method for class settings_df
mock_t_solutions_df

Mock example of a t_solutions_df metasnf object
pl

Helper function to pluralize a string
pubertal

Mock ABCD pubertal status data
run_snf

Run SNF
resample

Helper resampling function found in ?sample
pval_heatmap

Heatmap of p-values
save_heatmap

Save a heatmap object to a file
quality_measures

Quality metrics
subsample_pairwise_aris

Calculate pairwise adjusted Rand indices across subsamples of data
shiny_annotator

Launch a shiny app to identify meta cluster boundaries
str.dist_fns_list

Structure of a dist_fns_list object
str.ext_solutions_df

Structure of an ext_solutions_df object
random_removal

Generate random removal sequence
print_with_n_message

Helper function for outputting tip on changing rows printed
subsample_dl

Create subsamples of a data list
split_parser

Helper function to determine which row and columns to split on
sim_mats_list

Create or extract a sim_mats_list class object
str.settings_df

Structure of a settings_df object
new_data_list

Constructor for data_list class object
print.clust_fns_list

Print method for class clust_fns_list
print.ari_matrix

Print method for class ari_matrix
summary.ext_solutions_df

Summary method for class ext_solutions_df
summary.clust_fns_list

Summary method for class clust_fns_list
summary.ari_matrix

Summary method for class ari_matrix
new_dist_fns_list

Constructor for dist_fns_list class object
subc_v

Mock ABCD subcortical volumes data
ord_reg_pval

Ordinal regression p-value
str.ari_matrix

Structure of an ari_matrix object
print.solutions_df

Print method for class solutions_df
str.sim_mats_list

Structure of a sim_mats_list object
str.t_ext_solutions_df

Structure of a t_ext_solutions_df object
rename_dl

Rename features in a data list
print_with_t_message

Helper function for transposing solutions_df message
scale_diagonals

Adjust the diagonals of a matrix
parallel_batch_snf

Parallel processing form of batch_snf
summary.sim_mats_list

Summary method for class sim_mats_list
str.t_solutions_df

Structure of a t_solutions_df object
settings_df

Build a settings data frame
remove_dll_incomplete

Remove observations with incomplete data from a data list-like list object
solutions_df

Constructor for solutions_df class object
sol_df_col_order

Helper function for organizing solutions df-like column order
str.solutions_df

Structure of a solutions_df object
str.snf_config

Structure of an snf_config object
summary.settings_df

Summary method for class settings_df
summarize_dfl

Summarize a distance functions list
uids

Pull UIDs from an object
rbind.t_solutions_df

Row-binding of t_solutions_df class objects
print.t_ext_solutions_df

Print method for class t_ext_solutions_df
rbind.weights_matrix

Row-bind weights matrices
similarity_matrix_heatmap

Plot heatmap of similarity matrix
summary.weights_matrix

Summary method for class weights_matrix
similarity_matrix_path

Generate a complete path and filename to store a similarity matrix
summarize_clust_fns_list

Summarize a clust_fns_list object
validate_ari_matrix

Validator for ari_matrix class object
var_manhattan_plot

Manhattan plot of feature-feature association p-values
summary.t_solutions_df

Summary method for class t_solutions_df
summary.data_list

Summary method for class data_list
validate_snf_config

Validator for snf_config class object
validate_weights_matrix

Validator for weights_matrix class object
validate_solutions_df

Validator for solutions_df class object
summary.dist_fns_list

Summary method for class dist_fns_list
train_test_assign

Training and testing split
validate_clust_fns_list

Validator for clust_fns_list class object
summary_features

Pull features used to calculate summary p-values from an object
summary.solutions_df

Summary method for class solutions_df
snf_step

Helper function for using the correct SNF scheme
summary.t_ext_solutions_df

Summary method for class t_ext_solutions_df
summarize_dl

Summarize a data list
snf_scheme

SNF schemes
summarize_pvals

Summarize p-value columns of an extended solutions data frame
validate_ext_solutions_df

Validator for ext_solutions_df class object
validate_dist_fns_list

Validator for dist_fns_list class object
weights_matrix

Generate a matrix to store feature weights
validate_data_list

Validator for data_list class object
summary.snf_config

Summary method for class snf_config
validate_settings_df

Validator for settings_df class object
validate_sim_mats_list

Validator for similarity_matrix_list class object
abcd_subc_v

Mock ABCD subcortical volumes data
abcd_colour

Mock ABCD "colour" data
abcd_h_income

Mock ABCD income data
abcd_cort_t

Mock ABCD cortical thickness data
abcd_depress

Mock ABCD depression data
abcd_pubertal

Mock ABCD pubertal status data
abcd_anxiety

Mock ABCD anxiety data
abcd_income

Mock ABCD income data
add_columns

Add columns to a data frame
abcd_cort_sa

Mock ABCD cortical surface area data