
sharp: Stability-enHanced Approaches using Resampling Procedures

Description

In stability selection and consensus clustering, resampling techniques are used to enhance the reliability of the results. In this package, hyper-parameters are calibrated by maximising model stability, which is measured under the null hypothesis that all selection (or co-membership) probabilities are identical. Ready-to-use implementations are provided for LASSO regression, sparse PCA, sparse (group) PLS and the graphical LASSO in stability selection, and for hierarchical clustering, partitioning around medoids, K means and Gaussian mixture models in consensus clustering.
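The idea of scoring stability against a null model can be illustrated with a simplified base-R sketch. It assumes that each of N variables is counted as selected or not over K subsamples, that under the null hypothesis every variable has the same selection probability q/N (with q the average number of selected variables per subsample), and that a threshold pi above 0.5 splits variables into stably excluded, unstable and stably selected groups. The function stability_score_sketch is hypothetical: it is not the package's StabilityScore(), whose exact definition may differ.

```r
# Hypothetical helper, not sharp's StabilityScore(): a simplified stability score.
# counts: number of times each of the N variables was selected over K subsamples.
# pi:     selection proportion threshold (assumed > 0.5 so categories do not overlap).
stability_score_sketch <- function(counts, K, pi) {
  stopifnot(pi > 0.5, all(counts >= 0 & counts <= K))
  N <- length(counts)
  q <- sum(counts) / K             # average number of selected variables per subsample
  p0 <- q / N                      # common selection probability under the null
  # Integer cut-offs; the small tolerance guards against floating-point error
  lower <- floor(K * (1 - pi) + 1e-9)
  upper <- ceiling(K * pi - 1e-9)
  # Null probabilities of the three categories, with H ~ Binomial(K, p0)
  p_excluded <- pbinom(lower, size = K, prob = p0)
  p_selected <- pbinom(upper - 1, size = K, prob = p0, lower.tail = FALSE)
  p_unstable <- max(0, 1 - p_excluded - p_selected)
  # Observed number of variables in each category
  n_excluded <- sum(counts <= lower)
  n_selected <- sum(counts >= upper)
  n_unstable <- N - n_excluded - n_selected
  n <- c(n_excluded, n_unstable, n_selected)
  p <- c(p_excluded, p_unstable, p_selected)
  keep <- n > 0                    # empty categories do not contribute
  -sum(n[keep] * log(p[keep]))     # higher score = less compatible with the null
}

# Usage: K = 100 subsamples, 10 variables, 2 selected on average in both cases
K <- 100
stable_counts <- c(100, 100, rep(0, 8))  # the same 2 variables selected every time
unstable_counts <- rep(20, 10)           # selections spread evenly over variables
s_stable <- stability_score_sketch(stable_counts, K, pi = 0.9)
s_unstable <- stability_score_sketch(unstable_counts, K, pi = 0.9)
```

Both configurations select the same number of variables on average, so only the consistency of the selections separates them: the stable configuration is very unlikely under the null and receives the larger score.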

Installation

The released version of the package can be installed from CRAN with:

install.packages("sharp")

The development version can be installed from GitHub:

remotes::install_github("barbarabodinier/sharp")

Example datasets

To illustrate the use of the main functions implemented in sharp, four artificial datasets are created:

library(sharp)

# Dataset for regression
set.seed(1)
data_reg <- SimulateRegression(n = 200, pk = 10)
x_reg <- data_reg$xdata
y_reg <- data_reg$ydata

# Dataset for structural equation modelling
set.seed(1)
data_sem <- SimulateStructural(n = 200, pk = c(5, 2, 3))
x_sem <- data_sem$data

# Dataset for graphical modelling
set.seed(1)
data_ggm <- SimulateGraphical(n = 200, pk = 20)
x_ggm <- data_ggm$data

# Dataset for clustering
set.seed(1)
data_clust <- SimulateClustering(n = c(10, 10, 10))
x_clust <- data_clust$data

Check out the R package fake for more details on these data simulation models.

Main functions

Variable selection

In a regression context, stability selection is done using LASSO regression as implemented in the R package glmnet.

stab_reg <- VariableSelection(xdata = x_reg, ydata = y_reg)
SelectedVariables(stab_reg)
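As a rough illustration of the principle, the base-R sketch below computes selection proportions over K random half-subsamples and keeps the variables selected in at least 90% of them. To stay self-contained, it swaps the LASSO for a hypothetical stand-in selector (the q variables with the largest absolute correlation with the outcome) and fixes q and the threshold by hand, whereas sharp calibrates its hyper-parameters by maximising stability. All function names are hypothetical.

```r
# Hypothetical stand-in selector (not the LASSO): keep the q variables with the
# largest absolute marginal correlation with y. Returns a logical vector.
select_top_q <- function(x, y, q) {
  r <- abs(cor(x, y))[, 1]
  rank(-r, ties.method = "first") <= q
}

# Selection proportions over K subsamples of size floor(tau * n), drawn without replacement
selection_proportions <- function(x, y, q = 3, K = 100, tau = 0.5) {
  n <- nrow(x)
  counts <- numeric(ncol(x))
  for (k in seq_len(K)) {
    idx <- sample.int(n, size = floor(tau * n))
    counts <- counts + select_top_q(x[idx, , drop = FALSE], y[idx], q)
  }
  names(counts) <- colnames(x)
  counts / K
}

# Usage on simulated data where only var1 and var2 affect the outcome
set.seed(1)
n <- 200; p <- 10
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("var", 1:p)))
y <- 2 * x[, 1] - 1.5 * x[, 2] + rnorm(n)
prop <- selection_proportions(x, y, q = 2)
stable <- names(prop)[prop >= 0.9]   # threshold pi = 0.9
```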

Structural equation modelling

In a structural equation modelling context, stability selection is done using a series of LASSO regressions as implemented in the R package glmnet.

dag <- LayeredDAG(layers = c(5, 2, 3))
stab_sem <- StructuralModel(xdata = x_sem, adjacency = dag)
LinearSystemMatrix(vect = Stable(stab_sem), adjacency = dag)

Graphical modelling

In a graphical modelling context, stability selection is done using the graphical LASSO as implemented in the R package glassoFast.

stab_ggm <- GraphicalModel(xdata = x_ggm)
Adjacency(stab_ggm)
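The same resampling logic applies to the edges of a graph. The sketch below assumes a fixed threshold on partial correlations, estimated by inverting the sample covariance matrix, as a stand-in for the graphical LASSO; it therefore requires more observations than variables in each subsample. The helper names and the threshold of 0.2 are illustrative assumptions.

```r
# Partial correlations from the inverse of the sample covariance matrix
partial_correlations <- function(x) {
  omega <- solve(cov(x))
  omega <- (omega + t(omega)) / 2        # enforce exact symmetry
  d <- sqrt(diag(omega))
  pc <- -omega / outer(d, d)
  diag(pc) <- 1
  pc
}

# Edge selection proportions: an edge is "selected" in a subsample when the
# absolute partial correlation exceeds a fixed threshold (stand-in for the graphical LASSO)
edge_proportions <- function(x, threshold = 0.2, K = 100, tau = 0.5) {
  n <- nrow(x); p <- ncol(x)
  counts <- matrix(0, p, p)
  for (k in seq_len(K)) {
    idx <- sample.int(n, size = floor(tau * n))
    pc <- partial_correlations(x[idx, , drop = FALSE])
    counts <- counts + (abs(pc) > threshold)
  }
  diag(counts) <- 0
  counts / K
}

# Usage on a simulated chain x1 - x2 - x3, with x4 and x5 unconnected
set.seed(1)
n <- 500
x1 <- rnorm(n); x2 <- 0.8 * x1 + rnorm(n); x3 <- 0.8 * x2 + rnorm(n)
x <- cbind(x1, x2, x3, x4 = rnorm(n), x5 = rnorm(n))
prop <- edge_proportions(x)
round(prop, 2)
```

In the chain, x1 and x3 are conditionally independent given x2, so their edge should rarely be selected even though the two variables are marginally correlated.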

Clustering

Consensus clustering is done using hierarchical clustering as implemented in the R package stats.

stab_clust <- Clustering(xdata = x_clust)
Clusters(stab_clust)
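Consensus clustering aggregates co-membership over subsamples. The sketch below, using only dist(), hclust() and cutree() from stats, records for each pair of items the proportion of subsamples containing both in which they were assigned to the same cluster. It assumes a fixed number of clusters k, whereas sharp calibrates hyper-parameters such as this one; consensus_matrix is a hypothetical name.

```r
# Consensus (co-membership) matrix from hierarchical clustering on subsamples
consensus_matrix <- function(x, k, K = 100, tau = 0.5) {
  n <- nrow(x)
  together <- matrix(0, n, n)  # times each pair was assigned to the same cluster
  sampled <- matrix(0, n, n)   # times each pair was drawn in the same subsample
  for (iter in seq_len(K)) {
    idx <- sample.int(n, size = floor(tau * n))
    memb <- cutree(hclust(dist(x[idx, , drop = FALSE])), k = k)
    together[idx, idx] <- together[idx, idx] + outer(memb, memb, "==")
    sampled[idx, idx] <- sampled[idx, idx] + 1
  }
  ifelse(sampled > 0, together / sampled, NA)  # NA for pairs never sampled together
}

# Usage: three well-separated groups of 10 points in two dimensions
set.seed(1)
centres <- rbind(c(0, 0), c(6, 0), c(0, 6))
x <- centres[rep(1:3, each = 10), ] + matrix(rnorm(60, sd = 0.5), ncol = 2)
cm <- consensus_matrix(x, k = 3)
grp <- rep(1:3, each = 10)
same <- outer(grp, grp, "==")
```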

Extraction and visualisation of the results

It is strongly recommended to check the calibration of the hyper-parameters using the function CalibrationPlot() on the output from any of the main functions listed above. The functions print(), summary() and plot() can also be used on the outputs from the main functions.

Parametrisation

Stability selection and consensus clustering can theoretically be done by aggregating the results from any selection (or clustering) algorithm applied to subsamples of the data. The underlying algorithm is specified via the argument implementation of the main functions. For example, consensus clustering using partitioning around medoids, K means or Gaussian mixture models is also supported in sharp:

stab_clust <- Clustering(xdata = x_clust, implementation = PAMClustering)
stab_clust <- Clustering(xdata = x_clust, implementation = KMeansClustering)
stab_clust <- Clustering(xdata = x_clust, implementation = GMMClustering)

Other algorithms can be used by defining a wrapper function and passing it to implementation. Check out the documentation of GraphicalModel() for an example using a shrunk estimate of the partial correlation instead of the graphical LASSO.
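The implementation argument follows a common R design: the algorithm is passed in as a function, so the resampling loop does not need to know which algorithm it calls. The sketch below mimics this with two hypothetical wrappers sharing one signature (data matrix and number of clusters in, one cluster label per row out). It does not reproduce the inputs and outputs that sharp expects from implementation functions.

```r
# Hypothetical wrappers (not sharp functions) sharing the signature (x, k) -> labels
hclust_wrapper <- function(x, k) cutree(hclust(dist(x)), k = k)
kmeans_wrapper <- function(x, k) kmeans(x, centers = k, nstart = 10)$cluster

# Generic resampling loop: the clustering algorithm is supplied as `implementation`
co_membership <- function(x, k, implementation, K = 50, tau = 0.5) {
  n <- nrow(x)
  together <- matrix(0, n, n)
  sampled <- matrix(0, n, n)
  for (iter in seq_len(K)) {
    idx <- sample.int(n, size = floor(tau * n))
    memb <- implementation(x[idx, , drop = FALSE], k)
    together[idx, idx] <- together[idx, idx] + outer(memb, memb, "==")
    sampled[idx, idx] <- sampled[idx, idx] + 1
  }
  together / pmax(sampled, 1)  # pairs never sampled together get 0
}

# Usage: swap the algorithm without touching the resampling loop
set.seed(1)
centres <- rbind(c(0, 0), c(6, 0), c(0, 6))
x <- centres[rep(1:3, each = 10), ] + matrix(rnorm(60, sd = 0.5), ncol = 2)
cm_hc <- co_membership(x, k = 3, implementation = hclust_wrapper)
cm_km <- co_membership(x, k = 3, implementation = kmeans_wrapper)
```

Keeping the algorithm behind a fixed signature means a new method only needs a new wrapper, which is the same idea as supplying PAMClustering, KMeansClustering or GMMClustering to Clustering().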

References

  • Barbara Bodinier, Sabrina Rodrigues, Maryam Karimi, Sarah Filippi, Julien Chiquet and Marc Chadeau-Hyam. Stability Selection and Consensus Clustering in R: The R Package sharp. (2025) Journal of Statistical Software.

  • Barbara Bodinier, Dragana Vuckovic, Sabrina Rodrigues, Sarah Filippi, Julien Chiquet and Marc Chadeau-Hyam. Automated calibration of consensus weighted distance-based clustering approaches using sharp. (2023) Bioinformatics.

  • Barbara Bodinier, Sarah Filippi, Therese Haugdahl Nost, Julien Chiquet and Marc Chadeau-Hyam. Automated calibration for stability selection in penalised regression and graphical models. (2021) Journal of the Royal Statistical Society: Series C (Applied Statistics).

  • Nicolai Meinshausen and Peter Bühlmann. Stability selection. (2010) Journal of the Royal Statistical Society: Series B (Statistical Methodology).

  • Stefano Monti, Pablo Tamayo, Jill Mesirov and Todd Golub. Consensus clustering. (2003) Machine Learning.

Version

1.4.8

License

GPL (>= 3)

Maintainer

Barbara Bodinier

Last Published

July 23rd, 2025

Functions in sharp (1.4.8)

  • ClusteringPerformance: Clustering performance
  • Clustering: Consensus clustering
  • Coefficients: Model coefficients
  • CoMembership: Pairwise co-membership
  • ClusteringAlgo: (Weighted) clustering algorithm
  • Combine: Merging stability selection outputs
  • CheckInputClustering: Checking input parameters (clustering)
  • CheckPackageInstalled: Checking that a package is installed
  • CheckParamRegression: Checking input parameters (regression model)
  • CheckInputGraphical: Checking input parameters (graphical model)
  • FDP: False Discovery Proportion
  • Folds: Splitting observations into folds
  • DBSCANClustering: (Weighted) density-based clustering
  • GMMClustering: Model-based clustering
  • Ensemble: Ensemble model
  • ConsensusScore: Consensus score
  • ExplanatoryPerformance: Prediction performance in regression
  • EnsemblePredictions: Predictions from ensemble model
  • DummyToCategories: Categorical from dummy variables
  • Concatenate: Concatenate stability objects
  • Incremental: Incremental prediction performance in regression
  • GroupPLS: Group Partial Least Squares
  • LambdaGridGraphical: Grid of penalty parameters (graphical model)
  • HierarchicalClustering: (Weighted) hierarchical clustering
  • GraphicalAlgo: Graphical model algorithm
  • KMeansClustering: (Sparse) K-means clustering
  • GraphicalModel: Stability selection graphical model
  • LambdaGridRegression: Grid of penalty parameters (regression model)
  • GraphComparison: Edge-wise comparison of two graphs
  • Graph: Graph visualisation
  • OpenMxMatrix: Matrix from OpenMx outputs
  • NAToNULL: Transforms NA into NULL
  • OpenMxModel: Writing OpenMx model (matrix specification)
  • PAMClustering: (Weighted) Partitioning Around Medoids
  • PLS: Partial Least Squares 'a la carte'
  • PFER: Per Family Error Rate
  • PenalisedGraphical: Graphical LASSO
  • PenalisedOpenMx: Penalised Structural Equation Model
  • PenalisedRegression: Penalised regression
  • SelectionPerformance: Selection performance
  • SelectionPerformanceGraph: Graph representation of selection performance
  • PredictPLS: Partial Least Squares predictions
  • SelectionAlgo: Variable selection algorithm
  • SelectionPerformanceSingle: Selection performance (internal)
  • LambdaSequence: Sequence of penalty parameters
  • LinearSystemMatrix: Matrix from linear system outputs
  • SelectionProportions: Selection/co-membership proportions
  • SelectionProportionsGraphical: Selection proportions (graphical model)
  • Resample: Resampling observations
  • Refit: Regression model refitting
  • SelectionProportionsRegression: Selection proportions (variable selection)
  • SparseGroupPLS: Sparse group Partial Least Squares
  • SparseKMeansClustering: Sparse K means clustering
  • SerialGraphical: Stability selection graphical model (internal)
  • SerialRegression: Stability selection in regression (internal)
  • SerialClustering: Consensus clustering (internal)
  • SparsePCA: Sparse Principal Component Analysis
  • Square: Adjacency from bipartite
  • Split: Splitting observations into non-overlapping sets
  • SparsePLS: Sparse Partial Least Squares
  • StabilityScore: Stability score
  • StabilityMetrics: Stability selection metrics
  • mystar: Star-shaped nodes
  • StructuralModel: Stability selection in Structural Equation Modelling
  • WeightBoxplot: Stable attribute weights
  • Stable: Stable results
  • VariableSelection: Stability selection in regression
  • plot.clustering: Consensus matrix heatmap
  • mytriangle: Triangular nodes
  • UnweightedKMeansClustering: Unweighted K-means clustering
  • sharp-package: sharp: Stability-enHanced Approaches using Resampling Procedures
  • plot.variable_selection: Plot of selection proportions
  • plot.incremental: Plot of incremental performance
  • predict.variable_selection: Predict method for stability selection
  • plot.roc_band: Receiver Operating Characteristic (ROC) band
  • CART: Classification And Regression Trees
  • BiSelection: Stability selection of predictors and/or outcomes
  • CalibrationCurve: Calibration curve (internal)
  • AggregatedEffects: Summarised coefficients conditionally on selection
  • CalibrationPlot: Calibration plot
  • ArgmaxId: Calibrated hyper-parameter(s)
  • BlockLambdaGrid: Multi-block grid
  • BinomialProbabilities: Binomial probabilities for stability score
  • AdjacencyFromObject: Adjacency matrix from object
  • CheckDataRegression: Checking input data (regression model)