eclust v0.1.0



Environment Based Clustering for Interpretable Predictive Models in High Dimensional Data

Companion package to the paper: An analytic approach for interpretable predictive models in high dimensional data, in the presence of interactions with exposures. Bhatnagar, Yang, Khundrakpam, Evans, Blanchette, Bouchard, Greenwood (2017) <DOI:10.1101/102475>. This package includes an algorithm for clustering high dimensional data that can be affected by an environmental factor.


This package is under active development


The eclust package implements the methods developed in the paper An analytic approach for interpretable predictive models in high dimensional data, in the presence of interactions with exposures (2017+, preprint). Briefly, eclust is a two-step procedure. Step 1 has two parts: 1a) a clustering stage, where variables are clustered based on some measure of similarity, and 1b) a dimension reduction stage, where a summary measure is created for each cluster. Step 2 is a simultaneous variable selection and regression stage on the summarized cluster measures.
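
The two-step procedure described above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the eclust API: the toy data, the `1 - |correlation|` dissimilarity, the choice of two clusters, and the use of cluster averages as summaries are all made up for the example. It also ignores the exposure variable that eclust uses when measuring similarity. The regression stage uses `glmnet::cv.glmnet` and is skipped if glmnet is not installed.

```r
# Illustrative sketch only: base R plus glmnet, NOT the eclust functions.
set.seed(1)
n <- 100
p <- 20

# Toy data (assumption): two blocks of 10 variables, each block driven by
# its own latent factor, with only the first block related to the response.
z1 <- rnorm(n)
z2 <- rnorm(n)
X <- cbind(
  sapply(1:10, function(j) z1 + rnorm(n, sd = 0.5)),
  sapply(1:10, function(j) z2 + rnorm(n, sd = 0.5))
)
colnames(X) <- paste0("V", seq_len(p))
y <- 2 * z1 + rnorm(n)

# Step 1a: cluster the variables, using 1 - |correlation| as dissimilarity.
d <- as.dist(1 - abs(cor(X)))
clusters <- cutree(hclust(d, method = "average"), k = 2)

# Step 1b: summarize each cluster by the average of its member variables.
cluster_ids <- sort(unique(clusters))
summaries <- sapply(cluster_ids, function(k) {
  rowMeans(X[, clusters == k, drop = FALSE])
})
colnames(summaries) <- paste0("cluster", cluster_ids)

# Step 2: lasso (variable selection + regression) on the cluster summaries.
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::cv.glmnet(summaries, y, alpha = 1)
  print(coef(fit, s = "lambda.min"))
}
```

Averaging is only one possible summary measure; the first principal component of each cluster would be a drop-in alternative in step 1b.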


You can install the development version of eclust from GitHub with:



See the online vignette for example usage of the functions.


This package makes use of several existing packages, including:

  • glmnet for lasso and elasticnet regression
  • earth for MARS models
  • WGCNA for topological overlap matrices

Related work

  1. Park, M. Y., Hastie, T., & Tibshirani, R. (2007). Averaged gene expressions for regression. Biostatistics, 8(2), 212-227.
  2. Bühlmann, P., Rütimann, P., van de Geer, S., & Zhang, C. H. (2013). Correlated variables in regression: clustering and sparse estimation. Journal of Statistical Planning and Inference, 143(11), 1835-1858.


Latest news

You can see the most recent changes to the package in the file

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Functions in eclust

Name                      Description
plot.similarity           Function to generate heatmap
s_mars_separate           Fit Multivariate Adaptive Regression Splines on Simulated Data
s_pen_clust               Fit Penalized Regression Models on Simulated Cluster Summaries
s_response                Generate True Response vector for Linear Simulation
s_response_mars           Generate True Response vector for Non-Linear Simulation
s_modules                 Simulate Covariates With Exposure Dependent Correlations
plot.eclust               Plot Heatmap of Cluster Summaries by Exposure Status
s_mars_clust              Fit MARS Models on Simulated Cluster Summaries
simdata                   Simulated Data with Environment Dependent Correlations
s_pen_separate            Fit Penalized Regression Models on Simulated Data
u_extract_selected_earth  Get selected terms from an earth object
u_fisherZ                 Calculate Fisher's Z Transformation for Correlations


Type: Package
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
VignetteBuilder: knitr
RoxygenNote: 5.0.1
NeedsCompilation: no
Packaged: 2017-01-25 09:20:51 UTC; sahir
Repository: CRAN
Date/Publication: 2017-01-26 12:08:12
