Environment Based Clustering for Interpretable Predictive Models in High Dimensional Data
Companion package to the paper: An analytic approach for
interpretable predictive models in high dimensional data, in the presence of
interactions with exposures. Bhatnagar, Yang, Khundrakpam, Evans, Blanchette, Bouchard, Greenwood (2017) <DOI:10.1101/102475>.
This package includes an algorithm for clustering high dimensional data that can be affected by an environmental factor.
This package is under active development.
The eclust package implements the methods developed in the paper An analytic approach for interpretable predictive models in high dimensional data, in the presence of interactions with exposures (2017+) Preprint. Briefly, eclust is a two-step procedure: 1a) a clustering stage where variables are clustered based on some measure of similarity, 1b) a dimension reduction stage where a summary measure is created for each of the clusters, and 2) a simultaneous variable selection and regression stage on the summarized cluster measures.
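The procedure described above can be sketched in a few lines of generic R. This is only an illustration under stated assumptions and does not use the eclust API. The simulated data, the number of clusters `k`, the choice of `1 - |correlation|` as dissimilarity, and cluster averages as the summary measure are all assumptions made for the example. In particular, the sketch ignores the exposure variable that the package's similarity measures are designed to account for.

```r
# Generic cluster -> summarise -> select sketch; NOT the eclust API.
library(glmnet)  # lasso / elastic net regression

set.seed(1)
n <- 100  # observations (assumed)
p <- 40   # variables (assumed)
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", seq_len(p))

# Make the first 10 variables correlated through a shared latent factor
latent <- rnorm(n)
X[, 1:10] <- X[, 1:10] + 2 * latent
y <- rowMeans(X[, 1:10]) + rnorm(n)

# Step 1a: cluster the variables, using 1 - |correlation| as dissimilarity
d <- as.dist(1 - abs(cor(X)))
tree <- hclust(d, method = "average")
k <- 5  # number of clusters (assumed; would normally be chosen from the data)
membership <- cutree(tree, k = k)

# Step 1b: summarise each cluster by the average of its member variables
summaries <- sapply(seq_len(k), function(j) {
  rowMeans(X[, membership == j, drop = FALSE])
})
colnames(summaries) <- paste0("cluster", seq_len(k))

# Step 2: variable selection and regression on the cluster summaries (lasso)
fit <- cv.glmnet(summaries, y, alpha = 1)
coef(fit, s = "lambda.min")
```

Cluster summaries with non-zero coefficients are the ones the lasso retains. Other summary measures, such as the first principal component of each cluster, could be swapped in at step 1b.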
You can install the development version of eclust from GitHub with:
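A typical way to do this is shown below. It assumes the devtools package is available and takes the repository path (`sahirbhatnagar/eclust`) from the issue tracker URL listed in this README.

```r
# Assumes devtools; uncomment the next line if it is not yet installed
# install.packages("devtools")
devtools::install_github("sahirbhatnagar/eclust")
```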
See the online vignette for example usage of the functions.
This package makes use of several existing packages, including:
- glmnet for lasso and elastic net regression
- earth for MARS models
- WGCNA for topological overlap matrices
- Park, M. Y., Hastie, T., & Tibshirani, R. (2007). Averaged gene expressions for regression. Biostatistics, 8(2), 212-227.
- Bühlmann, P., Rütimann, P., van de Geer, S., & Zhang, C. H. (2013). Correlated variables in regression: clustering and sparse estimation. Journal of Statistical Planning and Inference, 143(11), 1835-1858.
- Issues: https://github.com/sahirbhatnagar/eclust/issues
- Pull Requests: https://github.com/sahirbhatnagar/eclust/
- e-mail: firstname.lastname@example.org
You can see the most recent changes to the package in the NEWS.md file.
Code of Conduct
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Functions in eclust
|Function|Description|
|---|---|
|plot.similarity|Function to generate heatmap|
|s_mars_separate|Fit Multivariate Adaptive Regression Splines on Simulated Data|
|s_pen_clust|Fit Penalized Regression Models on Simulated Cluster Summaries|
|s_response|Generate True Response vector for Linear Simulation|
|s_response_mars|Generate True Response vector for Non-Linear Simulation|
|s_modules|Simulate Covariates With Exposure Dependent Correlations|
|plot.eclust|Plot Heatmap of Cluster Summaries by Exposure Status|
|s_mars_clust|Fit MARS Models on Simulated Cluster Summaries|
|simdata|Simulated Data with Environment Dependent Correlations|
|s_pen_separate|Fit Penalized Regression Models on Simulated Data|
|u_extract_selected_earth|Get selected terms from an earth object|
|u_fisherZ|Calculate Fisher's Z Transformation for Correlations|
|License||MIT + file LICENSE|
|Packaged||2017-01-25 09:20:51 UTC; sahir|