Supervised Principal Components
Supervised Principal Components for regression and survival analysis.
Performs prediction for a censored survival outcome or a regression outcome using the "supervised principal components" approach (Bair et al., 2006). superpc is especially useful for high-dimensional data in which the number of features p far exceeds the number of samples n (the p >> n paradigm), as generated, for instance, by high-throughput technologies.
Supervised principal components is a generalization of principal components regression. The first (or first few) principal components are the linear combinations of the features that capture the directions of largest variation in a dataset. But these directions may or may not be related to an outcome variable of interest. To find linear combinations that are related to an outcome variable, we compute a univariate score for each feature and then retain only those features whose score exceeds a threshold. A principal components analysis is carried out using only the data from these selected features. Finally, these "supervised principal components" are used in a regression model to predict the outcome. To summarize, the steps are:
1. Compute (univariate) standard regression coefficients for each feature
2. Form a reduced data matrix consisting of only those features whose univariate coefficient exceeds a threshold theta in absolute value (theta is estimated by cross-validation)
3. Compute the first (or first few) principal components of the reduced data matrix
4. Use these principal component(s) in a regression model to predict the outcome
This idea can be used in standard regression problems with a quantitative outcome, and also in generalized regression problems such as survival analysis. In the latter problem, the regression coefficients in step (1) are obtained from a proportional hazards model. The superpc R package handles these two cases: standard regression and survival data.
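The four steps above can be sketched in base R for the standard regression case. This is only an illustration under stated assumptions, not the package's implementation: the simulated data, the variable names, the t-statistic form of the univariate score, and the fixed threshold `theta <- 4` are all hypothetical choices made here (superpc estimates the threshold by cross-validation).

```r
# Minimal base-R sketch of supervised principal components (regression case).
# All data and names below are hypothetical, chosen for illustration only.
set.seed(1)
n <- 50    # samples
p <- 200   # features (p >> n)

# Simulated data: rows are samples, columns are features.
latent <- rnorm(n)
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
x[, 1:10] <- x[, 1:10] + 2 * latent      # only the first 10 features carry signal
y <- 3 * latent + rnorm(n, sd = 0.5)

# Step 1: univariate regression of y on each feature, expressed as a
# t-statistic (coefficient divided by its standard error).
xc <- scale(x, center = TRUE, scale = FALSE)
yc <- y - mean(y)
sxx <- colSums(xc^2)
beta <- as.vector(crossprod(xc, yc)) / sxx
resid_sd <- sqrt((sum(yc^2) - beta^2 * sxx) / (n - 2))
score <- beta / (resid_sd / sqrt(sxx))

# Step 2: keep features whose score exceeds the threshold in absolute value.
theta <- 4                               # assumption: fixed here, not cross-validated
keep <- which(abs(score) > theta)

# Step 3: principal components of the reduced data matrix.
pca <- prcomp(x[, keep, drop = FALSE], center = TRUE, scale. = FALSE)
pc1 <- pca$x[, 1]

# Step 4: regress the outcome on the first supervised principal component.
fit <- lm(y ~ pc1)
print(length(keep))
print(summary(fit)$r.squared)
```

For a survival outcome, the score in step 1 would instead come from a proportional hazards model fitted to each feature, as described above.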
There is one more important point: the features (e.g., genes) that are important in the prediction are not necessarily the ones that passed the screen in step 2. Other features may be just as highly correlated with the supervised PC predictor. So we compute an importance score for each feature equal to its correlation with the supervised PC predictor. A reduced predictor is formed by soft-thresholding the importance scores and using these shrunken scores as weights. The soft-thresholding sets the weights of some features to zero, hence throwing them out of the model. The amount of shrinkage is determined by cross-validation. The reduced predictor often performs as well as or better than the supervised PC predictor, and is more interpretable.
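The importance scores and soft-thresholding described above can be illustrated with a short base-R sketch. It rests on several assumptions that are not taken from the package: the data are simulated, `spc` is a stand-in for the supervised PC predictor, the helper `soft_threshold` is a hypothetical name, and the shrinkage amount `delta <- 0.5` is fixed rather than cross-validated. The package's own scaling of the weights may differ.

```r
# Sketch of importance scores and a soft-thresholded (reduced) predictor.
# Names and data are hypothetical; this is not the package's implementation.
set.seed(2)
n <- 50
p <- 200
latent <- rnorm(n)
x <- matrix(rnorm(n * p), nrow = n, ncol = p)   # rows = samples, columns = features
x[, 1:10] <- x[, 1:10] + 2 * latent

# Stand-in for the supervised PC predictor: the first principal component
# of the features assumed to have passed the screening step.
spc <- prcomp(x[, 1:10])$x[, 1]

# Importance score: correlation of every feature with the supervised PC predictor.
importance <- as.vector(cor(x, spc))

# Soft-thresholding: shrink each score toward zero by delta, setting small ones to zero.
soft_threshold <- function(z, delta) sign(z) * pmax(abs(z) - delta, 0)
delta <- 0.5                                     # assumption: fixed, not cross-validated
w <- soft_threshold(importance, delta)

# Reduced predictor: centered features weighted by the shrunken scores.
reduced <- as.vector(scale(x, center = TRUE, scale = FALSE) %*% w)

print(sum(w != 0))          # number of features kept in the reduced model
print(cor(reduced, spc))    # agreement with the full supervised PC predictor
```

Features whose weight is exactly zero drop out of the reduced predictor, which is what makes it easier to interpret than the full supervised PC predictor.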
This branch (master) is the default one, which hosts the current development release (version 1.12).
Package superpc is open source / free software, licensed under the GNU General Public License version 3 (GPLv3), sponsored by the Free Software Foundation. To view a copy of this license, visit the GNU General Public License page.
CRAN downloads since initial release to CRAN (2004-09-16): as tracked by RStudio CRAN mirror
superpc (>= 1.12) requires R 3.5.0 (2018-04-23) or later. It was built and tested under R version 4.0.3 (2020-10-10) and Travis CI.
Installation has been tested on Windows, Linux, OSX and Solaris platforms.
- To install the stable version of superpc, simply download and install the current version (1.12) from the CRAN repository:
    install.packages("superpc")
- Alternatively, you can install the most up-to-date development version (>= 1.12) of superpc from the GitHub repository:
    install.packages("devtools")
    library("devtools")
    devtools::install_github("jedazard/superpc")
- To load the superpc library in an R session and start using it:
    library("superpc")
- Check details of new features, changes, and bug fixes with the following R command:
    superpc.news()
- Check on how to cite the package with the R command:
    citation("superpc")
Website - Wiki
- See Rob Tibshirani's Website for more details, and a tutorial with examples and interpretation.
- Eric Bair, Ph.D.
- Trevor Hastie, Ph.D.
- Debashis Paul, Ph.D.
- Robert Tibshirani, Ph.D.
- Jean-Eudes Dazard, Ph.D.
- This work made use of the High Performance Computing Resource in the Core Facility for Advanced Research Computing at Case Western Reserve University.
- Eric Bair was supported by an NSF graduate research fellowship. Robert Tibshirani was partially supported by National Science Foundation grant DMS-9971405 and National Institutes of Health contract N01-HV-28183. Trevor Hastie was supported in part by National Science Foundation grant DMS-02-04612 and National Institutes of Health grant R01 CA 72028-07.
Functions in superpc
| Function | Description |
|---|---|
| superpc.train | Prediction by supervised principal components |
| superpc.predictionplot | Plot outcome predictions from superpc |
| superpc.decorrelate | Decorrelate features with respect to competing predictors |
| superpc.rainbowplot | Make rainbow plot of superpc and competing predictors |
| superpc.predict.red.cv | Cross-validation of feature selection for supervised principal components |
| superpc.predict.red | Feature selection for supervised principal components |
| superpc.plot.lrtest | Plot likelihood ratio test statistics |
| superpc.plotred.lrtest | Plot likelihood ratio test statistics from supervised principal components predictor |
| superpc.cv | Cross-validation for supervised principal components |
| superpc.news | Display the superpc Package News |
| superpc.plotcv | Plot output from superpc.cv |
| superpc.listfeatures | Return a list of the important predictors |
| superpc.predict | Form principal components predictor from a trained superpc object |
| superpc.fit.to.outcome | Fit predictive model using outcome of supervised principal components |
| superpc.lrtest.curv | Compute values of likelihood ratio test from supervised principal components fit |
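A typical session chaining several of these functions might look like the sketch below. It is an illustration rather than authoritative usage: the simulated data are hypothetical, the argument values (such as `threshold = 0.7` and `n.components = 3`) are arbitrary, and the calls assume the data are supplied as a list with features in rows and samples in columns. The package calls are guarded so the block is skipped when superpc is not installed; consult each function's help page for the exact signature.

```r
# Hypothetical simulated survival data: features in rows, samples in columns.
set.seed(464)
p <- 1000   # features
n <- 100    # samples
x <- matrix(rnorm(p * n), nrow = p, ncol = n)
v1 <- svd(x[1:80, ])$v[, 1]              # latent direction shared by the first 80 features
y <- 2 + 5 * v1 + 0.05 * rnorm(n)        # outcome used as survival time
ytest <- 2 + 5 * v1 + 0.05 * rnorm(n)
censoring.status <- sample(c(rep(1, 80), rep(0, 20)))
censoring.status.test <- sample(c(rep(1, 80), rep(0, 20)))
featurenames <- paste0("feature", 1:p)

data <- list(x = x, y = y,
             censoring.status = censoring.status,
             featurenames = featurenames)
data.test <- list(x = x, y = ytest,
                  censoring.status = censoring.status.test,
                  featurenames = featurenames)

if (requireNamespace("superpc", quietly = TRUE)) {
  library(superpc)

  # Compute the univariate scores for every feature.
  train.obj <- superpc.train(data, type = "survival")

  # Cross-validate over thresholds and plot the resulting curves.
  cv.obj <- superpc.cv(train.obj, data)
  superpc.plotcv(cv.obj)

  # Form the supervised PC predictor on the test data at an illustrative threshold.
  fit.cts <- superpc.predict(train.obj, data, data.test,
                             threshold = 0.7, n.components = 3,
                             prediction.type = "continuous")

  # Fit the outcome model to the predictor on the test set.
  superpc.fit.to.outcome(train.obj, data.test, fit.cts$v.pred)
}
```

In practice the threshold passed to superpc.predict would be read off the cross-validation curves rather than fixed in advance.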
| Field | Value |
|---|---|
| Date/Publication | 2020-10-19 22:10:03 UTC |
| License | GPL (>= 3) \| file LICENSE |
| Packaged | 2020-10-19 18:31:57 UTC; JE D |