# superpc v1.12

# Superpc

Supervised Principal Components for regression and survival analysis.

===============

### Description

Does prediction in the case of a censored survival outcome, or a regression outcome, using the "supervised principal component" approach (Bair et al., 2006). Superpc is especially useful for high-dimensional data when the number of features p dominates the number of samples n (p >> n paradigm), as generated, for instance, by high-throughput technologies.

===============

### Details

Supervised principal components is a generalization of principal components regression. The first (or first few) principal components are the linear combinations of the features that capture the directions of largest variation in a dataset. But these directions may or may not be related to an outcome variable of interest. To find linear combinations that are related to an outcome variable, we compute univariate scores for each feature and then retain only those features whose score exceeds a threshold. A principal components analysis is carried out using only the data from these selected features. Finally, these "supervised principal components" are used in a regression model to predict the outcome. To summarize, the steps are:

1. Compute (univariate) standard regression coefficients for each feature
2. Form a reduced data matrix consisting of only those features whose univariate coefficient exceeds a threshold theta in absolute value (theta is estimated by cross-validation)
3. Compute the first (or first few) principal components of the reduced data matrix
4. Use these principal component(s) in a regression model to predict the outcome
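
The four steps can be sketched in base R for a quantitative outcome. This is only an illustration under stated assumptions, not the package's implementation: the data are simulated, the variable names are made up for the example, the score is an ordinary univariate t-statistic (the package's own scoring may differ in detail), and the threshold `theta` is fixed by hand rather than estimated by cross-validation.

```r
set.seed(1)
n <- 60    # samples
p <- 500   # features (p >> n)

# Hypothetical simulated data: features in rows, samples in columns.
x <- matrix(rnorm(p * n), nrow = p)
latent <- rnorm(n)
# The first 20 features share a latent signal that also drives the outcome.
x[1:20, ] <- x[1:20, ] + matrix(latent, nrow = 20, ncol = n, byrow = TRUE)
y <- 3 * latent + rnorm(n)

# Step 1: univariate standardized regression coefficients (t-statistics).
xc <- x - rowMeans(x)                     # center each feature
yc <- y - mean(y)
sxx <- rowSums(xc^2)
beta <- as.vector(xc %*% yc) / sxx        # slope of y on each feature
rss <- sum(yc^2) - beta^2 * sxx           # residual sum of squares
se <- sqrt(rss / (n - 2)) / sqrt(sxx)
scores <- beta / se

# Step 2: keep features whose score exceeds theta in absolute value.
theta <- 3.5   # assumption: fixed here for illustration only
keep <- abs(scores) > theta
x_red <- xc[keep, , drop = FALSE]

# Step 3: first principal component of the reduced matrix (one value per sample).
pc1 <- svd(x_red)$v[, 1]

# Step 4: regress the outcome on the supervised principal component.
fit <- lm(y ~ pc1)
```

Calling `summary(fit)` then shows how much of the outcome the single supervised component explains on the training data.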

This idea can be used in standard regression problems with a quantitative outcome, and also in generalized regression problems such as survival analysis. In the latter problem, the regression coefficients in step (1) are obtained from a proportional hazards model. The superpc R package handles these two cases: standard regression and survival data.
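
For the survival case, the per-feature score can be sketched with the `survival` package, which ships with standard R distributions. This is a minimal sketch under assumptions: the data are simulated, `obs_time`, `status`, and `cox_score` are hypothetical names, and the Wald z-statistic from a one-feature Cox model is used as a stand-in for whatever score the package computes internally.

```r
library(survival)

set.seed(3)
n <- 80   # samples
p <- 40   # features (kept small because one Cox model is fitted per feature)

# Hypothetical simulated data: features in rows, samples in columns.
x <- matrix(rnorm(p * n), nrow = p)
risk <- 0.8 * colSums(x[1:5, ])            # first 5 features raise the hazard
event_time <- rexp(n, rate = exp(risk))
censor_time <- rexp(n, rate = 0.3)
obs_time <- pmin(event_time, censor_time)  # observed follow-up time
status <- as.integer(event_time <= censor_time)  # 1 = event, 0 = censored

# Univariate proportional hazards fit for one feature; returns the Wald z.
cox_score <- function(xi) {
  fit <- coxph(Surv(obs_time, status) ~ xi)
  unname(coef(fit) / sqrt(diag(vcov(fit))))
}

scores <- apply(x, 1, cox_score)
```

The resulting `scores` vector plays the same role as the regression scores in step 1: features are screened by comparing its absolute values with the threshold.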

There is one more important point: the features (e.g., genes) that are important in the prediction are not necessarily the ones that passed the screen in step 2. Other features may have as high a correlation with the supervised PC predictor. So we compute an importance score for each feature, equal to its correlation with the supervised PC predictor. A reduced predictor is formed by soft-thresholding the importance scores and using these shrunken scores as weights. The soft-thresholding sets the weight of some features to zero, hence throwing them out of the model. The amount of shrinkage is determined by cross-validation. The reduced predictor often performs as well as or better than the supervised PC predictor, and is more interpretable.
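
The importance scores and the soft-thresholding step can be illustrated in base R. This is a sketch under assumptions, not the package's code: the data are simulated, `spc` is a stand-in for a supervised PC predictor that would normally come from the screening and PCA steps, `soft_threshold` is a hypothetical helper, and the shrinkage amount `delta` is fixed by hand rather than chosen by cross-validation.

```r
set.seed(2)
n <- 60    # samples
p <- 300   # features

# Hypothetical simulated data: features in rows, samples in columns.
x <- matrix(rnorm(p * n), nrow = p)
latent <- rnorm(n)
x[1:15, ] <- x[1:15, ] + matrix(latent, nrow = 15, ncol = n, byrow = TRUE)
xc <- x - rowMeans(x)

# Stand-in for the supervised PC predictor (assumption: features 1..15 were selected).
spc <- svd(xc[1:15, ])$v[, 1]

# Importance score: correlation of every feature with the predictor.
importance <- as.vector(cor(t(x), spc))

# Soft-thresholding shrinks each score toward zero by delta and zeroes the small ones.
soft_threshold <- function(z, delta) sign(z) * pmax(abs(z) - delta, 0)

delta <- 0.45   # assumption: fixed here for illustration only
w <- soft_threshold(importance, delta)

# Reduced predictor: weighted combination of the centered features.
reduced <- as.vector(crossprod(xc, w))
```

Features with `w == 0` drop out of the reduced predictor entirely, which is what makes it easier to interpret than the full supervised PC predictor.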

============

### Branches

This branch (master) is the default one and hosts the current development release (version 1.12).

===========

Package superpc is open source / free software, licensed under the GNU General Public License version 3 (GPLv3), published by the Free Software Foundation. To view a copy of this license, visit the GNU General Public License page on the GNU website.

=============

CRAN downloads since the initial release to CRAN (2004-09-16), as tracked by the RStudio CRAN mirror.

================

### Requirements

superpc (>= 1.12) requires R-3.5.0 (2018-04-23). It was built and tested under R version 4.0.3 (2020-10-10) and Travis CI.

Installation has been tested on Windows, Linux, OSX and Solaris platforms.

================

### Installation

• To install the stable version of superpc, simply download and install the current version (1.12) from the CRAN repository:

    install.packages("superpc")

• Alternatively, you can install the most up-to-date development version (>= 1.12) of superpc from the GitHub repository:

    install.packages("devtools")
    library("devtools")
    devtools::install_github("jedazard/superpc")


=========

### Usage

• To load the superpc library in an R session and start using it:

    library("superpc")

• Check details of new features, changes, and bug fixes with the following R command:

    superpc.news()

• Check on how to cite the package with the R command:

    citation("superpc")


etc...
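
As a further illustration, a typical session might train on a feature-by-sample matrix and then form the principal components predictor. The sketch below is hedged: the simulated data and object names are made up, the argument names (`type`, `threshold`, `n.components`, `prediction.type`) and the `feature.scores` and `v.pred` components are assumptions that should be checked against the package help pages, and the threshold is taken from a quantile of the scores rather than from `superpc.cv`. The call is skipped if superpc is not installed.

```r
set.seed(464)
n <- 100   # samples
p <- 200   # features

# Hypothetical simulated data: features in rows, samples in columns.
x <- matrix(rnorm(p * n), nrow = p)
v1 <- svd(x[1:40, ])$v[, 1]
y <- 2 + 5 * v1 + 0.05 * rnorm(n)
ytest <- 2 + 5 * v1 + 0.05 * rnorm(n)
featurenames <- paste0("feature", seq_len(p))

data <- list(x = x, y = y, featurenames = featurenames)
data.test <- list(x = x, y = ytest, featurenames = featurenames)

ran <- FALSE
if (requireNamespace("superpc", quietly = TRUE)) {
  # Univariate scores for each feature (regression outcome).
  train.obj <- superpc::superpc.train(data, type = "regression")

  # Assumption: keep roughly the top 10% of features by absolute score.
  threshold <- unname(quantile(abs(train.obj$feature.scores), 0.9))

  # Supervised principal components predictor evaluated on the test data.
  fit.cts <- superpc::superpc.predict(train.obj, data, data.test,
                                      threshold = threshold,
                                      n.components = 3,
                                      prediction.type = "continuous")
  ran <- TRUE
}
```

In practice the threshold would be chosen with `superpc.cv` and inspected with `superpc.plotcv`, both listed among the package functions.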

==================

### Website - Wiki

• See Rob Tibshirani's Website for more details, and a tutorial with examples and interpretation.

===================

### Acknowledgments

Authors:

Maintainers:

Funding/Provision/Help:

• This work made use of the High Performance Computing Resource in the Core Facility for Advanced Research Computing at Case Western Reserve University.
• Eric Bair was supported by an NSF graduate research fellowship. Robert Tibshirani was partially supported by National Science Foundation Grant DMS-9971405 and National Institutes of Health Contract N01-HV-28183. Trevor Hastie was supported in part by National Science Foundation grant DMS-02-04612 and National Institutes of Health grant R01 CA 72028-07.

==============

## Functions in superpc

| Name | Description |
| --- | --- |
| superpc.train | Prediction by supervised principal components |
| superpc.predictionplot | Plot outcome predictions from superpc |
| superpc.decorrelate | Decorrelate features with respect to competing predictors |
| superpc.rainbowplot | Make rainbow plot of superpc and competing predictors |
| superpc.predict.red.cv | Cross-validation of feature selection for supervised principal components |
| superpc.predict.red | Feature selection for supervised principal components |
| superpc.plot.lrtest | Plot likelihood ratio test statistics |
| superpc.plotred.lrtest | Plot likelihood ratio test statistics from supervised principal components predictor |
| superpc.cv | Cross-validation for supervised principal components |
| superpc.news | Display the superpc Package News |
| superpc.plotcv | Plot output from superpc.cv |
| superpc.listfeatures | Return a list of the important predictors |
| superpc.predict | Form principal components predictor from a trained superpc object |
| superpc.fit.to.outcome | Fit predictive model using outcome of supervised principal components |
| superpc.lrtest.curv | Compute values of likelihood ratio test from supervised principal components fit |