msPCA
An R Package for Sparse PCA with Multiple Principal Components
Installation
This package can be installed from CRAN directly (pending CRAN registration):
install.packages("msPCA")
Alternatively, it can be installed from this GitHub repository using the devtools package. You would first need to install devtools:
install.packages("devtools")
and then run the following commands:
library(devtools)
install_github('jeanpauphilet/msPCA')
Getting started
The package consists of one main function, msPCA, which takes as input:
- a data matrix (either the correlation or covariance matrix of the dataset),
- the number of principal components (PCs) to be computed, r,
- a list of r integers corresponding to the sparsity of each PC.
It returns an object with 4 fields:
- x_best (p x r array containing the sparse PCs),
- objective_value,
- orthogonality_violation,
- runtime.
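As a rough illustration of what a p x r array of sparse PCs looks like and how it can be sanity-checked, the base-R sketch below builds a hand-made loading matrix (not an output of the package) on the mtcars correlation matrix and computes three quantities: the number of nonzero entries per PC, the variance each PC captures, and the inner product between the PCs. The names X, sparsity, explained, and ortho_gap are illustrative, and the formulas are common conventions assumed here; how the package itself defines objective_value and orthogonality_violation is not specified in this README.

```r
# Correlation matrix of mtcars (11 x 11), the dataset used in this README.
Sigma <- cor(datasets::mtcars)
p <- ncol(Sigma)

# Hypothetical sparse loadings: two unit-norm columns with disjoint supports.
X <- matrix(0, nrow = p, ncol = 2,
            dimnames = list(colnames(Sigma), c("PC1", "PC2")))
X[c("cyl", "disp", "hp", "wt"), "PC1"] <- 1 / 2
X[c("mpg", "drat", "qsec", "gear"), "PC2"] <- 1 / 2

# Number of nonzero entries in each PC (its sparsity level).
sparsity <- colSums(X != 0)

# Variance captured by each PC, computed as x' Sigma x (assumed convention).
explained <- diag(t(X) %*% Sigma %*% X)

# Largest absolute inner product between distinct PCs; 0 means orthogonal.
gram <- crossprod(X)
ortho_gap <- max(abs(gram[upper.tri(gram)]))
```

The same three checks can be applied to the x_best field of a returned object, since it has the same p x r shape.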
Here is a short example demonstrating how to use the package. First, you need to load the library.
library(msPCA)
Then, define the input variables.
library(datasets)
df <- datasets::mtcars
TestMat <- cor(df)
And then simply call the function:
mspca(TestMat, 2, c(4,4))
Development
Here, we provide more information about the code structure and organization to help developers who would like to improve the method or build upon it.
Files
- R/
- RcppExports.R It offers the R interface, which calls the corresponding C++ functions. Regenerate or change it manually if needed (e.g., if the interface changes). We recommend generating it automatically by using Rcpp::compileAttributes().
- main.R It contains all the functions of the package. For the functions coded in Rcpp (and exported in the RcppExports.R file), this script provides (i) user-friendly names and (ii) documentation. This script also defines useful supporting functions.
- man/ contains the pages of the manual: one page for the package and one per function. They are generated automatically from the comments in R/main.R via the devtools::document() command.
- src/ contains the source files of the algorithm, in C++.
- ConstantArguments.h It contains some parameters of the algorithm that are not directly tuneable by the end user.
- msPCA_R_CPP.cpp It contains the implementation of the algorithm.
- RcppExports.cpp It contains the converted function that can be used by R. Regenerate or change it manually if needed (e.g., if the interface changes). It can be generated using Rcpp::compileAttributes().
- Makevars This is not currently used. Use it to set attributes, such as the version of C++ for compilation.
- Makevars.win This is not currently used. Use it to set attributes, such as the version of C++ for compilation.
- test/ contains some template R notebooks
- notebook_mtcars.R compares the PCs generated by msPCA on the mtcars dataset with the ones obtained using several alternative packages (elasticnet, PMA, sparsepca)
- notebook_plot.R provides code to represent the resulting PCs on any 2D-plane
- notebook_synthetic.R compares the performance of msPCA and elasticnet on synthetically generated data with 2 true sparse PCs. Results are stored in the 'msPCA_synthetic_results.csv' file and graphically represented.
- NAMESPACE It is used to build this package. Change it if needed (e.g., if the interface changes).
- DESCRIPTION It contains the description of this package.
- LICENSE It contains the license information.
- msPCA.Rproj It contains the settings of this R project. It is used by RStudio and often does not need to be changed.
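To give a feel for the kind of data described for notebook_synthetic.R (synthetic data with 2 true sparse PCs), here is a minimal base-R sketch of one common way to generate such data, a spiked covariance model. This is an assumption for illustration only and is not taken from the notebook; the dimensions, the signal strengths 5 and 3, and the names v1, v2, Sigma_true, and Sigma_hat are all hypothetical.

```r
set.seed(1)
p <- 20   # number of variables (hypothetical)
n <- 200  # number of observations (hypothetical)

# Two sparse, unit-norm directions with disjoint supports (hence orthogonal).
v1 <- numeric(p); v1[1:4] <- 1 / 2
v2 <- numeric(p); v2[5:8] <- 1 / 2

# Spiked covariance: identity noise plus two sparse signal directions.
Sigma_true <- diag(p) + 5 * tcrossprod(v1) + 3 * tcrossprod(v2)

# Draw n Gaussian samples with covariance Sigma_true via its Cholesky factor.
Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
X <- Z %*% chol(Sigma_true)

# Sample covariance matrix, the kind of input described under "Getting started".
Sigma_hat <- cov(X)
```

By construction, v1 and v2 are the two leading eigenvectors of Sigma_true, so a sparse PCA method run on Sigma_hat with 2 PCs of sparsity 4 can be scored by how well it recovers their supports.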
Guidance to future developers
- The core of the algorithm lives in two files: "msPCA_R_CPP.cpp", which handles the computation, and "ConstantArguments.h", which lists all internal arguments.
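After editing these files, the regeneration steps mentioned in the Files section can be chained in one place. The sketch below wraps them in a helper; the function rebuild_package and its pkg_dir argument are hypothetical names introduced here and are not part of the package, and the final install step is an optional convenience rather than something this README prescribes. It assumes the Rcpp and devtools packages are installed.

```r
# Hypothetical helper (not part of msPCA): regenerate the derived files
# after changing the C++ sources or the comments in R/main.R.
rebuild_package <- function(pkg_dir = ".") {
  # Regenerate R/RcppExports.R and src/RcppExports.cpp from the export
  # attributes in the C++ sources.
  Rcpp::compileAttributes(pkgdir = pkg_dir)
  # Rebuild the manual pages in man/ from the comments in the R sources.
  devtools::document(pkg = pkg_dir)
  # Reinstall the package locally so the changes can be tried out.
  devtools::install(pkg = pkg_dir)
  invisible(pkg_dir)
}
```

Calling rebuild_package() from the repository root would run the three steps in order.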