bigPLScox

PLS models for Cox regression with big data in R

Frédéric Bertrand and Myriam Maumy-Bertrand

https://doi.org/10.32614/CRAN.package.bigPLScox

bigPLScox provides Partial Least Squares (PLS) methods for Cox proportional hazards models, with a particular focus on high-dimensional and big-memory settings. The package supports classical PLS Cox methods together with accelerated C++ backends that operate directly on bigmemory::big.matrix objects.

The main design goals are:

  • Efficient PLS-based Cox models for large p and large n
  • First-class support for file-backed big matrices
  • Unified prediction, cross-validation and diagnostic tools

Standalone benchmarking scripts that complement the vignette live under inst/benchmarks/.

The documentation website and examples are maintained by Frédéric Bertrand and Myriam Maumy.

Conference highlight. Maumy, M. and Bertrand, F. (2023). "PLS models and their extension for big data". Conference presentation at the Joint Statistical Meetings (JSM 2023), Toronto, Ontario, Canada, Aug 5–10, 2023.

Conference highlight. Maumy, M. and Bertrand, F. (2023). "bigPLS: Fitting and cross-validating PLS-based Cox models to censored big data". Poster at BioC2023: The Bioconductor Annual Conference, Dana-Farber Cancer Institute, Boston, MA, USA, Aug 2–4, 2023. doi:10.7490/f1000research.1119546.1.

Core modelling functions

The following families of PLS Cox estimators are available.

  • coxgpls() and coxgplsDR()
    Group PLS Cox regression based on partial likelihood, with a deviance-residual-based variant (coxgplsDR).

  • coxsgpls() and coxsgplsDR()
    Sparse group PLS Cox estimators that encourage variable selection at the latent component level.

  • coxspls_sgpls() and coxspls_sgplsDR()
    Structured sparse PLS Cox versions that support group information.

  • DK-style estimators
    coxDKgplsDR(), coxDKsgplsDR() and coxDKspls_sgplsDR() implement Direct Kernel (DK) variants fitted on deviance residuals.

All these functions come in both default and formula interfaces and have matching predict() methods with support for type = "link", "risk" and other standard Cox outputs.

Cross validation helpers are provided through:

  • cv.coxgpls(), cv.coxgplsDR()
  • cv.coxsgpls(), cv.coxsgplsDR()
  • cv.coxspls_sgpls() and cv.coxspls_sgplsDR()
  • cv.coxDKgplsDR(), cv.coxDKsgplsDR(), cv.coxDKspls_sgplsDR()

These mirror the criteria used in plsRcox and include time dependent survival metrics.

Big memory PLS Cox backends

The package offers dedicated functions for Cox PLS fits on large matrices, including file backed bigmemory::big.matrix objects.

  • big_pls_cox()
    Iterative construction of PLS components for Cox models using big matrices, with optional naive sparsity through keepX.

  • big_pls_cox_fast()
    High performance exact PLS Cox backend. It operates on both standard dense matrices and big.matrix inputs and is implemented entirely in C++ for speed.

  • big_pls_cox_gd()
    Gradient based optimisation of the Cox partial likelihood in the latent PLS space. The method argument selects the optimisation scheme:

    • "gd" for a basic fixed step gradient descent
    • "bb" for a Barzilai Borwein step size
    • "nesterov" for Nesterov style acceleration
    • "bfgs" for a quasi Newton type update

    All optimisation methods share the same PLS scores and differ only in how the Cox coefficients are updated.

  • big_pls_cox_transform()
    Low level interface that applies a trained PLS Cox transformation to new data, used internally by the prediction helpers and also exported for advanced workflows.
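
To make the method options of big_pls_cox_gd() more concrete, here is a minimal sketch in plain R of what "updating the Cox coefficients on fixed scores" can look like. It is not the package's C++ implementation: the score matrix S is simulated as a stand-in for PLS scores, and neg_pl() and fit_gd() are hypothetical helper names introduced for illustration. The sketch assumes no tied event times (Breslow form of the partial likelihood) and covers only fixed-step gradient descent, a Barzilai-Borwein step and a BFGS update through stats::optim().

```r
library(survival)

set.seed(42)
n <- 200
S <- scale(matrix(rnorm(n * 2), n, 2))   # stand-in for two PLS score columns
t_event <- rexp(n, rate = exp(drop(S %*% c(0.7, -0.5))))
t_cens  <- rexp(n, rate = 0.3)
time    <- pmin(t_event, t_cens)
status  <- as.integer(t_event <= t_cens)

# Negative partial log-likelihood and gradient (assumes no tied times).
neg_pl <- function(beta) {
  ord <- order(time, decreasing = TRUE)  # risk set of row r is rows 1..r
  Z   <- S[ord, , drop = FALSE]
  ev  <- status[ord] == 1
  w   <- exp(drop(Z %*% beta))
  cw  <- cumsum(w)                       # risk-set sums of exp(eta)
  cwZ <- apply(Z * w, 2, cumsum)         # risk-set sums of exp(eta) * scores
  list(
    value = -sum(log(w[ev]) - log(cw[ev])),
    grad  = -colSums(Z[ev, , drop = FALSE] - cwZ[ev, , drop = FALSE] / cw[ev])
  )
}

# Hypothetical helper: same scores, different coefficient updates.
fit_gd <- function(method = c("gd", "bb"), step0 = 1e-3,
                   tol = 1e-8, max_iter = 5000) {
  method <- match.arg(method)
  beta <- c(0, 0)
  g    <- neg_pl(beta)$grad
  step <- step0
  for (iter in seq_len(max_iter)) {
    beta_new <- beta - step * g
    g_new    <- neg_pl(beta_new)$grad
    s <- beta_new - beta
    y <- g_new - g
    beta <- beta_new
    g    <- g_new
    if (sqrt(sum(g^2)) < tol) break
    if (method == "bb") {                # Barzilai-Borwein step length
      sy   <- sum(s * y)
      step <- if (is.finite(sy) && sy > 0) sum(s * s) / sy else step0
    }
  }
  list(coef = beta, iterations = iter)
}

res_gd   <- fit_gd("gd")
res_bb   <- fit_gd("bb")
res_bfgs <- optim(c(0, 0), fn = function(b) neg_pl(b)$value,
                  gr = function(b) neg_pl(b)$grad, method = "BFGS")
ref <- coef(coxph(Surv(time, status) ~ S, ties = "breslow"))

rbind(gd = res_gd$coef, bb = res_bb$coef, bfgs = res_bfgs$par, coxph = ref)
c(gd = res_gd$iterations, bb = res_bb$iterations)
```

All three schemes minimise the same convex objective, so they should agree with survival::coxph() up to the stopping tolerance; what changes is the number of iterations needed to get there.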

Cross validation for the big memory backends is provided by:

  • cv.big_pls_cox()
  • cv.big_pls_cox_gd()

These functions help select the number of components and compare the exact and gradient based backends.
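
The idea behind choosing the number of components by cross-validation can be sketched without the package. The example below is only an illustration under stated assumptions: principal component scores from stats::prcomp() are swapped in for PLS scores, the data are simulated, and held-out concordance from survival::concordance() is used as the criterion. The criteria and fold handling of cv.big_pls_cox() and cv.big_pls_cox_gd() may differ.

```r
library(survival)

set.seed(1)
n <- 150
p <- 20
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", seq_len(p))
eta     <- 0.8 * X[, 1] - 0.6 * X[, 2]
t_event <- rexp(n, rate = exp(eta))
t_cens  <- rexp(n, rate = 0.3)
time    <- pmin(t_event, t_cens)
status  <- as.integer(t_event <= t_cens)

n_folds  <- 5
max_comp <- 4
folds <- sample(rep(seq_len(n_folds), length.out = n))

# Mean held-out concordance for each candidate number of components.
cv_c <- sapply(seq_len(max_comp), function(k) {
  mean(sapply(seq_len(n_folds), function(f) {
    tr <- folds != f
    # Scores are learned on the training fold only (PCA as a stand-in for PLS).
    pc   <- prcomp(X[tr, ], center = TRUE, scale. = TRUE)
    S_tr <- pc$x[, seq_len(k), drop = FALSE]
    S_te <- predict(pc, X[!tr, ])[, seq_len(k), drop = FALSE]
    fit  <- coxph(Surv(time[tr], status[tr]) ~ S_tr)
    lp   <- drop(S_te %*% coef(fit))
    # reverse = TRUE: a larger linear predictor means a shorter survival time.
    concordance(Surv(time[!tr], status[!tr]) ~ lp, reverse = TRUE)$concordance
  }))
})

names(cv_c) <- paste0("ncomp=", seq_len(max_comp))
cv_c
best_ncomp <- which.max(cv_c)
```

Recomputing the scores inside each training fold keeps the held-out rows out of the component construction, which is the main point of cross-validating the number of components rather than reusing a single global decomposition.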

Prediction, plots and summaries

The following S3 methods are provided for PLS Cox fits.

  • predict.big_pls_cox()
    Prediction method for the original big memory PLS Cox solver.

  • predict.big_pls_cox_fast()
    Unified prediction interface for exact PLS Cox fits on both dense and big matrices. Supports:

    • type = "link", "risk", "response"
    • type = "components" to return PLS scores
    • comps to select a subset of components
    • coef to supply custom Cox coefficients
  • predict.big_pls_cox_gd()
    Prediction for gradient based fits that supports the same type, comps and coef arguments and uses the stored Cox fit by default.

  • plot.big_pls_cox() and plot.big_pls_cox_gd()
    Simple visual summaries of component effects, often used together with deviance residual plots.

  • summary.big_pls_cox(), summary.big_pls_cox_fast() and summary.big_pls_cox_gd()
    Text summaries that expose the PLS structure, number of components, and the embedded Cox fit.

  • print.big_pls_cox(), print.big_pls_cox_gd() and print.summary.big_pls_cox_fast()
    Compact console output for quick inspection.

Several internal PLS models from plsRcox (for example gPLS, sPLS, sgPLS, pls.cox) also have stats::predict() methods registered in the namespace so that standard predict() calls continue to work.
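
As a reminder of how the prediction types listed above relate to each other in a Cox model, the following sketch uses survival::coxph() on the lung data shipped with the survival package. It assumes that the "link" type of the bigPLScox methods plays the role of the linear predictor that survival calls "lp"; that mapping is an assumption made for illustration, not something checked against the package source.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

lp   <- predict(fit, type = "lp")     # linear predictor (the "link" scale)
risk <- predict(fit, type = "risk")   # relative risk, exp(linear predictor)

# The risk score is the exponential of the linear predictor.
all.equal(risk, exp(lp))
```

Because the two types differ only by a monotone transformation, they rank subjects identically; "risk" is simply easier to read as a hazard ratio relative to the reference subject.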

Diagnostics and model selection

bigPLScox provides a range of tools for residual diagnostics, component selection and inspection of gradient based fits.

  • Deviance residual tools

    • computeDR() carries out deviance residual computation and can use a pure R or C++ engine, with optional support for big matrices.
    • cox_deviance_residuals() and cox_deviance_residuals_big() implement low level deviance residuals for dense and big memory data.
    • cox_partial_deviance_big() and cox_deviance_details() expose partial deviance and internal calculations.
    • benchmark_deviance_residuals() provides a simple wrapper to compare different implementations on synthetic data.
  • Component summaries

    • component_information() extracts per component information such as variance explained and effective variable usage from both big_pls_cox and big_pls_cox_gd fits.
    • select_ncomp() offers information criteria based choices for the number of components, for example AIC or BIC like rules.
  • Gradient based diagnostics

    • gd_diagnostics() returns optimisation diagnostics for gradient based backends, including iteration counts, log likelihood progression, gradient norms and step sizes.

These tools are intended to complement classic survival model diagnostics such as survival::coxph() residual plots.
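
For readers unfamiliar with deviance residuals, the standard definition can be checked by hand against survival::residuals(). With martingale residual M_i and event indicator delta_i, the deviance residual is sign(M_i) * sqrt(-2 * (M_i + delta_i * log(delta_i - M_i))). The sketch below uses the lung data from the survival package; it illustrates the formula only and makes no claim about how computeDR() or the C++ backends are implemented.

```r
library(survival)

d <- lung
d$event <- as.integer(d$status == 2)   # lung codes 1 = censored, 2 = dead

fit <- coxph(Surv(time, event) ~ age + sex, data = d)

M <- residuals(fit, type = "martingale")

# Deviance residuals from martingale residuals; the log term only
# contributes for observed events (delta = 1).
d_manual <- sign(M) * sqrt(-2 * (M + ifelse(d$event == 1, log(1 - M), 0)))
d_ref    <- residuals(fit, type = "deviance")

all.equal(unname(d_manual), unname(d_ref))
```

The transformation makes the heavily skewed martingale residuals more symmetric around zero, which is why deviance residuals are a convenient continuous working response for PLS.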

Utilities, data and scaling

A small number of helper functions and data objects round out the package.

  • bigscale
    Scaling of big matrices that is compatible with the big memory PLS Cox workflow.

  • bigSurvSGD.na.omit() and partialbigSurvSGDv0()
    Interfaces for survival stochastic gradient methods provided by the companion bigSurvSGD package.

  • dataCox
    Example survival dataset used in documentation and unit tests.

The package also re-exports the %*% and Arith methods used with some big matrix types.

Vignettes and learning material

Several vignettes ship with the package and are accessible once it is installed.

  • Getting started with bigPLScox
  • Overview of the main modelling functions and their extensions
  • Big memory workflows with bigmemory matrices
  • Benchmarking bigPLScox against baseline Cox implementations

Refer to the pkgdown site for rendered versions of these documents and a complete function reference:

https://fbertran.github.io/bigPLScox/

Installation

You can install the released version of bigPLScox from CRAN with:

install.packages("bigPLScox")

You can install the development version of bigPLScox from GitHub with:

# install.packages("devtools")
devtools::install_github("fbertran/bigPLScox")

Minimal example

The following minimal example uses the microsatellite data (micro.censure and Xmicro.censure_compl_imp) bundled with the package.

library(bigPLScox)
data(micro.censure)
data(Xmicro.censure_compl_imp)

Y <- micro.censure$survyear
status <- micro.censure$DC
X <- Xmicro.censure_compl_imp

# coxgpls() needs a numeric design matrix; coerce the bundled predictors first
# (passing X as loaded fails with "'x' must be numeric").
X_num <- apply(as.matrix(X), c(1, 2), as.numeric)

set.seed(123)
fit <- coxgpls(
  Xplan = X_num,
  time = Y,
  status = status,
  ncomp = 4,
  ind.block.x = c(3, 10, 20)
)

summary(fit)

A big memory workflow uses bigmemory::big.matrix objects.

library(bigmemory)

X_big <- bigmemory::as.big.matrix(X)

fast_fit <- big_pls_cox_fast(
  X = X_big,
  time = Y,
  status = status,
  ncomp = 4
)

lp <- predict(fast_fit, newdata = X_big, type = "link")
head(lp)
#> [1] -0.4296294 -0.7809034  1.6411946 -1.3885315  1.2299486 -1.7144312

For more elaborate examples, including cross validation and comparisons between the exact and gradient based backends, see the vignettes and the scripts under inst/benchmarks.
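
The workflow above keeps the big.matrix in memory. A file-backed variant might look like the sketch below, which uses simulated data and only bigmemory functions for the storage part. The file names X.bin and X.desc and the temporary directory are arbitrary choices, and the final call simply reuses the big_pls_cox_fast() and predict() arguments shown earlier; whether every backend accepts a re-attached file-backed matrix in the same way is an assumption, so that part runs only if bigPLScox is installed.

```r
library(bigmemory)

set.seed(1)
n <- 100
p <- 30
X <- matrix(rnorm(n * p), n, p)
time   <- rexp(n, rate = exp(0.5 * X[, 1]))
status <- rbinom(n, 1, 0.7)

# Write the matrix to a backing file plus a small descriptor file.
backing_dir <- tempfile("bigplscox_")
dir.create(backing_dir)
X_fb <- as.big.matrix(
  X,
  type = "double",
  backingfile = "X.bin",
  descriptorfile = "X.desc",
  backingpath = backing_dir
)

# Later (or in another R session) re-attach the data from the descriptor
# without reading the whole matrix into RAM.
X_again <- attach.big.matrix(file.path(backing_dir, "X.desc"))
is.filebacked(X_again)
dim(X_again)

if (requireNamespace("bigPLScox", quietly = TRUE)) {
  fit_fb <- bigPLScox::big_pls_cox_fast(
    X = X_again,
    time = time,
    status = status,
    ncomp = 2
  )
  print(head(predict(fit_fb, newdata = X_again, type = "link")))
}
```

Keeping the descriptor file next to the backing file lets separate R processes attach the same data, which is the usual reason to prefer file-backed matrices for large n.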

Citation

If you use bigPLScox in scientific work, please cite the package and the associated conference material.

Maumy, M. and Bertrand, F. (2023). PLS models and their extension for big data. Joint Statistical Meetings, Toronto, Ontario, Canada.

Maumy, M. and Bertrand, F. (2023). bigPLS: Fitting and cross-validating PLS-based Cox models to censored big data. BioC2023, Dana-Farber Cancer Institute, Boston, MA, poster contribution. doi:10.7490/f1000research.1119546.1.

Version: 0.8.1
License: GPL-3
Maintainer: Frederic Bertrand
Last published: November 18th, 2025

Functions in bigPLScox (0.8.1)

  • coxDKspls_sgplsDR: Fitting a Cox-Model on sparse PLSR components using the (Deviance) Residuals
  • component_information: Information criteria for component selection
  • cox_deviance_residuals: Cox deviance residuals via C++ backends
  • coxgpls: Fitting a Cox-Model on group PLSR components
  • computeDR: Compute deviance residuals
  • coxDKgplsDR: Fitting a Direct Kernel group PLS model on the (Deviance) Residuals
  • coxgplsDR: Fitting a Cox-Model on group PLSR components using the (Deviance) Residuals
  • coxsgpls: Fitting a Cox-Model on group sparse PLSR components
  • coxsgplsDR: Fitting a Cox-Model on group sparse PLSR components using the (Deviance) Residuals
  • coxDKsgplsDR: Fitting a Direct Kernel group sparse PLS model on the (Deviance) Residuals
  • cv.coxsgplsDR: Cross-validating a Cox-Model fitted on sparse group PLSR components using (Deviance) Residuals
  • cv.big_pls_cox: Cross-validation for big-memory PLS-Cox models
  • coxspls_sgplsDR: Fitting a Cox-Model on sparse PLSR components using the (Deviance) Residuals
  • cv.coxDKsgplsDR: Cross-validating a Direct Kernel group sparse PLS model fitted on the (Deviance) Residuals
  • cv.coxgpls: Cross-validating a Cox-Model fitted on group PLSR components
  • coxspls_sgpls: Fitting a Cox-Model on sparse PLSR components
  • cv.coxDKspls_sgplsDR: Cross-validating a Direct Kernel sparse PLS model fitted on the (Deviance) Residuals
  • cv.coxgplsDR: Cross-validating a Cox-Model fitted on group PLSR components using (Deviance) Residuals
  • cv.coxsgpls: Cross-validating a Cox-Model fitted on sparse group PLSR components
  • cv.coxDKgplsDR: Cross-validating a Direct Kernel group PLS model fitted on the (Deviance) Residuals
  • plot.big_pls_cox_gd: Plot method for big_pls_cox_gd objects
  • cv.coxspls_sgplsDR: Cross-validating a Cox-Model fitted on sparse PLSR components using (Deviance) Residuals
  • partialbigSurvSGDv0: Incremental Survival Model Fitting with Pre-Scaled Data
  • dataCox: Cox Proportional Hazards Model Data Generation From Weibull Distribution
  • gd_diagnostics: Extract Diagnostics from a big_pls_cox_gd Model
  • plot.big_pls_cox: Plot method for big_pls_cox objects
  • dCox_sim: Simulated survival dataset for Cox models
  • micro.censure: Microsat features and survival times
  • internal-bigPLScox: Internal bigPLScox functions
  • predict.big_pls_cox: Predict method for big-memory PLS-Cox models
  • predict.big_pls_cox_gd: Predict method for big_pls_cox_gd
  • predict_pls_latent: Predict responses and latent scores from PLS fits
  • predict.big_pls_cox_fast: Predictions for fast big PLS–Cox fits
  • summary.big_pls_cox: Summary for big_pls_cox objects
  • cv.coxspls_sgpls: Cross-validating a Cox-Model fitted on sparse PLSR components
  • predict_cox_pls: Predict survival summaries from legacy Cox-PLS fits
  • sim_data: Simulated dataset
  • print.big_pls_cox_gd: Print method for big_pls_cox_gd objects
  • print.summary.big_pls_cox_fast: Print method for summary.big_pls_cox_fast objects
  • print.big_pls_cox: Print method for big_pls_cox objects
  • summary.big_pls_cox_fast: Summary for big_pls_cox objects
  • summary.big_pls_cox_gd: Summary for big_pls_cox_gd objects
  • Xmicro.censure_compl_imp: Imputed Microsat features
  • big_pls_cox: Partial Least Squares Components for Cox Models with Big Matrices
  • big_pls_cox_fast: Partial Least Squares Components for Cox Models (fast backend)
  • align_big_plscox: Align a GD fit to a PLS fit (optional refit)
  • bigPLScox-package: bigPLScox-package
  • bigSurvSGD.na.omit: Fit Survival Models with Stochastic Gradient Descent
  • bigscale: Construct Scaled Design Matrices for Big Survival Models
  • big_pls_cox_transform: Transform new data to PLS–Cox scores
  • big_pls_cox_gd: Gradient based PLS Cox for big matrices
  • bigmatrix-operations: Matrix and arithmetic operations for big.matrix objects