The main function relies on an optional variable pre-selection procedure that is run before the PRSP algorithm. At this point, this is done by a cross-validated penalization of the partial likelihood using the R package glmnet.
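As a rough illustration of this kind of pre-selection (not the package's internal code), the sketch below tunes an Elastic-Net penalized Cox model with glmnet::cv.glmnet over a small grid of alpha values using shared folds, and keeps the covariates with non-zero coefficients. The simulated data, the alpha grid, the number of folds and all object names are assumptions made for the example.

```r
library(glmnet)

set.seed(1)
n <- 200; p <- 10
x <- matrix(runif(n * p), n, p, dimnames = list(NULL, paste0("X", 1:p)))
eta <- 2 * x[, 1] - 2 * x[, 2]              # hypothetical regression function
t_event <- rexp(n, rate = exp(eta))         # exponential survival times
t_cens  <- runif(n, 0, 3)                   # uniform censoring times
y <- cbind(time = pmin(t_event, t_cens),
           status = as.numeric(t_event <= t_cens))

# Same folds for every alpha so that cross-validated errors are comparable
foldid <- sample(rep(1:5, length.out = n))
alphas <- c(0.25, 0.5, 0.75, 1)             # arbitrary grid (assumption)
fits <- lapply(alphas, function(a)
  cv.glmnet(x, y, family = "cox", alpha = a, foldid = foldid))

# Pick the (alpha, lambda) pair with the smallest cross-validated error
cv_err      <- sapply(fits, function(f) min(f$cvm))
best        <- which.min(cv_err)
best_alpha  <- alphas[best]
best_lambda <- fits[[best]]$lambda.min

# Pre-selected covariates: non-zero coefficients at the chosen penalty
beta     <- as.vector(coef(fits[[best]], s = "lambda.min"))
selected <- colnames(x)[beta != 0]
```

cv.glmnet itself only cross-validates lambda for a fixed alpha, which is why the sketch loops over alpha with a common foldid.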
The following describes the end-user functions that are needed to run a complete procedure. The other internal subroutines are not documented in the manual and are not to be called by the end-user at any time. For computational efficiency, some end-user functions offer a parallelization option that is done by passing a few parameters needed to configure a cluster. This is indicated by an asterisk (* = optionally involving cluster usage). The R features are categorized as follows:
PRIMsrc.news
Display the PRIMsrc Package News
Function to display the log file NEWS of updates of the PRIMsrc package.
summary
Summary Function
S3-generic summary function to summarize the main parameters used to generate the PRSP object.
print
Print Function
S3-generic print function to display the cross-validated estimated values of the PRSP object.
plot
2D Visualization of Data Scatter and Box Vertices
S3-generic plotting function for two-dimensional visualization of original or predicted data scatter
as well as cross-validated box vertices of a PRSP object. The scatter plot is for a given
peeling step of the peeling sequence and a given plane, both specified by the user.
predict
Predict Function
S3-generic predict function to predict the box membership and box vertices
on an independent set.
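To make the notion of box membership concrete, here is a minimal base-R sketch, assuming a box is stored as one row per covariate with lower and upper vertices. The `box` object, the `in_box` helper and the toy data are hypothetical and do not reflect the internal layout of a PRSP object.

```r
# Hypothetical box: one row per covariate, columns = lower/upper vertices
box <- rbind(X1 = c(lower = 0.2, upper = 1.0),
             X2 = c(lower = 0.0, upper = 0.6))

# A sample is in the box if every covariate lies within its bounds
in_box <- function(newdata, box) {
  vars <- rownames(box)
  inside <- sapply(vars, function(v) {
    newdata[[v]] >= box[v, "lower"] & newdata[[v]] <= box[v, "upper"]
  })
  inside <- matrix(inside, ncol = length(vars))  # also handles a single row
  rowSums(inside) == length(vars)
}

# Independent set of three samples (toy values)
newdata <- data.frame(X1 = c(0.5, 0.1, 0.9),
                      X2 = c(0.3, 0.3, 0.8))
membership <- in_box(newdata, box)
```

Here only the first sample falls inside the box: the second violates the lower bound on X1 and the third the upper bound on X2.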
sbh
(*)
Cross-Validated Survival Bump Hunting
Main end-user function for fitting a cross-validated Survival Bump Hunting (SBH) model.
Returns a cross-validated PRSP object, as generated by our Patient Recursive Survival Peeling or PRSP algorithm,
containing cross-validated estimates of end-points statistics of interest.
The function relies on an internal variable pre-selection procedure before the PRSP algorithm is run.
At this point, this is done by Elastic-Net (EN) penalization of the partial likelihood, where both the mixing (alpha)
and overall shrinkage (lambda) parameters are simultaneously estimated by cross-validation using the
glmnet::cv.glmnet function of the R package glmnet. The returned S3-class PRSP object contains
cross-validated estimates of all the decision-rules of pre-selected covariates and all other statistical quantities of
interest at each iteration of the peeling sequence (inner loop of the PRSP algorithm). This enables the graphical display
of results of profiling curves for model tuning, peeling trajectories, covariate traces and survival distributions
(see plotting functions for more details). The function offers a number of options for the number of cross-validation
replicates to be performed: \(B\); the type of cross-validation desired: \(K\)-fold (replicated)-averaged or -combined,
as well as the peeling and optimization criteria chosen for model tuning and a few more parameters for the PRSP algorithm.
The function takes advantage of the R package parallel, which allows users to create a cluster of
workstations on a local and/or remote machine(s), enabling scaling-up with the number of specified CPU cores and efficient
parallel execution.
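The general pattern behind such cluster usage can be sketched with the R package parallel alone. The example below distributes independent replicates over two local worker processes; `one_replicate` is a hypothetical stand-in for one cross-validation replicate, and the number of workers and replicates are arbitrary choices, not package defaults.

```r
library(parallel)

# Hypothetical stand-in for the work done in one replicate
one_replicate <- function(b) {
  set.seed(b)                     # per-replicate seed for reproducibility
  mean(rexp(100, rate = 1))
}

B  <- 4                           # number of replicates (arbitrary)
cl <- makeCluster(2)              # two local worker processes (assumption)
res <- parLapply(cl, seq_len(B), one_replicate)
stopCluster(cl)                   # always release the workers

estimates <- unlist(res)
```

Because each replicate sets its own seed, the parallel results match those of a sequential `sapply(seq_len(B), one_replicate)` run.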
plot_profile
Visualization for Model Selection/Validation
Function for plotting the cross-validated tuning profiles of a PRSP object.
It uses the user's choice of statistics among the Log Hazard Ratio (LHR), Log-Rank Test (LRT) or Concordance Error Rate (CER)
as a function of the model tuning parameter, that is, the optimal number of peeling steps of the peeling sequence
(inner loop of our PRSP algorithm).
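For intuition about two of these statistics, the sketch below computes a Log Hazard Ratio and a Log-Rank Test statistic for inbox versus outbox samples along a sequence of nested boxes, using the R package survival. It is a plain, non-cross-validated illustration: the simulated data, the one-dimensional boxes defined by `cutoffs`, and all object names are assumptions, and the CER is left out.

```r
library(survival)

set.seed(2)
n  <- 300
x1 <- runif(n)
t_event <- rexp(n, rate = exp(2 * x1))      # hazard increases with x1
t_cens  <- runif(n, 0, 3)
dat <- data.frame(obs_time = pmin(t_event, t_cens),
                  status   = as.numeric(t_event <= t_cens),
                  x1       = x1)

# Hypothetical nested boxes, one per peeling step: x1 >= cutoff
cutoffs <- c(0.2, 0.4, 0.6, 0.8)

profile <- t(sapply(cutoffs, function(ct) {
  dat$inbox <- as.numeric(dat$x1 >= ct)
  # Log hazard ratio of inbox versus outbox from a Cox model
  lhr <- unname(coef(coxph(Surv(obs_time, status) ~ inbox, data = dat)))
  # Log-rank chi-square statistic for the same two groups
  lrt <- survdiff(Surv(obs_time, status) ~ inbox, data = dat)$chisq
  c(cutoff = ct, LHR = lhr, LRT = lrt)
}))
```

Each row of `profile` corresponds to one step, so plotting a column against the step index gives the shape of a tuning profile.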
plot_boxtraj
Visualization of Peeling Trajectories/Profiles
Function for plotting the cross-validated peeling trajectories/profiles of a PRSP object.
Applies to the user-specified covariates among the pre-selected ones and all other statistical quantities of interest
at each iteration of the peeling sequence (inner loop of our PRSP algorithm).
plot_boxtrace
Visualization of Covariates Traces
Function for plotting the cross-validated covariates traces of a PRSP object.
Plot the cross-validated modal trace curves of covariate importance and covariate usage of the user-specified
covariates among the pre-selected ones at each iteration of the peeling sequence (inner loop of our PRSP algorithm).
plot_boxkm
Visualization of Survival Distributions
Function for plotting the cross-validated survival distributions of a PRSP object.
Plot the cross-validated Kaplan-Meier estimates of survival distributions for the highest-risk (inbox) versus
lower-risk (outbox) groups of samples at each iteration of the peeling sequence (inner loop of our PRSP algorithm).
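A comparable picture for a single box can be drawn directly with the R package survival, as in the sketch below. It is not cross-validated and does not use a PRSP object: the simulated data, the box rule `x1 >= 0.7` and the object names are assumptions made for the example.

```r
library(survival)

set.seed(3)
n  <- 300
x1 <- runif(n)
t_event <- rexp(n, rate = exp(2 * x1))      # hazard increases with x1
t_cens  <- runif(n, 0, 3)
dat <- data.frame(
  obs_time = pmin(t_event, t_cens),
  status   = as.numeric(t_event <= t_cens),
  group    = factor(ifelse(x1 >= 0.7, "inbox", "outbox"))  # hypothetical box
)

# Kaplan-Meier estimates for the two groups
km <- survfit(Surv(obs_time, status) ~ group, data = dat)

# Factor levels are alphabetical, so the first curve is the inbox group
plot(km, col = c("red", "blue"), lty = 1:2,
     xlab = "Time", ylab = "Survival probability")
legend("topright", legend = levels(dat$group),
       col = c("red", "blue"), lty = 1:2)
```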
Synthetic.1, Synthetic.1b, Synthetic.2, Synthetic.3, Synthetic.4
Five Datasets From Simulated Regression Survival Models
Five datasets from simulated regression survival models #1-4 as described in Dazard et al. (2015) representing low- and high-dimensional situations,
and where regression parameters represent various types of relationship between survival times and covariates including saturated and noisy situations.
In three datasets where non-informative noisy covariates were used, these covariates were not part of the design matrix (models #2-3 and #4).
In one dataset, the signal is limited to a box-shaped region \(R\) of the predictor space (model #1b).
In the last dataset, the signal is limited to 10% of the predictors in a \(p > n\) situation (model #4).
Survival times were generated from an exponential model with rate parameter \(\lambda\) (and mean \(\frac{1}{\lambda}\))
according to a Cox-PH model with hazard exp(eta), where eta(.) is the regression function.
Censoring indicators were generated from a uniform distribution on [0,3] (models #1-3) or [0,2] (model #4).
In these synthetic datasets, all covariates are continuous, i.i.d. from a multivariate uniform distribution on [0,1] (models #1-3)
or from a multivariate standard normal distribution (model #4).
Real.1
Clinical Dataset
Publicly available HIV clinical data from the Women's Interagency HIV cohort Study (WIHS).
Inclusion criteria of the study were that women at enrollment were (i) alive, (ii) HIV-1 infected, and
(iii) free of clinical AIDS symptoms. Women were followed until the first of the following occurred:
(i) treatment initiation (HAART), (ii) AIDS diagnosis, (iii) death, or (iv) administrative censoring.
The studied outcomes were the competing risks "AIDS/Death (before HAART)" and "Treatment Initiation (HAART)".
However, here, for simplification purposes, only the first of the two competing events (i.e. the time to AIDS/Death),
was used in this dataset example. Likewise, the entire study enrolled 1164 women, but only the complete cases were used
in this dataset example for simplification. Variables included history of Injection Drug Use ("IDU") at enrollment,
African American ethnicity ("Race"), age ("Age"), and baseline CD4 count ("CD4"). The question in this dataset example
was whether it is possible to achieve a prognostication of patients for AIDS and HAART.
Real.2
Genomic Dataset
Publicly available lung cancer genomic data from the Chemores Cohort Study. This was an integrated study of mRNA, miRNA
and clinical variables to characterize the molecular distinctions between squamous cell carcinoma (SCC)
and adenocarcinoma (AC) in Non Small Cell Lung Cancer (NSCLC). Tissue samples were analysed from a cohort of 123 patients
who underwent complete surgical resection at the Institut Mutualiste Montsouris (Paris, France) between 30 January 2002 and 26 June 2006.
In this genomic dataset, only the expression levels of Agilent miRNA probes (\(p=939\)) were included from the \(n=123\) samples of the Chemores cohort.
It represents a situation where the number of covariates dominates the number of complete observations, or \(p \gg n\) case.
Known Bugs/Problems : None at this time.
makeCluster (R package parallel)
plot.survfit (R package survival)
glmnet (R package glmnet)