Construct a PCLasso model based on a gene/protein expression matrix, survival data, and protein complexes.
PCLasso(
x,
y,
group,
penalty = c("grLasso", "grMCP", "grSCAD"),
standardize = TRUE,
...
)
x: An n x p matrix of gene/protein expression measurements with n samples and p genes/proteins.
y: The time-to-event outcome, as a two-column matrix or Surv
object. The first column should be time on study (follow-up time); the
second column should be a binary variable with 1 indicating that the event
has occurred and 0 indicating (right) censoring.
group: A list of groups. The feature (gene/protein) names in
group should be consistent with the feature (gene/protein) names in
x.
penalty: The penalty to be applied to the model. For group selection,
one of grLasso, grMCP, or grSCAD. See grpsurv in the R package
grpreg for details.
standardize: Logical flag for standardizing x prior to
fitting the model. Default is TRUE.
...: Arguments to be passed to grpsurv in the R package
grpreg.
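A minimal sketch of the input shapes described above may help. Every name and value below is hypothetical toy data, not the data shipped with the package, and it assumes that each element of group is a character vector of feature names matching the column names of x.

```r
# Toy inputs; all names and values are made up for illustration.
set.seed(1)
n <- 6                               # number of samples
genes <- c("g1", "g2", "g3", "g4")   # hypothetical feature names

# x: n x p expression matrix with feature names as column names
x <- matrix(rnorm(n * length(genes)), nrow = n,
            dimnames = list(paste0("s", seq_len(n)), genes))

# y: two-column matrix, follow-up time first, event indicator second
# (1 = event occurred, 0 = right-censored)
y <- cbind(time = c(5, 8, 12, 3, 9, 7),
           status = c(1, 0, 1, 1, 0, 1))

# group: one character vector per complex (assumed shape);
# g2 belongs to both complexes, so the groups overlap
group <- list(complexA = c("g1", "g2"),
              complexB = c("g2", "g3", "g4"))

# Check that every feature named in group is a column of x
names_match <- all(unlist(group) %in% colnames(x))
names_match
```

If the survival package is available, y could equally be built with survival::Surv(time, status) instead of a two-column matrix.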
An object with S3 class PCLasso containing:
An object of class grpsurv
Complexes with features (genes/proteins) not included
in x being filtered out.
The function PCLasso implements the PCLasso model when the
parameter penalty is set to "grLasso". The PCLasso model is a
prognostic model which selects important predictors at the protein complex
level to achieve accurate prognosis and identify risk protein complexes.
The PCLasso model has three inputs: a gene expression matrix, survival
data, and protein complexes. It estimates the association between gene
expression and survival at the level of protein
complexes. Similar to the traditional Lasso-Cox model, PCLasso is based on
the Cox PH model and estimates the Cox regression coefficients by
maximizing partial likelihood with regularization penalty. The difference
is that PCLasso selects features at the level of protein complexes rather
than individual genes. Considering that genes usually function by forming
protein complexes, PCLasso regards genes belonging to the same protein
complex as a group, and constructs an l1/l2 penalty based on the sum (i.e.,
l1 norm) of the l2 norms of the regression coefficients of the group
members to perform the selection of features at the group level. Since a
gene may belong to multiple protein complexes, that is, there is overlap
between protein complexes, the classical group Lasso-Cox model for
non-overlapping groups may lead to false sparse solutions. The PCLasso
model deals with the overlapping problem of protein complexes by
constructing a latent group Lasso-Cox model. By reconstructing the gene
expression matrix of the protein complexes, the latent group Lasso-Cox
model is transformed into a non-overlapping group Lasso-Cox model in an
expanded space, which can be directly solved using the classical group
Lasso method. Through the final sparse solution, we can predict the
patient's risk score based on a small set of protein complexes and identify
risk protein complexes that are frequently selected to construct prognostic
models. The penalty options grSCAD and grMCP can also be
used to identify survival-related risk protein complexes. They penalize
large coefficients less than grLasso does, so they tend to select
fewer risk protein complexes.
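The expanded-space idea described above can be illustrated with a small base R sketch. It shows one common way to turn overlapping groups into non-overlapping ones, namely duplicating a gene's column once for each complex it belongs to. This is an illustration under that assumption, not necessarily the package's internal code, and all object names and data are hypothetical.

```r
# Sketch of the expanded-space construction on toy data (hypothetical names).
set.seed(42)
n <- 8
x <- matrix(rnorm(n * 4), nrow = n,
            dimnames = list(NULL, c("g1", "g2", "g3", "g4")))

# Two overlapping complexes: g2 is a member of both
complexes <- list(cA = c("g1", "g2"), cB = c("g2", "g3", "g4"))

# Copy the columns of x once per complex membership, complex by complex
x_expanded <- do.call(cbind, lapply(complexes, function(members) {
  x[, members, drop = FALSE]
}))

# One group label per expanded column; the groups no longer overlap
grp_id <- rep(seq_along(complexes), times = lengths(complexes))

# Make column names unique by prefixing the complex name
colnames(x_expanded) <- paste(rep(names(complexes), times = lengths(complexes)),
                              colnames(x_expanded), sep = "_")

dim(x_expanded)        # 8 rows, 5 columns (g2 appears twice)
colnames(x_expanded)
grp_id
```

The expanded matrix and the integer group labels could then be handed to a standard non-overlapping group lasso Cox solver, for example grpsurv(x_expanded, y, group = grp_id, penalty = "grLasso") from grpreg, with y in the format described for PCLasso.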
PCLasso: a protein complex-based, group lasso-Cox model for accurate prognosis and risk protein complex discovery. Brief Bioinform, 2021.
Park, H., Niida, A., Miyano, S. and Imoto, S. (2015) Sparse overlapping group lasso for integrative multi-omics analysis. Journal of computational biology: a journal of computational molecular cell biology, 22, 73-84.
# NOT RUN {
# load data
data(survivalData)
data(PCGroups)
x <- survivalData$Exp
y <- survivalData$survData
PC.Human <- getPCGroups(Groups = PCGroups, Organism = "Human",
                        Type = "EntrezID")
# fit PCLasso model
fit.PCLasso <- PCLasso(x, y, group = PC.Human, penalty = "grLasso")
# fit PCSCAD model
fit.PCSCAD <- PCLasso(x, y, group = PC.Human, penalty = "grSCAD")
# fit PCMCP model
fit.PCMCP <- PCLasso(x, y, group = PC.Human, penalty = "grMCP")
# }