csmpv (version 1.0.5)

LASSO_plus: LASSO_plus Variable Selection and Modeling

Description

This function performs variable selection using the LASSO_plus algorithm and builds a model afterward.

Usage

LASSO_plus(
  data = NULL,
  standardization = FALSE,
  columnWise = TRUE,
  biomks = NULL,
  outcomeType = c("binary", "continuous", "time-to-event"),
  Y = NULL,
  time = NULL,
  event = NULL,
  topN = 10,
  outfile = "nameWithPath",
  height = 6
)

Value

A list is returned:

fit

A model with selected variables for the given outcome variable

outplot

A forest plot

Arguments

data

A data matrix or data frame, with samples in rows and features/traits in columns.

standardization

A logical value indicating whether to standardize the data before variable selection; the default is FALSE.

columnWise

A logical value indicating whether normalization is column-wise or row-wise; the default is TRUE, which gives column-wise normalization. This is only meaningful when "standardization" is TRUE.

biomks

A vector of potential biomarkers for variable selection; they should be a subset of the column names of "data".

outcomeType

Outcome variable type. There are three choices: "binary" (default), "continuous", and "time-to-event".

Y

Outcome variable name when the outcome type is either "binary" or "continuous".

time

Time variable name when outcome type is "time-to-event".

event

Event variable name when outcome type is "time-to-event".

topN

An integer indicating the desired number of variables to be selected.

outfile

A string giving the output file name, including the path if necessary, but without the file type extension.

height

An integer giving the forest plot height in inches.
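
To illustrate the difference between column-wise and row-wise standardization described for "standardization" and "columnWise", here is a minimal sketch in base R. The helper name standardize() is hypothetical and is not part of csmpv; the package's internal normalization may differ in detail (for example, in how it handles missing values or constant columns).

```r
# Hypothetical helper for illustration only; not the csmpv implementation.
standardize <- function(x, columnWise = TRUE) {
  x <- as.matrix(x)
  if (columnWise) {
    scale(x)        # center and scale each column (feature) to mean 0, sd 1
  } else {
    t(scale(t(x)))  # center and scale each row (sample) to mean 0, sd 1
  }
}

# Small toy matrix: 3 samples (rows) by 3 features (columns)
m <- matrix(c(1, 2, 3, 4, 6, 8, 10, 20, 35), nrow = 3)

col_std <- standardize(m, columnWise = TRUE)
row_std <- standardize(m, columnWise = FALSE)

colMeans(col_std)  # approximately 0 for every column
rowMeans(row_std)  # approximately 0 for every row
```

Column-wise scaling puts features on a common scale, which is usually what a penalized regression such as LASSO needs; row-wise scaling instead normalizes each sample across its features.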

Author

Aixiang Jiang

Details

The LASSO_plus algorithm combines LASSO, single variable regression, and stepwise regression to select variables associated with an outcome variable in a given dataset. The outcome variable can be binary, continuous, or time-to-event. After variable selection, a model is built using common R functions such as lm, glm, and coxph, depending on the outcome type.
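
The final modeling step can be sketched as follows, assuming the variables have already been selected. The helper fit_selected() is hypothetical and only shows how the outcome type could map onto glm, lm, and coxph; it does not reproduce the selection steps or the actual csmpv code. The example uses the built-in mtcars data and the lung data from the survival package rather than csmpv data.

```r
library(survival)  # provides coxph(), Surv(), and the lung data set

# Hypothetical helper for illustration only; not the csmpv implementation.
# Fits a model on already-selected variables, choosing the model family
# from the outcome type.
fit_selected <- function(data, vars,
                         outcomeType = c("binary", "continuous", "time-to-event"),
                         Y = NULL, time = NULL, event = NULL) {
  outcomeType <- match.arg(outcomeType)
  # Backtick-quote names so columns such as "B.Symptoms" are handled safely
  rhs <- paste(sprintf("`%s`", vars), collapse = " + ")
  switch(outcomeType,
    "binary" = glm(as.formula(sprintf("`%s` ~ %s", Y, rhs)),
                   data = data, family = binomial()),
    "continuous" = lm(as.formula(sprintf("`%s` ~ %s", Y, rhs)), data = data),
    "time-to-event" = coxph(
      as.formula(sprintf("Surv(`%s`, `%s`) ~ %s", time, event, rhs)),
      data = data
    )
  )
}

# Binary outcome: logistic regression
bin_fit <- fit_selected(mtcars, c("wt", "hp"), "binary", Y = "am")

# Continuous outcome: ordinary linear regression
cont_fit <- fit_selected(mtcars, c("wt", "hp"), "continuous", Y = "mpg")

# Time-to-event outcome: Cox proportional hazards model
surv_fit <- fit_selected(survival::lung, c("age", "sex"), "time-to-event",
                         time = "time", event = "status")
```

Calling match.arg() on the outcome type mirrors the Usage signature above, where the first choice, "binary", acts as the default.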

References

Friedman, J., Hastie, T. and Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software, Vol. 33(1), 1-22, doi:10.18637/jss.v033.i01.

Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2011) Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent, Journal of Statistical Software, Vol. 39(5), 1-13, doi:10.18637/jss.v039.i05.

Hastie, T. J. and Pregibon, D. (1992) Generalized linear models. Chapter 6 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

Therneau, T. and Grambsch, P. (2000) Modeling Survival Data: Extending the Cox Model. Springer-Verlag.

Kassambara A, Kosinski M, Biecek P (2021). survminer: Drawing Survival Curves using 'ggplot2', R package version 0.4.9, <https://CRAN.R-project.org/package=survminer>.

Aoki T, Jiang A, Xu A et al. (2023) Spatially Resolved Tumor Microenvironment Predicts Treatment Outcomes in Relapsed/Refractory Hodgkin Lymphoma. J Clin Oncol. 2023 Dec 19:JCO2301115. doi: 10.1200/JCO.23.01115. Epub ahead of print. PMID: 38113419.

Examples

# Load in data sets:
data("datlist", package = "csmpv")
tdat = datlist$training

# The function saves files locally. You can define your own temporary directory. 
# If not, tempdir() can be used to get the system's temporary directory.
temp_dir = tempdir()
# As an example, let's define Xvars, which will be used later:
Xvars = c("highIPI", "B.Symptoms", "MYC.IHC", "BCL2.IHC", "CD10.IHC", "BCL6.IHC")
# The function can work with three different outcome types. 
# Here, we use binary as an example:
bfit = LASSO_plus(data = tdat, biomks = Xvars, Y = "DZsig", topN = 5,
                  outfile = file.path(temp_dir, "binaryLASSO_plus"))
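# You can save the files to any directory you choose.

# To delete the output files written to "temp_dir", use the following.
# Note that unlink() does not remove a directory unless recursive = TRUE,
# and deleting the session's tempdir() itself is best avoided:
unlink(list.files(temp_dir, pattern = "^binaryLASSO_plus", full.names = TRUE))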
