survminer (version 0.4.6)

surv_cutpoint: Determine the Optimal Cutpoint for Continuous Variables

Description

Determine the optimal cutpoint for one or multiple continuous variables at once, using the maximally selected rank statistics from the 'maxstat' R package. This is an outcome-oriented method that provides the cutpoint value corresponding to the most significant relation with the outcome (here, survival).

  • surv_cutpoint(): Determine the optimal cutpoint for each variable using 'maxstat'.

  • surv_categorize(): Divide the values of each variable into groups based on the cutpoint returned by surv_cutpoint().
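The categorization step can be pictured with a small base R sketch. This is only an illustration, not the package's code: dichotomize() is a hypothetical helper, the expression values are made up, and it assumes that values strictly greater than the cutpoint receive the second label.

```r
# Hypothetical helper (not part of survminer): split a numeric vector
# at a cutpoint. Assumption: values strictly above the cutpoint get
# the second label, all other values get the first label.
dichotomize <- function(x, cutpoint, labels = c("low", "high")) {
  ifelse(x > cutpoint, labels[2], labels[1])
}

# Made-up expression values, for illustration only
expr <- c(1.2, 3.4, 2.8, 5.1, 0.7)
groups <- dichotomize(expr, cutpoint = 2.8)
print(groups)
```

In practice surv_categorize() performs this step for every variable stored in the surv_cutpoint object, so a helper like this is not needed.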

Usage

surv_cutpoint(data, time = "time", event = "event", variables,
  minprop = 0.1, progressbar = TRUE)

surv_categorize(x, variables = NULL, labels = c("low", "high"))

# S3 method for surv_cutpoint
summary(object, ...)

# S3 method for surv_cutpoint
print(x, ...)

# S3 method for surv_cutpoint
plot(x, variables = NULL, ggtheme = theme_classic(), bins = 30, ...)

# S3 method for plot_surv_cutpoint
print(x, ..., newpage = TRUE)

Arguments

data

a data frame containing survival information (time, event) and continuous variables (e.g., gene expression data).

time, event

column names containing time and event data, respectively. Event values should be 0 or 1.

variables

a character vector containing the names of the variables of interest, for which the optimal cutpoint is to be estimated.

minprop

the minimal proportion of observations per group.

progressbar

logical value. If TRUE, show a progress bar. The progress bar is shown only when the number of variables is greater than 5.

x, object

an object of class surv_cutpoint

labels

labels for the levels of the resulting category.

...

other arguments. For plots, see ?ggpubr::ggpar

ggtheme

function, ggplot2 theme name. The default value is theme_classic(). Allowed values include the official ggplot2 themes; see ?ggplot2::ggtheme.

bins

Number of bins for histogram. Defaults to 30.

newpage

open a new page. See grid.arrange.
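As a sketch of how these arguments combine, the call below raises minprop and passes bins to the plot method. It assumes survminer is installed and uses its bundled myeloma data; the value 0.25 is an arbitrary choice for illustration, and the object name res.strict is hypothetical.

```r
library("survminer")
data(myeloma)

# Search for a cutpoint while keeping roughly a quarter of the
# observations (or more) in each group; 0.25 is an arbitrary choice
res.strict <- surv_cutpoint(myeloma, time = "time", event = "event",
   variables = "DEPDC1", minprop = 0.25)

# The value of minprop is stored in the returned object
res.strict$minprop

# Inspect the resulting group sizes after categorization
table(surv_categorize(res.strict)$DEPDC1)

# Use a coarser histogram in the cutpoint plot
plot(res.strict, "DEPDC1", bins = 20)
```

A larger minprop narrows the range of candidate cutpoints, which can help avoid very small groups at the cost of a possibly less extreme statistic.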

Value

  • surv_cutpoint(): returns an object of class 'surv_cutpoint', which is a list with the following components:

    • maxstat results for each variable (see ?maxstat::maxstat)

    • cutpoint: a data frame containing the optimal cutpoint of each variable. Rows are variable names and columns are c("cutpoint", "statistic").

    • data: a data frame containing the survival data and the original data for the specified variables.

    • minprop: the minimal proportion of observations per group.

    • not_numeric: contains the data for non-numeric variables, when the user provides categorical variable names in the argument variables.

    Methods defined for surv_cutpoint objects are summary, print and plot.

  • surv_categorize(): returns an object of class 'surv_categorize', which is a data frame containing the survival data and the categorized variables.
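The components listed above can be inspected directly. The sketch below assumes survminer is installed, uses its bundled myeloma data, and assumes the per-variable maxstat results are stored under the variable names (for example res.cut[["DEPDC1"]]).

```r
library("survminer")
data(myeloma)

res.cut <- surv_cutpoint(myeloma, time = "time", event = "event",
   variables = c("DEPDC1", "WHSC1"))

# Data frame of cutpoints: one row per variable,
# columns "cutpoint" and "statistic"
res.cut$cutpoint

# Extract the cutpoint for a single variable
res.cut$cutpoint["DEPDC1", "cutpoint"]

# Survival data plus the original values of the selected variables
head(res.cut$data)

# Minimal proportion of observations per group used in the search
res.cut$minprop

# Assumption: the maxstat result for a variable is stored under its name
res.cut[["DEPDC1"]]
```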

Examples

# NOT RUN {
# 0. Load the package and some data
library("survminer")
data(myeloma)
head(myeloma)

# 1. Determine the optimal cutpoint of variables
res.cut <- surv_cutpoint(myeloma, time = "time", event = "event",
   variables = c("DEPDC1", "WHSC1", "CRIM1"))

summary(res.cut)

# 2. Plot cutpoint for DEPDC1
# palette = "npg" (nature publishing group), see ?ggpubr::ggpar
plot(res.cut, "DEPDC1", palette = "npg")

# 3. Categorize variables
res.cat <- surv_categorize(res.cut)
head(res.cat)

# 4. Fit survival curves and visualize
library("survival")
fit <- survfit(Surv(time, event) ~ DEPDC1, data = res.cat)
ggsurvplot(fit, data = res.cat, risk.table = TRUE, conf.int = TRUE)

# }
