
cutpoint (version 1.0.0)

cp_est: Estimate cutpoints in a multivariable setting for survival data

Description

One or two cutpoints of a metric variable are estimated using either the AIC (Akaike Information Criterion) or the LRT (Likelihood-Ratio Test statistic) within a multivariable Cox proportional hazards model. These cutpoints are used to create two or three groups with different survival probabilities.

The cutpoints are estimated by dichotomising the variable of interest at candidate values and entering the resulting grouping variable into the Cox regression model. The estimated cutpoint is the candidate value for which the AIC of the corresponding Cox regression model is lowest or the LRT statistic is highest.
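The search described above can be sketched in a few lines with the survival package (coxph(), Surv(), AIC()). This is a minimal illustration under stated assumptions, not the cutpoint package's own code: the helper find_cutpoint_sketch(), the candidate rule (observed values between the bandwith and 1 - bandwith quantiles) and the simulated data are all hypothetical.

```r
library(survival)

# Hypothetical helper, not part of the cutpoint package: dichotomise the
# variable at each candidate value, fit a Cox model with the binary variable
# plus the covariates, and keep the candidate with the lowest AIC (or the
# highest LRT statistic against the model without the binary variable).
find_cutpoint_sketch <- function(data, cpvarname, time = "time",
                                 event = "event", covariates = NULL,
                                 bandwith = 0.1, est_type = "AIC") {
  x <- data[[cpvarname]]
  # Assumed candidate rule: observed values that leave roughly `bandwith`
  # of the sample on each side of the cutpoint.
  lims <- quantile(x, c(bandwith, 1 - bandwith), names = FALSE)
  cands <- sort(unique(x[x >= lims[1] & x < lims[2]]))

  lhs <- paste0("Surv(", time, ", ", event, ") ~ ")
  rhs0 <- if (is.null(covariates)) "1" else paste(covariates, collapse = " + ")
  fit0 <- coxph(as.formula(paste0(lhs, rhs0)), data = data)  # reference model
  f1 <- as.formula(paste0(lhs, paste(c("cp_group", covariates), collapse = " + ")))

  crit <- vapply(cands, function(cp) {
    d <- data
    d$cp_group <- as.integer(x > cp)  # 0 = at or below cutpoint, 1 = above
    fit <- coxph(f1, data = d)
    if (est_type == "AIC") {
      AIC(fit)
    } else {
      # LRT statistic: twice the gain in log partial likelihood
      2 * (tail(fit$loglik, 1) - tail(fit0$loglik, 1))
    }
  }, numeric(1))

  best <- if (est_type == "AIC") which.min(crit) else which.max(crit)
  list(cutpoint = cands[best], criterion = crit[best])
}

# Simulated example: the hazard doubles above biomarker = 6 (assumption).
set.seed(1)
n <- 300
sim <- data.frame(biomarker = runif(n, 0, 10), covariate_1 = rnorm(n))
t_event <- rexp(n, 0.1 * exp(log(2) * (sim$biomarker > 6) + 0.3 * sim$covariate_1))
t_cens <- rexp(n, 0.05)
sim$time <- pmin(t_event, t_cens)
sim$event <- as.integer(t_event <= t_cens)

res_aic <- find_cutpoint_sketch(sim, "biomarker", covariates = "covariate_1")
res_lrt <- find_cutpoint_sketch(sim, "biomarker", covariates = "covariate_1",
                                est_type = "LRT")
res_aic$cutpoint
```

Because every candidate model in this sketch has the same number of parameters, minimising the AIC and maximising this LRT statistic select the same cutpoint here.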

This search takes place within a multivariable framework: other covariates and/or factors are included in the model while the cutpoints are searched for. Cutpoints can also be estimated when the variable of interest shows a U-shaped or inverted U-shaped relationship to the hazard ratio of time-to-event data. The argument symtails facilitates the estimation of two cutpoints such that the two outer tails contain groups of equal size.
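With symmetric tails, the two-cutpoint search reduces to a single parameter: the share p of the sample placed in each outer group. The sketch below, again built on survival::coxph(), takes the p and 1 - p quantiles as the two cutpoints and compares the three-group models by AIC, using the middle group as reference so that a U-shaped (or inverted U-shaped) relationship appears as two hazard ratios on the same side of 1. The helper symtail_cutpoints_sketch(), the grid step and the simulated U-shaped data are assumptions for illustration, not the package's implementation.

```r
library(survival)

# Hypothetical helper: two cutpoints with symmetric tails. The lower and upper
# cutpoints are the p and 1 - p quantiles, so both outer groups hold (about)
# the same share p of the sample; only p is searched.
symtail_cutpoints_sketch <- function(data, cpvarname, time = "time",
                                     event = "event", covariates = NULL,
                                     bandwith = 0.1, step = 0.01) {
  x <- data[[cpvarname]]
  # p leaves at least `bandwith` in each tail and in the middle group
  ps <- seq(bandwith, (1 - bandwith) / 2, by = step)
  f <- as.formula(paste0("Surv(", time, ", ", event, ") ~ ",
                         paste(c("cp_group", covariates), collapse = " + ")))
  aics <- vapply(ps, function(p) {
    cps <- quantile(x, c(p, 1 - p), names = FALSE)
    d <- data
    d$cp_group <- cut(x, breaks = c(-Inf, cps, Inf),
                      labels = c("low", "middle", "high"))
    d$cp_group <- relevel(d$cp_group, ref = "middle")  # middle = reference
    AIC(coxph(f, data = d))
  }, numeric(1))
  p_best <- ps[which.min(aics)]
  list(tail_share = p_best,
       cutpoints = quantile(x, c(p_best, 1 - p_best), names = FALSE))
}

# Simulated U shape: higher hazard below 2 and above 8 (assumption).
set.seed(2)
n <- 400
sim <- data.frame(biomarker = runif(n, 0, 10), covariate_1 = rnorm(n))
high_risk <- sim$biomarker < 2 | sim$biomarker > 8
t_event <- rexp(n, 0.1 * exp(log(2.5) * high_risk + 0.3 * sim$covariate_1))
t_cens <- rexp(n, 0.05)
sim$time <- pmin(t_event, t_cens)
sim$event <- as.integer(t_event <= t_cens)

res <- symtail_cutpoints_sketch(sim, "biomarker", covariates = "covariate_1")
res$cutpoints
```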

Usage

cp_est(
  cpvarname,
  time = "time",
  event = "event",
  covariates = NULL,
  data = data,
  nb_of_cp = 1,
  bandwith = 0.1,
  est_type = "AIC",
  cpvar_strata = FALSE,
  ushape = FALSE,
  symtails = FALSE,
  dp = 2,
  plot_splines = TRUE,
  all_splines = TRUE,
  print_res = TRUE,
  verbose = TRUE
)

Value

Returns a cpobj object containing the estimated cutpoints and the characteristics of the resulting groups.

Arguments

cpvarname

character, the name of the variable for which the cutpoints are estimated.

time

character, the name of the follow-up time variable.

event

character, the name of the status indicator variable, normally coded 0 = no event, 1 = event.

covariates

character vector with the names of the covariates and/or factors. If no covariates are used, set covariates = NULL (the default).

data

a data.frame containing the following variables:

  • the variable to be dichotomised

  • follow-up time

  • event (status indicator)

  • covariates and/or factors

nb_of_cp

numeric, number of cutpoints to be estimated (1 or 2). Default is nb_of_cp = 1.

bandwith

numeric, minimum size of each group as a proportion of the total sample size (e.g. 0.1 = 10%). bandwith must be between 0.05 and 0.30; default is 0.1. If ushape = TRUE, bandwith must be at least 0.1.
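Because bandwith is a proportion, the constraint can be checked for any proposed set of cutpoints. A minimal base-R sketch: the helper respects_bandwith() is hypothetical, and treating groups as right-closed intervals (an observation equal to a cutpoint falls in the lower group) is an assumption about how ties are assigned.

```r
# Hypothetical helper: does a set of cutpoints leave every group with at
# least `bandwith` (a proportion, e.g. 0.1 = 10%) of the observations?
respects_bandwith <- function(x, cutpoints, bandwith = 0.1) {
  # Group index 0 = x <= first cutpoint, 1 = between cutpoints, and so on
  groups <- findInterval(x, sort(cutpoints), left.open = TRUE)
  shares <- tabulate(groups + 1L, nbins = length(cutpoints) + 1L) / length(x)
  all(shares >= bandwith)
}

x <- 1:100
respects_bandwith(x, 50)                         # TRUE: 50% and 50%
respects_bandwith(x, c(5, 60))                   # FALSE: lowest group has 5%
respects_bandwith(x, c(20, 80), bandwith = 0.2)  # TRUE: 20%, 60%, 20%
```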

est_type

character, the method used to estimate the cutpoints. The default is 'AIC' (Akaike Information Criterion); the other option is 'LRT' (likelihood-ratio test statistic).

cpvar_strata

logical value: if FALSE, the dichotomised variable serves as a covariate in the Cox regression model used to determine the cutpoint. If TRUE, the dichotomised variable is included in that model as a stratification factor (strata) rather than as a covariate. Default is FALSE.
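The two options correspond to two ways of writing the Cox model with the survival package. In this sketch (the simulated data, the fixed trial cutpoint of 5 and the column name cp_group are assumptions), the covariate version estimates a hazard ratio for the dichotomised variable, whereas the strata version gives each group its own baseline hazard and therefore reports no coefficient for it.

```r
library(survival)

set.seed(3)
n <- 200
sim <- data.frame(biomarker = runif(n, 0, 10), covariate_1 = rnorm(n))
t_event <- rexp(n, 0.1 * exp(0.5 * (sim$biomarker > 5) + 0.3 * sim$covariate_1))
t_cens <- rexp(n, 0.05)
sim$time <- pmin(t_event, t_cens)
sim$event <- as.integer(t_event <= t_cens)
sim$cp_group <- as.integer(sim$biomarker > 5)  # dichotomised at a trial cutpoint

# Dichotomised variable as a covariate (cf. cpvar_strata = FALSE)
fit_cov <- coxph(Surv(time, event) ~ cp_group + covariate_1, data = sim)
# Dichotomised variable as a stratification factor (cf. cpvar_strata = TRUE)
fit_strata <- coxph(Surv(time, event) ~ strata(cp_group) + covariate_1, data = sim)

names(coef(fit_cov))     # "cp_group" "covariate_1"
names(coef(fit_strata))  # "covariate_1"
```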

ushape

logical value: if TRUE, the cutpoints are estimated under the assumption that the spline plot shows a U-shaped or an inverted U-shaped curve. Default is FALSE.

symtails

logical value: if TRUE, the cutpoints are estimated with symmetric tails. If nb_of_cp = 1, symtails is set to FALSE. Default is FALSE.

dp

numeric, number of decimal places the cutpoints are rounded to. Default is dp = 2.

plot_splines

logical value: if TRUE, a penalized spline plot is created. Default is TRUE.

all_splines

logical value: if TRUE, the plot shows splines with different degrees of freedom, which can help to judge whether the model is misspecified or overfitted. Default is TRUE.
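A comparable set of penalized spline fits can be produced with survival::pspline() inside coxph(). The sketch below fits the same model with several spline degrees of freedom (the df values 2, 4 and 6 and the simulated data are assumptions) and compares them by AIC; termplot() draws the estimated log-hazard curve of each fit when run interactively. It is an illustration only, not the code behind plot_splines or all_splines.

```r
library(survival)

set.seed(4)
n <- 300
sim <- data.frame(biomarker = runif(n, 0, 10), covariate_1 = rnorm(n))
# U-shaped log hazard in the biomarker (simulation assumption)
t_event <- rexp(n, 0.1 * exp(0.08 * (sim$biomarker - 5)^2 + 0.3 * sim$covariate_1))
t_cens <- rexp(n, 0.05)
sim$time <- pmin(t_event, t_cens)
sim$event <- as.integer(t_event <= t_cens)

# Same model, penalized spline for the biomarker with different df
dfs <- c(2, 4, 6)
fits <- lapply(dfs, function(d) {
  f <- as.formula(paste0("Surv(time, event) ~ pspline(biomarker, df = ", d,
                         ") + covariate_1"))
  coxph(f, data = sim)
})
aics <- vapply(fits, AIC, numeric(1))
setNames(aics, paste0("df = ", dfs))

# Plot the spline term of each fit with pointwise standard errors
if (interactive()) {
  for (fit in fits) termplot(fit, terms = 1, se = TRUE)
}
```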

print_res

logical value: if TRUE, the function prints a summary of the cutpoint estimation to the console. Default is TRUE.

verbose

logical value: if TRUE, the function prints the approximate remaining processing time and other information to the console. If FALSE, nothing is printed to the console, including the summary of the cutpoint estimation. Default is TRUE.

References

Govindarajulu, U., & Tarpey, T. (2020). Optimal partitioning for the proportional hazards model. Journal of Applied Statistics, 49(4), 968–987. https://doi.org/10.1080/02664763.2020.1846690

See Also

cp_splines_plot() for penalized spline plots, cp_value_plot() for Value plots and Index plots

Examples

library(cutpoint)

# Example 1:
# Estimate two cutpoints of the variable biomarker.
# The dataset data1 is included in this package and contains
# the variables time, event, biomarker, covariate_1, and covariate_2.
cpobj <- cp_est(
  cpvarname    = "biomarker",
  covariates   = c("covariate_1", "covariate_2"),
  data         = data1,
  nb_of_cp     = 2,
  plot_splines = FALSE
)

# Example 2:
# Search for cutpoints when the variable shows a U-shaped or
# inverted U-shaped relationship to the hazard ratio.
# The dataset data2_ushape is included in this package and contains
# the variables time, event, biomarker, and covariate_1.
cpobj <- cp_est(
  cpvarname    = "biomarker",
  covariates   = c("covariate_1"),
  data         = data2_ushape,
  nb_of_cp     = 2,
  bandwith     = 0.2,
  ushape       = TRUE,
  plot_splines = FALSE
)
