Function for grouping survival curves based on the k-means or k-medians algorithm. It returns the number of groups and the assignment of each curve to a group.
survclustcurves(
time,
status = NULL,
x,
kvector = NULL,
kbin = 50,
method = "LR",
nboot = 100,
algorithm = "kmeans",
alpha = 0.05,
cluster = FALSE,
ncores = NULL,
seed = NULL,
multiple.method = "bonferroni"
)
A list containing the following items:
A data frame containing the null hypothesis tested, the values of the test statistic and the obtained p-values.
Original levels of the variable x.
A vector of integers (from 1:k) indicating the cluster to which each curve is allocated.
An object containing the centroids (mean of the curves pertaining to the same group).
An object containing the fitted curves for each population.
Survival time.
Censoring indicator of the survival time of the process; 0 if the total time is censored and 1 otherwise.
Categorical variable indicating the population to which each observation belongs.
A vector specifying the numbers of groups of curves to be checked.
Size of the grid over which the survival functions are to be estimated.
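As a rough illustration of what a grid of size kbin means, the sketch below evaluates Kaplan-Meier estimates for each cell type of the veteran data on 50 time points. It uses survfit from the survival package rather than clustcurv itself, and the equally spaced grid over the observed time range is an assumption; the package may build its grid differently.

```r
library(survival)

kbin <- 50  # grid size, matching the default of the kbin argument

# One Kaplan-Meier curve per level of celltype
fit <- survfit(Surv(time, status) ~ celltype, data = veteran)

# Assumed grid: kbin equally spaced points over the observed time range
grid <- seq(0, max(veteran$time), length.out = kbin)

# extend = TRUE reports every grid time for every curve,
# even beyond that curve's last observation
s <- summary(fit, times = grid, extend = TRUE)

# Rows are grid points, columns are populations
curves <- matrix(s$surv, nrow = kbin, dimnames = list(NULL, levels(s$strata)))
dim(curves)
```

Each column of curves is then a vector of length kbin, which is the kind of object a clustering algorithm can compare across populations.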
Testing procedure used to obtain the number of groups. Default is "LR".
Possible values are:
- "LR": Regular Log-Rank test, sensitive to late differences.
- "GB": Gehan-Breslow (generalized Wilcoxon), detects early differences.
- "TW": Tarone-Ware, detects early differences.
- "PP": Peto-Peto's modified survival estimate, more robust than
Tarone-Ware or Gehan-Breslow, detects early differences.
- "mPP": modified Peto-Peto (by Andersen).
- "FH": Fleming-Harrington (p = 1, q = 1).
- "bootstrap": Villanueva, Sestelo & Meira-Machado bootstrap procedure.
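To give a feel for how the weighting changes a test's sensitivity, the sketch below compares two members of the weighted log-rank family on the veteran data using survdiff from the survival package. It is only a global k-sample comparison under stated assumptions: it does not reproduce the hypotheses that survclustcurves tests, and how clustcurv computes its statistics internally is not shown here.

```r
library(survival)

# rho = 0: log-rank test (equal weight at all event times)
fit_lr <- survdiff(Surv(time, status) ~ celltype, data = veteran, rho = 0)

# rho = 1: Peto-Peto modification of the Gehan-Wilcoxon test
# (more weight on early events)
fit_pp <- survdiff(Surv(time, status) ~ celltype, data = veteran, rho = 1)

# p-values from the chi-square statistic, df = number of groups - 1
df <- length(fit_lr$n) - 1
p_values <- c(
  LR = pchisq(fit_lr$chisq, df = df, lower.tail = FALSE),
  PP = pchisq(fit_pp$chisq, df = df, lower.tail = FALSE)
)
p_values
```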
Number of bootstrap repeats. Only used for the bootstrap method.
A character string specifying which clustering algorithm is used,
i.e., k-means ("kmeans") or k-medians ("kmedians").
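The difference between the two algorithms can be sketched with toy curves. The example below builds three exponential survival curves on a common grid and computes a pointwise mean centroid (as in k-means) and a pointwise median centroid (the usual k-medians counterpart). The rates, the grid, and the pointwise definition of the median centroid are assumptions made for illustration, not taken from the package.

```r
# Toy survival curves S(t) = exp(-rate * t) evaluated on a common grid
kbin <- 50
grid <- seq(0, 10, length.out = kbin)
rates <- c(0.10, 0.12, 0.50)  # hypothetical rates; the third curve is an outlier
curves <- sapply(rates, function(r) exp(-r * grid))  # kbin x 3 matrix

# k-means style centroid: pointwise mean, pulled down by the third curve
centroid_mean <- rowMeans(curves)

# k-medians style centroid: pointwise median, here equal to the middle curve
centroid_median <- apply(curves, 1, median)

head(cbind(t = grid, mean = centroid_mean, median = centroid_median))
```

With one atypical curve in a group, the median centroid stays close to the majority of the curves, while the mean centroid shifts towards the outlier.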
Significance level of the testing procedure. Defaults to 0.05.
A logical value. If TRUE, the bootstrap testing procedure is
parallelized; the default is FALSE. Note that in some cases (e.g., a low
number of bootstrap repetitions) R performs better with serial computation.
R takes time to distribute tasks across the processors and to bind the
results together afterwards. Therefore, if the time for distributing and
gathering the pieces is greater than the time needed for single-thread
computing, parallelizing is not worthwhile.
An integer value specifying the number of cores to be used
in the parallelized procedure. If NULL (default), the number of cores
to be used is equal to the number of cores of the machine - 1.
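The trade-off between serial and parallel computation can be checked on a small scale with base R's parallel package. In the sketch below, one_replicate is a hypothetical stand-in for a single bootstrap replicate, and the cap of two workers is a choice made to keep the example light; neither reflects how clustcurv is implemented.

```r
library(parallel)

# Hypothetical stand-in for one bootstrap replicate: resample and take a mean
one_replicate <- function(i, x) mean(sample(x, replace = TRUE))

set.seed(1)
x <- rexp(200)
nboot <- 20

# "Number of cores of the machine minus 1", guarding against NA and 0
n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 2L
n_workers <- max(1L, min(2L, n_cores - 1L))

# Serial version
t_serial <- system.time(
  res_serial <- lapply(seq_len(nboot), one_replicate, x = x)
)

# Parallel version: starting workers and shipping data has a fixed cost
cl <- makeCluster(n_workers)
t_parallel <- system.time(
  res_parallel <- parLapply(cl, seq_len(nboot), one_replicate, x = x)
)
stopCluster(cl)

print(c(serial = t_serial[["elapsed"]], parallel = t_parallel[["elapsed"]]))
```

For a task this small the parallel run is often no faster than the serial one, since the communication overhead dominates; timings vary by machine.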
Seed to be used in the procedure.
Correction for multiple comparisons. See Details. Not used in the case of the bootstrap method.
Nora M. Villanueva and Marta Sestelo.
The adjustment methods include the Bonferroni correction ("bonferroni"), in which the p-values are multiplied by the number of comparisons. Less conservative corrections are also included: Holm (1979) ("holm"), Hochberg (1988) ("hochberg"), Hommel (1988) ("hommel"), Benjamini & Hochberg (1995) ("BH" or its alias "fdr"), and Benjamini & Yekutieli (2001) ("BY").
This correction is not applied in the bootstrap method.
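The method names listed above coincide with those accepted by base R's p.adjust, so the effect of each correction can be previewed on a small vector of p-values. The p-values below are made up for illustration, and whether survclustcurves calls p.adjust internally is an assumption.

```r
# Hypothetical raw p-values from three comparisons
p_raw <- c(0.01, 0.02, 0.03)

# Bonferroni: each p-value multiplied by the number of comparisons (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Holm: step-down version, never more conservative than Bonferroni
p_holm <- p.adjust(p_raw, method = "holm")

# Benjamini-Hochberg: controls the false discovery rate
p_bh <- p.adjust(p_raw, method = "BH")

rbind(raw = p_raw, bonferroni = p_bonf, holm = p_holm, BH = p_bh)
```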
library(clustcurv)
library(survival)
data(veteran)
# Survival framework
res <- survclustcurves(time = veteran$time, status = veteran$status,
x = veteran$celltype, algorithm = 'kmeans', nboot = 2)
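A variation on the call above, shown as a sketch: it selects the Gehan-Breslow test and the Holm correction, both taken from the options documented on this page. The nboot argument is omitted because it is only used by the bootstrap method. The call is wrapped in a check so that it is skipped when clustcurv is not installed, and the components of the result are listed with names() rather than assumed.

```r
if (requireNamespace("clustcurv", quietly = TRUE)) {
  library(clustcurv)
  library(survival)

  # Same data as above, with a different test and multiplicity correction
  res_gb <- survclustcurves(
    time = veteran$time, status = veteran$status, x = veteran$celltype,
    algorithm = "kmeans", method = "GB", multiple.method = "holm"
  )

  # List the components of the returned list without assuming their names
  print(names(res_gb))
}
```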