# SurvivalTests

##### Two- and \(K\)-Sample Tests for Censored Data

Testing the equality of the survival distributions in two or more independent groups.

##### Usage

```
# S3 method for formula
logrank_test(formula, data, subset = NULL, weights = NULL, ...)
# S3 method for IndependenceProblem
logrank_test(object, ties.method = c("mid-ranks", "Hothorn-Lausen",
                                     "average-scores"),
             type = c("logrank", "Gehan-Breslow", "Tarone-Ware", "Prentice",
                      "Prentice-Marek", "Andersen-Borgan-Gill-Keiding",
                      "Fleming-Harrington", "Gaugler-Kim-Liao", "Self"),
             rho = NULL, gamma = NULL, ...)
```

##### Arguments

- formula
a formula of the form `y ~ x | block` where `y` is a survival object (see `Surv` in package survival), `x` is a factor and `block` is an optional factor for stratification.

- data
an optional data frame containing the variables in the model formula.

- subset
an optional vector specifying a subset of observations to be used. Defaults to `NULL`.

- weights
an optional formula of the form `~ w` defining integer valued case weights for each observation. Defaults to `NULL`, implying equal weight for all observations.

- object
an object inheriting from class `"IndependenceProblem"`.

- ties.method
a character, the method used to handle ties: the score generating function either uses mid-ranks (`"mid-ranks"`, default), the Hothorn-Lausen method (`"Hothorn-Lausen"`) or averages the scores of randomly broken ties (`"average-scores"`); see ‘Details’.

- type
a character, the type of test: either `"logrank"` (default), `"Gehan-Breslow"`, `"Tarone-Ware"`, `"Prentice"`, `"Prentice-Marek"`, `"Andersen-Borgan-Gill-Keiding"`, `"Fleming-Harrington"`, `"Gaugler-Kim-Liao"` or `"Self"`; see ‘Details’.

- rho
a numeric, the \(\rho\) constant when `type` is `"Tarone-Ware"`, `"Fleming-Harrington"`, `"Gaugler-Kim-Liao"` or `"Self"`; see ‘Details’. Defaults to `NULL`, implying `0.5` for `type = "Tarone-Ware"` and `0` otherwise.

- gamma
a numeric, the \(\gamma\) constant when `type` is `"Fleming-Harrington"`, `"Gaugler-Kim-Liao"` or `"Self"`; see ‘Details’. Defaults to `NULL`, implying `0`.

- ...
further arguments to be passed to `independence_test`.

##### Details

`logrank_test` provides the weighted logrank test reformulated as a
linear rank test. The family of weighted logrank tests encompasses a large
collection of tests commonly used in the analysis of survival data including,
but not limited to, the standard (unweighted) logrank test, the Gehan-Breslow
test, the Tarone-Ware class of tests, the Prentice test, the Prentice-Marek
test, the Andersen-Borgan-Gill-Keiding test, the Fleming-Harrington class of
tests and the Self class of tests. A general description of these methods is
given by Klein and Moeschberger (2003, Ch. 7). See Letón and
Zuluaga (2001) for the linear rank test formulation.

The null hypothesis of equality, or conditional equality given `block`,
of the survival distribution of `y` in the groups defined by `x` is
tested. In the two-sample case, the two-sided null hypothesis is \(H_0\!:
\theta = 1\), where \(\theta = \lambda_2 / \lambda_1\)
and \(\lambda_s\) is the hazard rate in the \(s\)th sample. In case
`alternative = "less"`, the null hypothesis is \(H_0\!: \theta \ge
1\), i.e., the alternative is that the survival is lower in sample 1 than
in sample 2. When `alternative = "greater"`, the null hypothesis is
\(H_0\!: \theta \le 1\), i.e., the alternative is that the survival is
higher in sample 1 than in sample 2.
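As an illustration of the `alternative` argument, the following sketch runs the two-sided and both one-sided asymptotic tests on the lung cancer data from the ‘Examples’ section. It assumes that packages coin and survival are installed and that sample 1 corresponds to the first level of `group` (`"newdrug"`); the object names `two_sided`, `less` and `greater` are chosen here for illustration only.

```r
library(survival)  # Surv()
library(coin)      # logrank_test(), pvalue()

## Lung cancer data, copied from the 'Examples' section
lungcancer <- data.frame(
    time = c(257, 476, 355, 1779, 355,
             191, 563, 242, 285, 16, 16, 16, 257, 16),
    event = c(0, 0, 1, 1, 0,
              1, 1, 1, 1, 1, 1, 1, 1, 1),
    group = factor(rep(1:2, c(5, 9)),
                   labels = c("newdrug", "control"))
)

## Assumption: sample 1 is the first factor level, sample 2 the second
two_sided <- logrank_test(Surv(time, event) ~ group, data = lungcancer)
less      <- logrank_test(Surv(time, event) ~ group, data = lungcancer,
                          alternative = "less")
greater   <- logrank_test(Surv(time, event) ~ group, data = lungcancer,
                          alternative = "greater")

## Asymptotic p-values of the three tests
c(two.sided = as.numeric(pvalue(two_sided)),
  less      = as.numeric(pvalue(less)),
  greater   = as.numeric(pvalue(greater)))
```

With the default asymptotic distribution the two one-sided p-values should add up to one, which is a quick way to check which direction a given `alternative` refers to.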

If `x` is an ordered factor, the default scores, `1:nlevels(x)`, can
be altered using the `scores` argument (see `independence_test`); this
argument can also be used to coerce nominal factors to class `"ordered"`.
In this case, a linear-by-linear association test is computed and the
direction of the alternative hypothesis can be specified using the
`alternative` argument. This type of extension of the standard logrank test
was given by Tarone (1975) and later generalized to general weights by Tarone
and Ware (1977).

Let \((t_i, \delta_i)\), \(i = 1, 2, \ldots, n\), represent a right-censored random sample of size \(n\), where \(t_i\) is the observed survival time and \(\delta_i\) is the status indicator (\(\delta_i\) is 0 for right-censored observations and 1 otherwise). To allow for ties in the data, let \(t_{(1)} < t_{(2)} < \cdots < t_{(m)}\) represent the \(m\), \(m \le n\), ordered distinct event times. At time \(t_{(k)}\), \(k = 1, 2, \ldots, m\), the number of events and the number of subjects at risk are given by \(d_k = \sum_{i = 1}^n I\!\left(t_i = t_{(k)}\,|\, \delta_i = 1\right)\) and \(n_k = n - r_k\), respectively, where \(r_k\) depends on the ties handling method.

Three different methods of handling ties are available using
`ties.method`: mid-ranks (`"mid-ranks"`, default), the
Hothorn-Lausen method (`"Hothorn-Lausen"`) and average-scores
(`"average-scores"`). The first and last methods are discussed and
contrasted by Callaert (2003), whereas the second method is defined in Hothorn
and Lausen (2003). The mid-ranks method leads to
$$
r_k = \sum_{i = 1}^n I\!\left(t_i < t_{(k)}\right)
$$
whereas the Hothorn-Lausen method uses
$$
r_k = \sum_{i = 1}^n I\!\left(t_i \le t_{(k)}\right) - 1.
$$
The scores assigned to right-censored and uncensored observations at the
\(k\)th event time are given by
$$
C_k = \sum_{j = 1}^k w_j \frac{d_j}{n_j}
\quad \mbox{and} \quad
c_k = C_k - w_k,
$$
respectively, where \(w\) is the logrank weight. For the average-scores
method, used by, e.g., the software package StatXact, the \(d_k\) events
observed at the \(k\)th event time are arbitrarily ordered by assigning them
distinct values \(t_{(k_l)}\), \(l = 1, 2, \ldots, d_k\),
infinitesimally to the left of \(t_{(k)}\). Then scores
\(C_{k_l}\) and \(c_{k_l}\) are computed as indicated above,
effectively assuming that no event times are tied. The scores \(C_k\) and
\(c_k\) are assigned the average of the scores \(C_{k_l}\) and
\(c_{k_l}\) respectively. It then follows that the score for the
\(i\)th subject is
$$
a_i = \left\{
\begin{array}{ll}
C_{k'} & \mbox{if } \delta_i = 0 \\
c_{k'} & \mbox{otherwise}
\end{array}
\right.
$$
where \(k' = \max \{k: t_i \ge t_{(k)}\}\).
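The score construction above can be sketched in a few lines of base R. This is only an illustration of the formulas for the mid-ranks method, not the implementation used by `logrank_trafo`; the helper `logrank_scores` and its argument `w` are hypothetical names introduced here, and subjects censored before the first event time are assumed to receive score 0.

```r
## Hypothetical helper (not part of coin): scores a_i under mid-ranks.
## `w` maps the numbers at risk n_k to the weights w_k (default: w_k = 1).
logrank_scores <- function(time, event,
                           w = function(n_k) rep(1, length(n_k))) {
    n   <- length(time)
    tk  <- sort(unique(time[event == 1]))   # ordered distinct event times
    d_k <- vapply(tk, function(s) sum(time == s & event == 1), numeric(1))
    r_k <- vapply(tk, function(s) sum(time < s), numeric(1))  # mid-ranks
    n_k <- n - r_k                          # number at risk at t_(k)
    w_k <- w(n_k)
    C_k <- cumsum(w_k * d_k / n_k)          # scores for censored observations
    c_k <- C_k - w_k                        # scores for events
    ## k' = max{k : t_i >= t_(k)}; 0 if t_i precedes the first event time
    k_prime <- findInterval(time, tk)
    a <- numeric(n)                         # assumption: score 0 when k' = 0
    has_k <- k_prime > 0
    a[has_k] <- ifelse(event[has_k] == 1,
                       c_k[k_prime[has_k]], C_k[k_prime[has_k]])
    a
}

## Uncensored data from the 'Examples' section (Callaert, 2003, Tab. 1)
time <- c(1, 1, 5, 6, 6, 6, 6, 2, 2, 2, 3, 4, 4, 5, 5)
a <- logrank_scores(time, event = rep(1, length(time)))
round(a, 4)
sum(a)  # zero up to floating-point error
```

Under these formulas the scores sum to zero for any choice of weights, since each term \(w_k d_k / n_k\) is added to the \(n_k\) subjects still at risk and \(w_k\) is subtracted once per event.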

The `type` argument allows for a choice between some of the most
well-known members of the family of weighted logrank tests, each corresponding
to a particular weight function. The standard logrank test (`"logrank"`,
default) was suggested by Mantel (1966), Peto and Peto (1972) and Cox (1972)
and has \(w_k = 1\). The Gehan-Breslow test (`"Gehan-Breslow"`)
proposed by Gehan (1965) and later extended to \(K\) samples by Breslow
(1970) is a generalization of the Wilcoxon rank-sum test, where \(w_k =
n_k\). The Tarone-Ware class of tests (`"Tarone-Ware"`) discussed by
Tarone and Ware (1977) has \(w_k = n_k^\rho\), where \(\rho\) is a
constant; \(\rho = 0.5\) (default) was suggested by Tarone and Ware (1977),
but note that \(\rho = 0\) and \(\rho = 1\) lead to the standard
logrank test and Gehan-Breslow test respectively. The Prentice test
(`"Prentice"`) is another generalization of the Wilcoxon rank-sum test
proposed by Prentice (1978), where
$$
w_k = \prod_{j = 1}^k \frac{n_j}{n_j + d_j}.
$$
The Prentice-Marek test (`"Prentice-Marek"`) is yet another
generalization of the Wilcoxon rank-sum test discussed by Prentice and Marek
(1979), with
$$
w_k = \tilde{S}_k = \prod_{j = 1}^k \frac{n_j + 1 - d_j}{n_j + 1}.
$$
The Andersen-Borgan-Gill-Keiding test (`"Andersen-Borgan-Gill-Keiding"`)
suggested by Andersen *et al.* (1982) is a modified version of the
Prentice-Marek test using
$$
w_k = \frac{n_k}{n_k + 1}
\prod_{j = 0}^{k - 1} \frac{n_j + 1 - d_j}{n_j + 1}
$$
where \(n_0 \equiv n\) and \(d_0 \equiv 0\). The
Fleming-Harrington class of tests (`"Fleming-Harrington"`) proposed by
Fleming and Harrington (1991) uses \(w_k = \hat{S}_k^\rho (1 -
\hat{S}_k)^\gamma\), where \(\rho\)
and \(\gamma\) are constants and
$$
\hat{S}_k = \prod_{j = 0}^{k - 1} \frac{n_j - d_j}{n_j},
\quad
\hat{S}_0 \equiv 1
$$
is the *left-continuous* Kaplan-Meier estimator of the survival function;
\(\rho = 0\) and \(\gamma = 0\) lead to the standard logrank test. The
Gaugler-Kim-Liao class of tests (`"Gaugler-Kim-Liao"`) discussed by
Gaugler *et al.* (2007) is a modified version of the Fleming-Harrington
class of tests, replacing \(\hat{S}_k\) with
\(\tilde{S}_k\) so that \(w_k = \tilde{S}_k^\rho (1 -
\tilde{S}_k)^\gamma\), where
\(\rho\) and \(\gamma\) are constants; \(\rho = 0\) and \(\gamma = 0\)
lead to the standard logrank test. The Self class of tests (`"Self"`)
suggested by Self (1991) has \(w_k = v_k^\rho (1 - v_k)^\gamma\), where
$$
v_k = \frac{1}{2} \frac{t_{(k-1)} + t_{(k)}}{t_{(m)}},
\quad
t_{(0)} \equiv 0
$$
is the standardized mid-point between the \((k - 1)\)th and the \(k\)th
event time. (This is a slight generalization of Self's original proposal in
order to allow for non-integer follow-up times.) Again, \(\rho\) and
\(\gamma\) are constants and \(\rho = 0\) and \(\gamma = 0\) lead to
the standard logrank test.
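To make the relation between some of these weight functions concrete, here is a small base R sketch that evaluates \(w_k\) from the numbers at risk and the numbers of events. It covers only a subset of the types and is not how coin computes them; `logrank_weights` is a hypothetical helper, and its defaults for `rho` and `gamma` merely mimic those described under ‘Arguments’.

```r
## Hypothetical helper (not part of coin): weights w_k for selected types,
## given numbers at risk n_k and numbers of events d_k at t_(1) < ... < t_(m).
logrank_weights <- function(n_k, d_k,
                            type = c("logrank", "Gehan-Breslow", "Tarone-Ware",
                                     "Prentice-Marek", "Fleming-Harrington"),
                            rho = NULL, gamma = NULL) {
    type <- match.arg(type)
    if (is.null(rho)) rho <- if (type == "Tarone-Ware") 0.5 else 0
    if (is.null(gamma)) gamma <- 0
    ## left-continuous Kaplan-Meier estimate: product over j = 0, ..., k - 1,
    ## where the j = 0 factor equals 1
    S_hat <- c(1, cumprod((n_k - d_k) / n_k))[seq_along(n_k)]
    switch(type,
           "logrank"            = rep(1, length(n_k)),
           "Gehan-Breslow"      = n_k,
           "Tarone-Ware"        = n_k^rho,
           "Prentice-Marek"     = cumprod((n_k + 1 - d_k) / (n_k + 1)),
           "Fleming-Harrington" = S_hat^rho * (1 - S_hat)^gamma)
}

## Numbers at risk and events for the Callaert (2003) data under mid-ranks
n_k <- c(15, 13, 10, 9, 7, 4)
d_k <- c(2, 3, 1, 2, 3, 4)

logrank_weights(n_k, d_k, "Tarone-Ware")               # n_k^0.5
logrank_weights(n_k, d_k, "Tarone-Ware", rho = 0)      # all 1: logrank
logrank_weights(n_k, d_k, "Tarone-Ware", rho = 1)      # n_k: Gehan-Breslow
logrank_weights(n_k, d_k, "Fleming-Harrington", rho = 1)  # left-continuous KM
```

The special cases noted in the text (\(\rho = 0\), \(\rho = 1\), and \(\rho = \gamma = 0\)) can be checked directly by comparing the returned vectors.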

The conditional null distribution of the test statistic is used to obtain
\(p\)-values and an asymptotic approximation of the exact distribution is
used by default (`distribution = "asymptotic"`). Alternatively, the
distribution can be approximated via Monte Carlo resampling or computed
exactly for univariate two-sample problems by setting `distribution` to
`"approximate"` or `"exact"` respectively. See `asymptotic`,
`approximate` and `exact` for details.
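The three choices of reference distribution can be compared side by side. The sketch below reuses the Callaert (2003) data from the ‘Examples’ section and assumes that packages coin and survival are installed; the seed and the object names `lt_asym`, `lt_mc` and `lt_exact` are arbitrary choices made here.

```r
library(survival)  # Surv()
library(coin)      # logrank_test(), approximate(), pvalue()

## Example data, copied from the 'Examples' section (Callaert, 2003, Tab. 1)
callaert <- data.frame(
    time = c(1, 1, 5, 6, 6, 6, 6, 2, 2, 2, 3, 4, 4, 5, 5),
    group = factor(rep(0:1, c(7, 8)))
)

## Default: asymptotic approximation
lt_asym <- logrank_test(Surv(time) ~ group, data = callaert)

## Monte Carlo approximation; the seed only makes the run reproducible
set.seed(123)
lt_mc <- logrank_test(Surv(time) ~ group, data = callaert,
                      distribution = approximate(nresample = 10000))

## Exact conditional distribution (univariate two-sample problem)
lt_exact <- logrank_test(Surv(time) ~ group, data = callaert,
                         distribution = "exact")

pvalue(lt_asym)
pvalue(lt_mc)
pvalue(lt_exact)
```

For a sample this small the exact distribution is cheap to compute; the Monte Carlo variant is mainly of interest when the exact computation is unavailable, e.g., for more than two groups.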

##### Value

An object inheriting from class `"IndependenceTest"`.

##### Note

Peto and Peto (1972) proposed the test statistic implemented in
`logrank_test` and named it the *logrank test*. However, the
Mantel-Cox test (Mantel, 1966; Cox, 1972), as implemented in
`survdiff` (in package survival), is also known as
the logrank test. These tests are similar, but differ in the choice of
probability model: the (Peto-Peto) logrank test uses the permutational
variance, whereas the Mantel-Cox test is based on the hypergeometric variance.

Combining `independence_test` or `symmetry_test` with
`logrank_trafo` offers more flexibility than `logrank_test`
and allows for, among other things, maximum-type versatile test procedures
(e.g., Lee, 1996; see ‘Examples’) and user-supplied logrank weights
(see `GTSG` for tests against Weibull-type or crossing-curve
alternatives).

Starting with version 1.1-0, `logrank_test` replaced `surv_test`,
which was made **defunct** in version 1.2-0. Furthermore,
`logrank_trafo` is now an increasing function for all choices of
`ties.method`, implying that the test statistic has the same sign
irrespective of the ties handling method. Consequently, the sign of the test
statistic will now be the opposite of what it was in earlier versions unless
`ties.method = "average-scores"`. (In versions of coin prior to
1.1-0, `logrank_trafo` was a decreasing function when `ties.method`
was other than `"average-scores"`.)

Starting with version 1.2-0, mid-ranks and the Hothorn-Lausen method can no
longer be specified with `ties.method = "logrank"` and
`ties.method = "HL"` respectively.

##### References

Andersen, P. K., Borgan, Ø., Gill, R. and Keiding, N. (1982).
Linear nonparametric tests for comparison of counting processes, with
applications to censored survival data (with discussion).
*International Statistical Review* **50**(3), 219–258. doi:10.2307/1402489

Breslow, N. (1970). A generalized Kruskal-Wallis test for comparing \(K\)
samples subject to unequal patterns of censorship. *Biometrika*
**57**(3), 579–594. doi:10.1093/biomet/57.3.579

Callaert, H. (2003). Comparing statistical software packages: The case of
the logrank test in StatXact. *The American Statistician* **57**(3),
214–217. doi:10.1198/0003130031900

Cox, D. R. (1972). Regression models and life-tables (with discussion).
*Journal of the Royal Statistical Society* B **34**(2), 187–220.

Fleming, T. R. and Harrington, D. P. (1991).
*Counting Processes and Survival Analysis*. New York: John Wiley & Sons.

Gaugler, T., Kim, D. and Liao, S. (2007). Comparing two survival time
distributions: An investigation of several weight functions for the weighted
logrank statistic.
*Communications in Statistics – Simulation and Computation* **36**(2),
423–435. doi:10.1080/03610910601161272

Gehan, E. A. (1965). A generalized Wilcoxon test for comparing arbitrarily
single-censored samples. *Biometrika* **52**(1–2), 203–223.
doi:10.1093/biomet/52.1-2.203

Hothorn, T. and Lausen, B. (2003). On the exact distribution of maximally
selected rank statistics. *Computational Statistics & Data Analysis*
**43**(2), 121–137. doi:10.1016/S0167-9473(02)00225-6

Klein, J. P. and Moeschberger, M. L. (2003).
*Survival Analysis: Techniques for Censored and Truncated Data*,
Second Edition. New York: Springer.

Lee, J. W. (1996). Some versatile tests based on the simultaneous use of
weighted log-rank statistics. *Biometrics* **52**(2), 721–725.
doi:10.2307/2532911

Letón, E. and Zuluaga, P. (2001). Equivalence between score
and weighted tests for survival curves.
*Communications in Statistics – Theory and Methods* **30**(4), 591–608.
doi:10.1081/STA-100002138

Mantel, N. (1966). Evaluation of survival data and two new rank order
statistics arising in its consideration. *Cancer Chemotherapy Reports*
**50**(3), 163–170.

Peto, R. and Peto, J. (1972). Asymptotic efficient rank invariant test
procedures (with discussion). *Journal of the Royal Statistical Society*
A **135**(2), 185–207. doi:10.2307/2344317

Prentice, R. L. (1978). Linear rank tests with right censored data.
*Biometrika* **65**(1), 167–179. doi:10.1093/biomet/65.1.167

Prentice, R. L. and Marek, P. (1979). A qualitative discrepancy between
censored data rank tests. *Biometrics* **35**(4), 861–867.
doi:10.2307/2530120

Self, S. G. (1991). An adaptive weighted log-rank test with application to
cancer prevention and screening trials. *Biometrics* **47**(3),
975–986. doi:10.2307/2532653

Tarone, R. E. (1975). Tests for trend in life table analysis.
*Biometrika* **62**(3), 679–682. doi:10.1093/biomet/62.3.679

Tarone, R. E. and Ware, J. (1977). On distribution-free tests for equality
of survival distributions. *Biometrika* **64**(1), 156–160.
doi:10.1093/biomet/64.1.156

##### Examples

```
library(survival)  # Surv(), survdiff()
library(coin)      # logrank_test(), logrank_trafo(), independence_test()

## Example data (Callaert, 2003, Tab. 1)
callaert <- data.frame(
    time = c(1, 1, 5, 6, 6, 6, 6, 2, 2, 2, 3, 4, 4, 5, 5),
    group = factor(rep(0:1, c(7, 8)))
)

## Logrank scores using mid-ranks (Callaert, 2003, Tab. 2)
with(callaert,
     logrank_trafo(Surv(time)))

## Asymptotic Mantel-Cox test (p = 0.0523)
survdiff(Surv(time) ~ group, data = callaert)

## Exact logrank test using mid-ranks (p = 0.0505)
logrank_test(Surv(time) ~ group, data = callaert, distribution = "exact")

## Exact logrank test using average-scores (p = 0.0468)
logrank_test(Surv(time) ~ group, data = callaert, distribution = "exact",
             ties.method = "average-scores")


## Lung cancer data (StatXact 9 manual, p. 213, Tab. 7.19)
lungcancer <- data.frame(
    time = c(257, 476, 355, 1779, 355,
             191, 563, 242, 285, 16, 16, 16, 257, 16),
    event = c(0, 0, 1, 1, 0,
              1, 1, 1, 1, 1, 1, 1, 1, 1),
    group = factor(rep(1:2, c(5, 9)),
                   labels = c("newdrug", "control"))
)

## Logrank scores using average-scores (StatXact 9 manual, p. 214)
with(lungcancer,
     logrank_trafo(Surv(time, event), ties.method = "average-scores"))

## Exact logrank test using average-scores (StatXact 9 manual, p. 215)
logrank_test(Surv(time, event) ~ group, data = lungcancer,
             distribution = "exact", ties.method = "average-scores")

## Exact Prentice test using average-scores (StatXact 9 manual, p. 222)
logrank_test(Surv(time, event) ~ group, data = lungcancer,
             distribution = "exact", ties.method = "average-scores",
             type = "Prentice")

## Approximative (Monte Carlo) versatile test (Lee, 1996)
## One Fleming-Harrington score vector per (rho, gamma) combination
rho.gamma <- expand.grid(rho = seq(0, 2, 1), gamma = seq(0, 2, 1))
lee_trafo <- function(y)
    logrank_trafo(y, ties.method = "average-scores",
                  type = "Fleming-Harrington",
                  rho = rho.gamma["rho"], gamma = rho.gamma["gamma"])
it <- independence_test(Surv(time, event) ~ group, data = lungcancer,
                        distribution = approximate(nresample = 10000),
                        ytrafo = function(data)
                            trafo(data, surv_trafo = lee_trafo))
pvalue(it, method = "step-down")
```

*Documentation reproduced from package coin, version 1.3-1, License: GPL-2*