logrank_test() provides the weighted logrank test reformulated as a
linear rank test. The family of weighted logrank tests encompasses a large
collection of tests commonly used in the analysis of survival data including,
but not limited to, the standard (unweighted) logrank test, the Gehan-Breslow
test, the Tarone-Ware class of tests, the Peto-Peto test, the Prentice test,
the Prentice-Marek test, the Andersen-Borgan-Gill-Keiding test, the
Fleming-Harrington class of tests, the Gaugler-Kim-Liao class of tests and the
Self class of tests. A general description of these methods is given by
coin::Klein_Moeschberger_2003 (Ch. 7). See
coin::leton_2001 for the linear rank test formulation.
The null hypothesis of equality, or conditional equality given block,
of the survival distribution of y in the groups defined by x is
tested. In the two-sample case, the two-sided null hypothesis is \(H_0\!:
\theta = 1\), where \(\theta = \lambda_2 / \lambda_1\)
and \(\lambda_s\) is the hazard rate in the \(s\)th sample. In case
alternative = "less", the null hypothesis is \(H_0\!: \theta \ge
1\) and the alternative is that the survival is lower in sample 1 than in
sample 2. When alternative = "greater", the null hypothesis is
\(H_0\!: \theta \le 1\) and the alternative is that the survival is higher
in sample 1 than in sample 2.
If x is an ordered factor, the default scores, 1:nlevels(x), can
be altered using the scores argument (see
independence_test()); this argument can also be used to coerce
nominal factors to class "ordered". In this case, a linear-by-linear
association test is computed and the direction of the alternative hypothesis
can be specified using the alternative argument. This extension of the
standard logrank test was given by coin::tarone_1975 and later generalized
to arbitrary weights by coin::tarone_1977.
Let \((t_i, \delta_i)\), \(i = 1, 2, \ldots, n\), represent a
right-censored random sample of size \(n\), where \(t_i\) is the observed
survival time and \(\delta_i\) is the status indicator (\(\delta_i\) is 0
for right-censored observations and 1 otherwise). To allow for ties in the
data, let \(t_{(1)} < t_{(2)} < \cdots < t_{(m)}\) denote the \(m \le n\) ordered distinct event times.
At time \(t_{(k)}\), \(k = 1, 2, \ldots, m\), the number of events
and the number of subjects at risk are given by \(d_k = \sum_{i = 1}^n
I\!\left(t_i = t_{(k)}, \delta_i = 1\right)\) and \(n_k = n - r_k\), respectively, where
\(r_k\) depends on the method used for handling ties.
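As a minimal base-R sketch (an illustration of the definitions, not coin's internal code), the distinct event times \(t_{(k)}\) and the event counts \(d_k\) can be obtained as follows. The toy data and the names time, status, event_times and d are hypothetical.

```r
# Hypothetical toy data: observed times and status (1 = event, 0 = right-censored)
time   <- c(2, 3, 3, 5, 6, 6, 6, 8, 9, 10)
status <- c(1, 1, 0, 1, 1, 1, 0, 0, 1, 0)

# Ordered distinct event times t_(1) < ... < t_(m); censored-only times are dropped
event_times <- sort(unique(time[status == 1]))

# d_k: number of uncensored observations with t_i equal to t_(k)
d <- vapply(event_times,
            function(tk) sum(time == tk & status == 1),
            numeric(1))

event_times
d
```

Here the censored observations at times 3 and 6 do not contribute to d, so \(m = 5\) and d is c(1, 1, 1, 2, 1).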
Three different methods of handling ties are available using
ties.method: mid-ranks ("mid-ranks", default), the
Hothorn-Lausen method ("Hothorn-Lausen") and average-scores
("average-scores"). The first and last methods are discussed and
contrasted by coin::callaert_2003, whereas the second method is defined in
coin::Hothorn:2003:CSDA. The mid-ranks method leads to
$$
r_k = \sum_{i = 1}^n I\!\left(t_i < t_{(k)}\right)
$$
whereas the Hothorn-Lausen method uses
$$
r_k = \sum_{i = 1}^n I\!\left(t_i \le t_{(k)}\right) - 1.
$$
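The two definitions of \(r_k\) only differ when several observations share a time. The base-R sketch below, using hypothetical toy data and variable names, computes \(r_k\) and \(n_k = n - r_k\) under both the mid-ranks and the Hothorn-Lausen method; it illustrates the formulas above and is not coin's implementation.

```r
# Hypothetical toy data (1 = event, 0 = right-censored)
time   <- c(2, 3, 3, 5, 6, 6, 6, 8, 9, 10)
status <- c(1, 1, 0, 1, 1, 1, 0, 0, 1, 0)
n <- length(time)
event_times <- sort(unique(time[status == 1]))

# Mid-ranks: r_k = number of observations strictly before t_(k)
r_mid <- vapply(event_times, function(tk) sum(time < tk), numeric(1))

# Hothorn-Lausen: r_k = number of observations at or before t_(k), minus one
r_hl <- vapply(event_times, function(tk) sum(time <= tk) - 1, numeric(1))

# Number at risk n_k = n - r_k under each method
n_mid <- n - r_mid   # equals the usual risk set size, #{i: t_i >= t_(k)}
n_hl  <- n - r_hl

rbind(event_times, n_mid, n_hl)
```

For these data the mid-ranks values are 10, 9, 7, 6 and 2, while the Hothorn-Lausen values are 10, 8, 7, 4 and 2; the two methods disagree only at \(t = 3\) and \(t = 6\), where tied observations occur.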
The scores assigned to right-censored and uncensored observations at the
\(k\)th event time are given by
$$
C_k = \sum_{j = 1}^k w_j \frac{d_j}{n_j}
\quad \mathrm{and} \quad
c_k = C_k - w_k,
$$
respectively, where \(w_k\) is the logrank weight at the \(k\)th event time. For the average-scores
method, used by, e.g., the software package StatXact, the \(d_k\) events
observed at the \(k\)th event time are arbitrarily ordered by assigning them
distinct values \(t_{(k_l)}\), \(l = 1, 2, \ldots, d_k\),
infinitesimally to the left of \(t_{(k)}\). Then scores
\(C_{k_l}\) and \(c_{k_l}\) are computed as indicated above,
effectively assuming that no event times are tied. The scores \(C_k\) and
\(c_k\) are assigned the average of the scores \(C_{k_l}\) and
\(c_{k_l}\), respectively. Whichever ties method is used, the score for the
\(i\)th subject is then
$$
a_i = \left\{
\begin{array}{ll}
C_{k'} & \mathrm{if}~\delta_i = 0 \\
c_{k'} & \mathrm{otherwise}
\end{array}
\right.
$$
where \(k' = \max \{k: t_i \ge t_{(k)}\}\).
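A minimal sketch of the score computation for the mid-ranks method with logrank weights \(w_k = 1\), on hypothetical toy data. It makes one assumption not stated above: a censored observation preceding the first event time (where \(k'\) is undefined) receives the empty-sum score 0. The variable names are illustrative only.

```r
# Hypothetical toy data (1 = event, 0 = right-censored)
time   <- c(2, 3, 3, 5, 6, 6, 6, 8, 9, 10)
status <- c(1, 1, 0, 1, 1, 1, 0, 0, 1, 0)
event_times <- sort(unique(time[status == 1]))

# Event counts d_k and mid-ranks risk sets n_k = #{i: t_i >= t_(k)}
d   <- vapply(event_times, function(tk) sum(time == tk & status == 1), numeric(1))
n_k <- vapply(event_times, function(tk) sum(time >= tk), numeric(1))

w   <- rep(1, length(event_times))  # logrank weights w_k = 1
C_k <- cumsum(w * d / n_k)          # scores for right-censored observations
c_k <- C_k - w                      # scores for uncensored observations

# k' = max{k: t_i >= t_(k)}; findInterval() returns 0 if t_i < t_(1)
k_prime <- findInterval(time, event_times)

# Assumption: a censored t_i before the first event time gets the empty sum 0.
# Uncensored t_i always have k' >= 1, so the NA padding is never selected.
a <- ifelse(status == 0,
            c(0, C_k)[k_prime + 1],
            c(NA, c_k)[k_prime + 1])

round(a, 4)
```

With mid-ranks, the scores sum to zero for any choice of weights: each term \(w_j d_j / n_j\) enters the censored and uncensored scores of exactly the \(n_j\) subjects with \(t_i \ge t_{(j)}\), which cancels the \(w_k\) subtracted once per event.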
The type argument allows for a choice between some of the most
well-known members of the family of weighted logrank tests, each corresponding
to a particular weight function. The standard logrank test ("logrank",
default) was suggested by coin::Mantel:1966,
coin::peto_1972 and coin::cox_1972
and has \(w_k = 1\). The Gehan-Breslow test ("Gehan-Breslow")
proposed by coin::gehan_1965 and later extended to \(K\) samples by
coin::breslow_1970 is a generalization of the Wilcoxon rank-sum test, where \(w_k =
n_k\). The Tarone-Ware class of tests ("Tarone-Ware") discussed by
coin::tarone_1977 has \(w_k = n_k^\rho\), where \(\rho\) is a
constant; \(\rho = 0.5\) (default) was suggested by coin::tarone_1977,
but note that \(\rho = 0\) and \(\rho = 1\) lead to the standard logrank
test and Gehan-Breslow test, respectively. The Peto-Peto test
("Peto-Peto") suggested by coin::peto_1972 is another
generalization of the Wilcoxon rank-sum test, where
$$
w_k = \hat{S}_k = \prod_{j = 0}^{k - 1} \frac{n_j - d_j}{n_j}
$$
is the left-continuous Kaplan-Meier estimator of the survival function,
\(n_0 \equiv n\) and \(d_0 \equiv 0\). The Prentice
test ("Prentice") is also a generalization of the Wilcoxon rank-sum
test proposed by coin::prentice_1978, where
$$
w_k = \prod_{j = 1}^k \frac{n_j}{n_j + d_j}.
$$
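The weights introduced so far depend only on \(n_k\) and \(d_k\), so each reduces to a vectorized expression in base R. The sketch below uses hypothetical values of \(n_k\) and \(d_k\) (the mid-ranks values of a small toy data set) and \(\rho = 0.5\); it illustrates the formulas and is not coin's code.

```r
# Hypothetical inputs: mid-ranks risk sets n_k and event counts d_k
n_k <- c(10, 9, 7, 6, 2)
d   <- c(1, 1, 1, 2, 1)
rho <- 0.5                            # Tarone-Ware exponent (the default above)

w_logrank <- rep(1, length(n_k))      # standard logrank
w_gehan   <- n_k                      # Gehan-Breslow
w_tarone  <- n_k^rho                  # Tarone-Ware

# Peto-Peto: left-continuous Kaplan-Meier estimate. The j = 0 factor is
# (n - 0) / n = 1, so prepend 1 and drop the final factor.
w_peto <- cumprod(c(1, (n_k - d) / n_k))[seq_along(n_k)]

# Prentice: product over j = 1, ..., k of n_j / (n_j + d_j)
w_prentice <- cumprod(n_k / (n_k + d))

round(rbind(w_logrank, w_gehan, w_tarone, w_peto, w_prentice), 4)
```

Setting rho to 0 or 1 in w_tarone reproduces w_logrank or w_gehan, mirroring the special cases of the Tarone-Ware class.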
The Prentice-Marek test ("Prentice-Marek") is yet another
generalization of the Wilcoxon rank-sum test discussed by
coin::prentice_1979, with
$$
w_k = \tilde{S}_k = \prod_{j = 1}^k \frac{n_j + 1 - d_j}{n_j + 1}.
$$
The Andersen-Borgan-Gill-Keiding test ("Andersen-Borgan-Gill-Keiding")
suggested by coin::andersen_1982 is a modified version of the
Prentice-Marek test using
$$
w_k = \frac{n_k}{n_k + 1} \prod_{j = 0}^{k - 1} \frac{n_j + 1 - d_j}{n_j + 1},
$$
where, again, \(n_0 \equiv n\) and \(d_0 \equiv 0\).
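The Prentice-Marek and Andersen-Borgan-Gill-Keiding weights share the factor \((n_j + 1 - d_j)/(n_j + 1)\), which the following base-R sketch makes explicit. The inputs are hypothetical mid-ranks values, and the names are illustrative only.

```r
# Hypothetical inputs: mid-ranks risk sets n_k and event counts d_k
n_k <- c(10, 9, 7, 6, 2)
d   <- c(1, 1, 1, 2, 1)

# Per-time factor (n_j + 1 - d_j) / (n_j + 1)
f <- (n_k + 1 - d) / (n_k + 1)

# Prentice-Marek: S-tilde_k, product over j = 1, ..., k
w_pm <- cumprod(f)

# Andersen-Borgan-Gill-Keiding: the same product over j = 0, ..., k - 1
# (the j = 0 factor is (n + 1) / (n + 1) = 1), times n_k / (n_k + 1)
w_abgk <- n_k / (n_k + 1) * cumprod(c(1, f))[seq_along(n_k)]

round(rbind(w_pm, w_abgk), 4)
```

When \(d_k = 1\), the factor \((n_k + 1 - d_k)/(n_k + 1)\) equals \(n_k/(n_k + 1)\), so the two weights coincide at every untied event time and differ only where \(d_k > 1\) (here \(k = 4\)).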
The Fleming-Harrington class of tests ("Fleming-Harrington") proposed
by coin::Fleming+Harrington:1991 uses \(w_k = \hat{S}_k^\rho (1 -
\hat{S}_k)^\gamma\), where \(\rho\)
and \(\gamma\) are constants; \(\rho = 0\) and \(\gamma = 0\) lead to
the standard logrank test, while \(\rho = 1\) and \(\gamma = 0\) result in
the Peto-Peto test. The Gaugler-Kim-Liao class of tests
("Gaugler-Kim-Liao") discussed by coin::gaugler_2007 is a
modified version of the Fleming-Harrington class of tests, replacing
\(\hat{S}_k\) with \(\tilde{S}_k\) so that \(w_k =
\tilde{S}_k^\rho (1 - \tilde{S}_k)^\gamma\), where \(\rho\) and \(\gamma\) are constants; \(\rho
= 0\) and \(\gamma = 0\) lead to the standard logrank test, whereas
\(\rho = 1\) and \(\gamma = 0\) result in the Prentice-Marek test. The
Self class of tests ("Self") suggested by
coin::self_1991 has \(w_k =
v_k^\rho (1 - v_k)^\gamma\), where
$$
v_k = \frac{1}{2} \frac{t_{(k-1)} + t_{(k)}}{t_{(m)}},
\quad
t_{(0)} \equiv 0
$$
is the standardized mid-point between the \((k - 1)\)th and the \(k\)th
event time. (This is a slight generalization of Self's original proposal in
order to allow for non-integer follow-up times.) Again, \(\rho\) and
\(\gamma\) are constants and \(\rho = 0\) and \(\gamma = 0\) lead to
the standard logrank test.
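The three two-parameter classes can be sketched in base R as below, again on hypothetical toy data. power_weight() is a hypothetical helper, not part of coin; the sketch also checks the special cases named above.

```r
# Hypothetical toy data (1 = event, 0 = right-censored)
time   <- c(2, 3, 3, 5, 6, 6, 6, 8, 9, 10)
status <- c(1, 1, 0, 1, 1, 1, 0, 0, 1, 0)
event_times <- sort(unique(time[status == 1]))
d   <- vapply(event_times, function(tk) sum(time == tk & status == 1), numeric(1))
n_k <- vapply(event_times, function(tk) sum(time >= tk), numeric(1))  # mid-ranks

S_hat   <- cumprod(c(1, (n_k - d) / n_k))[seq_along(n_k)]  # left-continuous KM
S_tilde <- cumprod((n_k + 1 - d) / (n_k + 1))              # Prentice-Marek

# Hypothetical helper: s^rho * (1 - s)^gamma
power_weight <- function(s, rho, gamma) s^rho * (1 - s)^gamma

w_fh  <- power_weight(S_hat,   rho = 1, gamma = 0)  # equals the Peto-Peto weights
w_gkl <- power_weight(S_tilde, rho = 1, gamma = 0)  # equals the Prentice-Marek weights

# Self: standardized mid-point between t_(k-1) and t_(k), with t_(0) = 0,
# divided by the largest event time t_(m)
v <- 0.5 * (c(0, head(event_times, -1)) + event_times) / max(event_times)
w_self <- power_weight(v, rho = 0, gamma = 0)       # all ones: standard logrank

round(rbind(w_fh, w_gkl, w_self), 4)
```

R evaluates 0^0 as 1, so \(\gamma = 0\) gives weight 1 at the first event time even though \(1 - \hat{S}_1 = 0\); with \(\gamma > 0\) the first Fleming-Harrington weight is 0.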
The conditional null distribution of the test statistic is used to obtain
\(p\)-values. By default, an asymptotic approximation of the exact
distribution is used (distribution = "asymptotic"). Alternatively, the
distribution can be approximated via Monte Carlo resampling or computed
exactly for univariate two-sample problems by setting distribution to
"approximate" or "exact", respectively. See
asymptotic(), approximate() and
exact() for details.
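For the two-sample case, the idea behind the asymptotic and Monte Carlo options can be sketched in base R: the linear statistic \(T\), taken here as the sum of the logrank scores in the first sample, is standardized with its permutation mean and variance for an asymptotic \(p\)-value, and the group labels are reshuffled for a Monte Carlo \(p\)-value. This is a simplified illustration with hypothetical data and names, not the implementation behind asymptotic() or approximate(); treating the first factor level as sample 1 and the sign convention of \(T\) are assumptions.

```r
# Hypothetical two-sample data (1 = event, 0 = right-censored)
time   <- c(2, 3, 3, 5, 6, 6, 6, 8, 9, 10)
status <- c(1, 1, 0, 1, 1, 1, 0, 0, 1, 0)
group  <- factor(c("A", "A", "B", "A", "A", "B", "B", "A", "B", "B"))

# Logrank scores (w_k = 1, mid-ranks); censored before t_(1) would score 0
event_times <- sort(unique(time[status == 1]))
d   <- vapply(event_times, function(tk) sum(time == tk & status == 1), numeric(1))
n_k <- vapply(event_times, function(tk) sum(time >= tk), numeric(1))
C_k <- cumsum(d / n_k)
k_prime <- findInterval(time, event_times)
a <- ifelse(status == 0, c(0, C_k)[k_prime + 1], c(NA, C_k - 1)[k_prime + 1])

# Linear statistic: sum of the scores in the first sample
in1   <- group == levels(group)[1]
T_obs <- sum(a[in1])

# Permutation (conditional) mean and variance of T
n  <- length(a)
n1 <- sum(in1)
mu <- n1 * mean(a)
v  <- n1 * (n - n1) / (n * (n - 1)) * sum((a - mean(a))^2)
z  <- (T_obs - mu) / sqrt(v)

# Asymptotic two-sided p-value from the standard normal distribution
p_asympt <- 2 * pnorm(-abs(z))

# Monte Carlo approximation: reshuffle the group labels B times
set.seed(29)
B <- 10000
T_perm <- replicate(B, sum(a[sample(in1)]))
# small tolerance guards against floating-point ties with T_obs
p_mc <- mean(abs(T_perm - mu) >= abs(T_obs - mu) - 1e-8)

c(z = z, p_asympt = p_asympt, p_mc = p_mc)
```

The factor \(n_1 n_2 / (n(n - 1))\) in the variance reflects sampling the first group without replacement from the fixed set of scores, which is what makes the distribution conditional on the observed data.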