Three functions: W3val, W4val and W5val, each of which is needed to compute \(E[T^3]\)
(i.e., for the skewness of \(T\)), where \(T=T(\theta)\) is defined in Equation (2) of Tango (2007) as follows:
Let \((z_1,\ldots,z_n )\), \(n = n_0 + n_1\), denote the locations of the points in the combined sample
when the indices have been randomly permuted so that the \(z_i\) contain no information about group membership.
$$T(\theta)=\sum_{i=1}^{n}\sum_{j=1}^{n}\delta_i \delta_j a_{ij}(\theta)=
\boldsymbol \delta^t \boldsymbol A(\theta) \boldsymbol \delta$$ where \(\delta_i=1\) if \(z_i\) is a case,
and 0 if \(z_i\) is a control, \(\boldsymbol A(\theta) = (a_{ij} (\theta))\) can be any matrix that measures
the closeness between two points \(i\) and \(j\) with \(a_{ii} = 0\) for all \(i = 1,\ldots,n\), \(\boldsymbol \theta =
(\theta_1,\ldots,\theta_p)^t\) denotes the unknown parameter vector related to cluster size, and
\(\boldsymbol \delta = (\delta_1,\ldots,\delta_n)^t\).
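The equivalence of the double sum and the quadratic form above can be checked numerically. The following is a minimal sketch in base R; the matrix `A` and the label vector `delta` are made-up toy values chosen only for illustration, not output of any package function.

```r
# Toy closeness matrix A (assumed values, zero diagonal) for n = 4 points
A <- matrix(c(0, 1, 0, 0,
              1, 0, 0, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

# delta_i = 1 if z_i is a case, 0 if it is a control (assumed labels)
delta <- c(1, 1, 0, 1)
n <- length(delta)

# Double-sum form: sum over i and j of delta_i * delta_j * a_ij
T_sum <- 0
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    T_sum <- T_sum + delta[i] * delta[j] * A[i, j]
  }
}

# Quadratic form: delta^t A delta
T_quad <- as.numeric(t(delta) %*% A %*% delta)

print(c(T_sum = T_sum, T_quad = T_quad))
```

Only pairs in which both points are cases contribute to the sum, so the statistic counts (weighted) case-case closeness.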
Here the number of cases is denoted by \(n_1\) and the number of controls by \(n_0\) to match the case-control class
labeling, which is the reverse of the labeling in Tango (2007).
If \(\theta=k\) in the nearest neighbors model with \(a_{ij}(k) = 1\) if \(z_j\) is among the \(k\)NNs of \(z_i\) and 0
otherwise, then the test statistic \(T(\theta) = T_k\) is the \(k\)NN test statistic of
Cuzick and Edwards (1990); see also ceTk.
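As an illustration of the nearest neighbors model, the sketch below builds the indicator matrix \(a_{ij}(k)\) from point coordinates and evaluates \(T_k\) as a quadratic form. The helper `knn_indicator` is a hypothetical name written for this example in base R; it is not the package's aij.mat or ceTk, and it breaks distance ties arbitrarily by index order. The coordinates and labels are assumed toy data.

```r
# Hypothetical helper (not part of the package): kNN indicator matrix,
# a[i, j] = 1 if z_j is among the k nearest neighbors of z_i, 0 otherwise
knn_indicator <- function(xy, k) {
  n <- nrow(xy)
  d <- as.matrix(dist(xy))  # pairwise Euclidean distances
  diag(d) <- Inf            # a point is never its own neighbor, so a_ii = 0
  a <- matrix(0, n, n)
  for (i in seq_len(n)) {
    nn <- order(d[i, ])[seq_len(k)]  # indices of the k closest points to z_i
    a[i, nn] <- 1
  }
  a
}

# Five collinear points (assumed data) with case/control labels
xy <- cbind(x = c(0, 1, 3, 6, 10), y = 0)
delta <- c(1, 1, 0, 1, 0)  # 1 = case, 0 = control

a1 <- knn_indicator(xy, k = 1)
T1 <- as.numeric(t(delta) %*% a1 %*% delta)
print(T1)
```

Each row of `a1` contains exactly `k` ones, and the matrix is in general not symmetric, because \(z_j\) being a nearest neighbor of \(z_i\) does not imply the converse.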
\(W_k\) values are used for Tango's correction to the Cuzick and Edwards \(k\)NN test statistic, \(T_k\), and
\(W_k\) here corresponds to \(W_{k-1}\) in Tango (2007)
(defined for consistency with the \(p_k\)'s and \(\alpha_r\) having \(r\) distinct elements).
The argument of the function is the \(A_{ij}\) matrix, a, which is the output of the function aij.mat.
However, inside the function we symmetrize the matrix a as b <- (a+a^t)/2, to facilitate the formulation.
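The symmetrization step can be sketched in base R as follows. Here `a^t` in the description is read as the matrix transpose, which in R is written `t(a)`; the matrix `a` and the labels `delta` are assumed toy values, not output of aij.mat. The example also checks that replacing `a` by its symmetric part leaves the quadratic form \(\boldsymbol \delta^t \boldsymbol A \boldsymbol \delta\) unchanged.

```r
# Toy asymmetric indicator matrix (assumed values, zero diagonal)
a <- matrix(c(0, 1, 0,
              1, 0, 0,
              0, 1, 0), nrow = 3, byrow = TRUE)

# Symmetric part of a; t(a) is the transpose in R
b <- (a + t(a)) / 2

delta <- c(0, 1, 1)  # assumed labels: 1 = case, 0 = control

T_a <- as.numeric(t(delta) %*% a %*% delta)
T_b <- as.numeric(t(delta) %*% b %*% delta)

print(isSymmetric(b))
print(c(T_a = T_a, T_b = T_b))
```

The two values agree because \(\boldsymbol \delta^t \boldsymbol A \boldsymbol \delta = \boldsymbol \delta^t \boldsymbol A^t \boldsymbol \delta\) for any square matrix, so averaging a matrix with its transpose does not alter the statistic.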