Calculate \(\hat{\eta}_n\) (the unconditional version of graph-based KPC) using a directed K-NN graph or a minimum spanning tree (MST).
KMAc(
Y,
X,
k = kernlab::rbfdot(1/(2 * stats::median(stats::dist(Y))^2)),
Knn = 1
)
The algorithm returns a real number, KMAc, the empirical kernel measure of association.
Y: a matrix of responses (n by dy)
X: a matrix of predictors (n by dx)
k: a function \(k(y, y')\) of class kernel. It can be a kernel implemented in kernlab, e.g., the Gaussian kernel rbfdot(sigma = 1) or the linear kernel vanilladot().
Knn: the number of nearest neighbors to use in the K-NN graph, or "MST". A small Knn (e.g., Knn = 1) is recommended for an accurate estimate of the population KMAc.
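The default kernel in the usage above is a Gaussian kernel whose sigma is set from the median pairwise distance of Y. The following base-R sketch reproduces that bandwidth and assumes that kernlab's rbfdot(sigma) evaluates exp(-sigma * ||y - y'||^2). The names `med`, `gauss_k`, and `k_at_median` are illustrative and not part of the package.

```r
set.seed(1)
Y <- matrix(rnorm(50 * 2), ncol = 2)      # toy response matrix (n = 50, dy = 2)

# Median of all pairwise Euclidean distances between rows of Y
med <- stats::median(stats::dist(Y))

# Same sigma as in the default argument: 1 / (2 * median^2)
sigma <- 1 / (2 * med^2)

# Hand-written Gaussian kernel with that sigma (stand-in for rbfdot(sigma))
gauss_k <- function(y1, y2) exp(-sigma * sum((y1 - y2)^2))

# Two points at exactly the median distance get kernel value exp(-1/2)
k_at_median <- exp(-sigma * med^2)
k_self <- gauss_k(Y[1, ], Y[1, ])         # kernel of a point with itself
```

With this choice the kernel is neither nearly constant nor nearly zero for typical pairs of responses, which is the usual motivation for a median-based bandwidth.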
\(\hat{\eta}_n\) is an estimate of the population kernel measure of association, based on data \(\{(X_i,Y_i)\}_{i=1}^n\) from \(\mu\).
For the K-NN graph, ties are broken at random. The MST is found using the package emstreeR.
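As an illustration of random tie-breaking, the sketch below builds a directed K-NN neighbor list in base R and permutes tied distances using a random secondary sort key. This is only one possible mechanism; the helper `knn_random_ties` is hypothetical and may differ from what the package does internally.

```r
# Hypothetical helper: directed K-NN neighbor indices with ties broken at random
knn_random_ties <- function(X, K = 1) {
  D <- as.matrix(stats::dist(as.matrix(X)))  # Euclidean distance matrix
  diag(D) <- Inf                             # a point is never its own neighbor
  n <- nrow(D)
  # order() sorts by distance first and by a random key second,
  # so points at equal distance appear in random order
  lapply(seq_len(n), function(i) order(D[i, ], stats::runif(n))[seq_len(K)])
}

set.seed(1)
X <- c(0, 1, 1, 1, 5)            # points 2, 3, 4 coincide, which creates ties
nbrs <- knn_random_ties(X, K = 1)
```

Here the nearest neighbor of the first point is one of the points 2, 3, or 4, chosen at random, so repeated runs without a fixed seed can give different graphs and slightly different estimates.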
In particular,
$$\hat{\eta}_n:=\frac{n^{-1}\sum_{i=1}^n d_i^{-1}\sum_{j:(i,j)\in\mathcal{E}(G_n)} k(Y_i,Y_j)-(n(n-1))^{-1}\sum_{i\neq j}k(Y_i,Y_j)}{n^{-1}\sum_{i=1}^n k(Y_i,Y_i)-(n(n-1))^{-1}\sum_{i\neq j}k(Y_i,Y_j)},$$
where \(G_n\) denotes an MST or K-NN graph on \(X_1,\ldots , X_n\), \(\mathcal{E}(G_n)\) denotes the set of edges of \(G_n\), \(d_i\) is the degree of \(X_i\) in \(G_n\) (the out-degree when \(G_n\) is directed), and
\((i,j)\in\mathcal{E}(G_n)\) means that there is an edge from \(X_i\) to \(X_j\) in \(G_n\).
Euclidean distance is used for computing the K-NN graph and the MST.
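The formula above can be evaluated directly for a directed K-NN graph, where every \(d_i\) equals K. The following is a minimal base-R sketch under stated assumptions: a Gaussian kernel exp(-sigma * ||y - y'||^2), Euclidean distance on X, and ties broken by index order rather than at random. The function `kmac_sketch` is hypothetical and is not the package's implementation.

```r
# Hypothetical helper: plug-in evaluation of the estimator on a directed K-NN graph
kmac_sketch <- function(Y, X, sigma = 1, K = 1) {
  Y <- as.matrix(Y)
  X <- as.matrix(X)
  n <- nrow(Y)

  # Kernel matrix with entries k(Y_i, Y_j) for a Gaussian kernel
  Kmat <- exp(-sigma * as.matrix(stats::dist(Y))^2)

  # Directed K-NN graph on X: edges i -> each of the K nearest X_j
  D <- as.matrix(stats::dist(X))
  diag(D) <- Inf
  nbrs <- lapply(seq_len(n), function(i) order(D[i, ])[seq_len(K)])

  # n^{-1} sum_i d_i^{-1} sum_{j:(i,j) in E} k(Y_i, Y_j), with d_i = K
  graph_term <- mean(vapply(seq_len(n),
                            function(i) mean(Kmat[i, nbrs[[i]]]),
                            numeric(1)))

  # (n(n-1))^{-1} sum_{i != j} k(Y_i, Y_j)
  off_diag <- (sum(Kmat) - sum(diag(Kmat))) / (n * (n - 1))

  # n^{-1} sum_i k(Y_i, Y_i)
  on_diag <- mean(diag(Kmat))

  (graph_term - off_diag) / (on_diag - off_diag)
}

set.seed(1)
x <- rnorm(200)
eta_dep <- kmac_sketch(Y = x, X = x)            # Y is a function of X
eta_ind <- kmac_sketch(Y = rnorm(200), X = x)   # Y drawn independently of X
```

In this toy comparison the estimate should be close to 1 when Y is a function of X and close to 0 when Y is independent of X, which matches the interpretation of the population measure.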
Deb, N., P. Ghosal, and B. Sen (2020), “Measuring association on topological spaces using kernels and geometric graphs” <arXiv:2010.01768>.
KPCgraph, Klin
library(kernlab)
KMAc(Y = rnorm(100), X = rnorm(100), k = rbfdot(1), Knn = 1)