Variable selection with KPC using a directed K-NN graph or minimum spanning tree (MST)

Usage
KFOCI(
Y,
X,
k = kernlab::rbfdot(1/(2 * stats::median(stats::dist(Y))^2)),
Knn = min(ceiling(NROW(Y)/20), 20),
num_features = NULL,
stop = TRUE,
numCores = parallel::detectCores(),
verbose = FALSE
)
Value

The algorithm returns a vector of indices, from 1, ..., dx, of the selected variables, in the order in which they were selected. Variables selected earlier are expected to be more informative in predicting Y.
Arguments

Y: a matrix of responses (n by dy).

X: a matrix of predictors (n by dx).

k: a function \(k(y, y')\) of class kernel. It can be any kernel implemented in kernlab, e.g., the Gaussian kernel rbfdot(sigma = 1) or the linear kernel vanilladot().

Knn: a positive integer indicating the number of nearest neighbors, or "MST". The suggested choice of Knn is 0.05n for samples up to a few hundred observations. For large n, the suggested Knn is sublinear in n; that is, it may grow slower than any linear function of n. The computing time is approximately linear in Knn, so a smaller Knn takes less time.

num_features: the number of variables to be selected; cannot be larger than dx. The default value is NULL, in which case it is set equal to dx. If stop == TRUE (see below), then num_features is the maximal number of variables to be selected.

stop: if stop == TRUE, the automatic stopping criterion (stop at the first instance of negative Tn, as described in the paper) is applied, and selection continues until at most num_features variables are selected. If stop == FALSE, exactly num_features variables are selected.

numCores: the number of cores to use for parallelizing the process.

verbose: whether to print each selected variable during the forward stepwise algorithm.
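As an illustration of how stop and num_features interact, here is a minimal sketch (not run here; it assumes the KPC and kernlab packages are installed):

```r
library(KPC)
set.seed(1)
n <- 200; p <- 10
X <- matrix(rnorm(n * p), ncol = p)
Y <- X[, 1] * X[, 2] + sin(X[, 1] * X[, 3])

# stop = TRUE (default): selection halts at the first negative Tn,
# so AT MOST num_features variables are returned
KFOCI(Y, X, kernlab::rbfdot(1), Knn = 1, num_features = 5, numCores = 1)

# stop = FALSE: the stopping rule is ignored and EXACTLY
# num_features variables are returned
KFOCI(Y, X, kernlab::rbfdot(1), Knn = 1, num_features = 5,
      stop = FALSE, numCores = 1)
```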
Details

A stepwise forward selection of variables using KPC. At each step, it selects the \(X_j\) that maximizes \(\hat{\rho}^2(Y, X_j \mid \text{selected } X_i)\). It is suggested to normalize the predictors before applying KFOCI. Euclidean distance is used for computing the K-NN graph and the MST.
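Since Euclidean distances drive the K-NN graph, predictors on very different scales can distort the selection; a minimal sketch of the suggested normalization, using base R's scale() (and assuming the KPC package is installed):

```r
library(KPC)
set.seed(2)
X <- matrix(rnorm(200 * 10), ncol = 10)
X[, 1] <- 100 * X[, 1]              # give column 1 a much larger scale
Y <- (X[, 1] / 100) * X[, 2]

# Center and scale each predictor column before selection
X_scaled <- scale(X)
KFOCI(Y, X_scaled, kernlab::rbfdot(1), Knn = 1, numCores = 1)
```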
See Also

KPCgraph, KPCRKHS, KPCRKHS_VS
Examples

n = 200
p = 10
X = matrix(rnorm(n * p), ncol = p)
Y = X[, 1] * X[, 2] + sin(X[, 1] * X[, 3])
KFOCI(Y, X, kernlab::rbfdot(1), Knn = 1, numCores = 1)
# 1 2 3
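Passing Knn = "MST" builds the graph from a minimum spanning tree instead of K nearest neighbors; continuing with the X and Y defined in the example above:

```r
# Same selection problem, but using the MST variant of the graph
KFOCI(Y, X, kernlab::rbfdot(1), Knn = "MST", numCores = 1)
```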