kldist: Kullback-Leibler divergence between the densities of within-batch and between-batch pairwise distances
Description
This metric estimates the Kullback-Leibler divergence between the distribution
of the within-batch and that of the between-batch Euclidean distances of pairs of
observations. These distributions should be similar in the absence of strong
batch effects.
Usage
kldist(xba, batch)
Arguments
xba
matrix. The covariate matrix, raw or after batch effect adjustment. Observations in rows, variables in columns.
batch
factor. Batch variable. Currently has to have levels: '1', '2', '3' and so on.
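As an illustration of the expected input shapes, the following sketch builds a covariate matrix and recodes arbitrary batch labels into a factor with levels '1', '2', '3'. The data and the label names (siteA, siteB, siteC) are made up for the example, and the call to kldist is only attempted if the function is available in the session.

```r
set.seed(1)

# Simulated covariate matrix: 30 observations in rows, 4 variables in columns.
xba <- matrix(rnorm(30 * 4), nrow = 30, ncol = 4)

# Hypothetical batch labels that do not yet follow the required level naming.
labels <- rep(c("siteA", "siteB", "siteC"), each = 10)

# Recode to a factor whose levels are '1', '2', '3'.
batch <- factor(as.integer(factor(labels)))
levels(batch)

# Call the metric only if kldist has been loaded (assumption: the package
# providing this help page is attached).
if (exists("kldist", mode = "function")) {
  kldist(xba, batch)
}
```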
Value
Value of the metric
Details
For two batches j and j* (see the next paragraph for the case with more batches): 1) the distances between all pairs of observations in batch j - denoted as {dist_j} - and
the distances between all such pairs in batch j* - denoted as {dist_j*} - are calculated;
2) for each observation in j the distances to all observations in j* are calculated,
resulting in n_j x n_j* distances denoted as {dist_jj*}; 3) the
Kullback-Leibler divergence between the densities of {dist_j} and {dist_jj*} and that
between the densities of {dist_j*} and {dist_jj*} are estimated using the k-nearest neighbours based
method by Boltz et al. (2009) with k=5; 4) the weighted mean of
these two divergences is taken, with weights proportional to n_j and n_j*.
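The two-batch procedure above can be sketched in base R as follows. This is not the package's implementation: the helper names knn_kl_1d and kl_two_batches are hypothetical, the divergence uses a standard k-nearest-neighbour estimator that may differ in detail from the one by Boltz et al. (2009), and the direction of the divergence (within-batch relative to between-batch) is an assumption. Standardization of the variables is omitted here.

```r
# k-nearest-neighbour estimate of KL(P || Q) from 1-D samples x ~ P, y ~ Q.
# Assumption: a generic k-NN estimator, not necessarily the exact Boltz et al. form.
knn_kl_1d <- function(x, y, k = 5) {
  n <- length(x)
  m <- length(y)
  stopifnot(n > k, m >= k)
  # rho: distance from x[i] to its k-th nearest neighbour among the other x values
  rho <- vapply(seq_len(n), function(i) sort(abs(x[i] - x[-i]))[k], numeric(1))
  # nu: distance from x[i] to its k-th nearest neighbour among the y values
  nu <- vapply(seq_len(n), function(i) sort(abs(x[i] - y))[k], numeric(1))
  mean(log(nu / rho)) + log(m / (n - 1))
}

# Metric for two batches given as matrices (observations in rows).
kl_two_batches <- function(x1, x2, k = 5) {
  n1 <- nrow(x1)
  n2 <- nrow(x2)
  dist_1 <- as.vector(dist(x1))                 # within-batch distances, batch j
  dist_2 <- as.vector(dist(x2))                 # within-batch distances, batch j*
  d_all <- as.matrix(dist(rbind(x1, x2)))
  dist_12 <- as.vector(d_all[seq_len(n1), n1 + seq_len(n2)])  # n_j x n_j* distances
  kl_1 <- knn_kl_1d(dist_1, dist_12, k)
  kl_2 <- knn_kl_1d(dist_2, dist_12, k)
  # Weighted mean with weights proportional to the batch sizes.
  (n1 * kl_1 + n2 * kl_2) / (n1 + n2)
}

set.seed(42)
a <- matrix(rnorm(20 * 3), nrow = 20)
b <- matrix(rnorm(25 * 3), nrow = 25)
similar <- kl_two_batches(a, b)      # both batches from the same distribution
shifted <- kl_two_batches(a, b + 5)  # second batch shifted in every variable
```

In this simulated setting the shifted case should give a clearly larger value than the unshifted one, which is the behaviour the metric is designed to capture.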
For more than two batches: 1) for all possible pairs of batches: calculate the metric as described above; 2) calculate
the weighted average of the values in 1) with weights proportional to the sum of the sample sizes in the two respective batches.
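The aggregation over pairs of batches can be sketched as below. The function combine_over_pairs and the stand-in toy_metric are hypothetical names introduced for illustration; toy_metric is not a divergence and only serves to make the weighting easy to check by hand.

```r
# Weighted average of a pairwise metric over all pairs of batches, with
# weights equal to the combined sample size of the two batches in each pair.
combine_over_pairs <- function(x, batch, pair_metric) {
  pairs <- combn(levels(batch), 2, simplify = FALSE)
  vals <- vapply(pairs, function(p) {
    pair_metric(x[batch == p[1], , drop = FALSE],
                x[batch == p[2], , drop = FALSE])
  }, numeric(1))
  wts <- vapply(pairs, function(p) as.numeric(sum(batch %in% p)), numeric(1))
  weighted.mean(vals, wts)
}

# Stand-in pair metric (assumption, for illustration only): the number of
# observations in the two batches combined.
toy_metric <- function(xa, xb) as.numeric(nrow(xa) + nrow(xb))

x <- matrix(seq_len(40), nrow = 20)
batch <- factor(rep(c(1, 2, 3), times = c(4, 6, 10)))

# Pair values 10, 14, 16 with weights 10, 14, 16 give 552 / 40 = 13.8.
result <- combine_over_pairs(x, batch, toy_metric)
```

Passing the pairwise metric as an argument keeps the weighting logic separate from the divergence estimate, so any two-batch function that returns a single number can be plugged in.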
The variables are standardized before the calculation to make the metric independent of scale.
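The effect of standardization can be checked with a small base R sketch. It assumes that each variable is centred and scaled to unit variance across all observations, as scale() does by default; whether kldist standardizes in exactly this way is not stated here.

```r
set.seed(7)
x <- matrix(rnorm(12 * 3), nrow = 12)

# Same data with the second variable expressed in a different unit.
x_rescaled <- x
x_rescaled[, 2] <- 1000 * x_rescaled[, 2]

# Pairwise Euclidean distances after standardizing each variable.
d_std <- as.vector(dist(scale(x)))
d_std_rescaled <- as.vector(dist(scale(x_rescaled)))

# After standardization the distances agree, so a metric based on them
# does not depend on the scale of the variables.
same_after_scaling <- isTRUE(all.equal(d_std, d_std_rescaled))

# Without standardization the distances change with the unit.
same_without_scaling <- isTRUE(all.equal(as.vector(dist(x)),
                                         as.vector(dist(x_rescaled))))
```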
References
Lee, J. A., Dobbin, K. K., Ahn, J. (2014) Covariance adjustment for batch effect in gene expression data. Statistics in Medicine, 33, 2681-2695.
Boltz, S., Debreuve, E., Barlaud, M. (2009) High-dimensional statistical measure for region-of-interest tracking. IEEE Transactions on Image Processing, 18(6), 1266-1283.
Hornung, R., Boulesteix, A.-L., Causeur, D. (2015) Combining location-and-scale batch effect adjustment with data cleaning by latent factor adjustment. Tech. Rep. 184, Department of Statistics, University of Munich.