copBasic (version 2.1.5)

kullCOP: Kullback--Leibler Divergence, Jeffrey Divergence, and Kullback--Leibler Sample Size

Description

Compute the Kullback--Leibler divergence, Jeffrey divergence, and Kullback--Leibler sample size following Joe (2014, pp. 234--237). Consider two copula densities \(f = c_1(u,v; \Theta_f)\) and \(g = c_2(u,v; \Theta_g)\) for two different bivariate copulas \(\mathbf{C}_1(\Theta_f)\) and \(\mathbf{C}_2(\Theta_g)\) having respective parameter vectors \(\Theta_f\) and \(\Theta_g\). The Kullback--Leibler divergence of \(f\) relative to \(g\) is $$\mathrm{KL}(f|g) = \int\!\!\int_{\mathcal{I}^2} g\, \log(g/f)\,\mathrm{d}u\mathrm{d}v\mbox{,}$$ and the Kullback--Leibler divergence of \(g\) relative to \(f\) is $$\mathrm{KL}(g|f) = \int\!\!\int_{\mathcal{I}^2} f\, \log(f/g)\,\mathrm{d}u\mathrm{d}v\mbox{,}$$ where the limits of integration \(\mathcal{I}^2\) theoretically are closed on \([0,1]^2\), but an open interval \((0,1)^2\) might be needed for numerical integration. Note that in general \(\mathrm{KL}(f|g) \ne \mathrm{KL}(g|f)\). The \(\mathrm{KL}(f|g)\) is the expected log-likelihood ratio of \(g\) to \(f\) when \(g\) is the true density (Joe, 2014, p. 234), whereas \(\mathrm{KL}(g|f)\) is the opposite.
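Equivalently, because the integrand weights the log ratio by the density \(g\), the first divergence can be written as the expectation $$\mathrm{KL}(f|g) = \mathrm{E}_g\bigl[\log\{g(U,V)/f(U,V)\}\bigr]\mbox{,}$$ where the random vector \((U,V)\) follows copula \(\mathbf{C}_2(\Theta_g)\); the analogous statement holds for \(\mathrm{KL}(g|f)\) with the roles of \(f\) and \(g\) exchanged.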

This asymmetry leads to Jeffrey divergence, which is defined as a symmetrized version of the two Kullback--Leibler divergences, and is $$J(f,g) = \mathrm{KL}(f|g) + \mathrm{KL}(g|f) = \int\!\!\int_{\mathcal{I}^2} (g-f)\, \log(g/f)\,\mathrm{d}u\mathrm{d}v\mbox{.}$$

The variances of the Kullback--Leibler divergences are defined as $$\sigma^2_{\mathrm{KL}(f|g)} = \int\!\!\int_{\mathcal{I}^2} g\,[\log(g/f)]^2\,\mathrm{d}u\mathrm{d}v - [\mathrm{KL}(f|g)]^2\mbox{,}$$ and $$\sigma^2_{\mathrm{KL}(g|f)} = \int\!\!\int_{\mathcal{I}^2} f\,[\log(f/g)]^2\,\mathrm{d}u\mathrm{d}v - [\mathrm{KL}(g|f)]^2\mbox{.}$$
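The square roots of these variances are the standard deviations that are reported alongside the divergences in the divergences component of the returned list (see Value).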

For comparison of copula families \(f\) and \(g\) and taking an \(\alpha = 0.05\), the Kullback--Leibler sample size is defined as $$n_{f\!g} = \bigl[\Phi^{(-1)}(1-\alpha) \times \eta_\mathrm{KL}\bigr]^2\mbox{,}$$ where \(\Phi^{(-1)}(t)\) is the quantile function for the standard normal distribution \(\sim\) N(0,1) for nonexceedance probability \(t\), and \(\eta_\mathrm{KL}\) is the maximum of $$\eta_\mathrm{KL} = \mathrm{max}\bigl[\sigma_{\mathrm{KL}(f|g)}/\mathrm{KL}(f|g),\, \sigma_{\mathrm{KL}(g|f)}/\mathrm{KL}(g|f)\bigr]\mbox{.}$$ The \(n_{f\!g}\) gives an indication of the sample size needed to distinguish \(f\) and \(g\) with a probability of at least \(1 - \alpha = 1 - 0.05 = 0.95\) or 95 percent.
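As a small worked sketch of the sample-size computation (the divergence values below are hypothetical and chosen only for illustration; the exact rounding used internally by kullCOP is not asserted here):

# Hypothetical divergence estimates, for illustration only
KLfg <- 0.10; sdKLfg <- 0.50    # KL(f|g) and its standard deviation
KLgf <- 0.12; sdKLgf <- 0.55    # KL(g|f) and its standard deviation
alpha <- 0.05
etaKL <- max(sdKLfg/KLfg, sdKLgf/KLgf)   # maximum of the two ratios
nfg   <- (qnorm(1-alpha) * etaKL)^2      # Kullback-Leibler sample size
ceiling(nfg)                             # about 68 for these hypothetical values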

The copBasic package features a naïve Monte Carlo integration scheme in the primary interface kullCOP, although the function kullCOPint provides for nested numerical integration. The latter function is generally fast but, for general application, suffers too often from integral divergence errors issued by the integrate() function in R. This must be judged in light of the fact that the package focuses only on implementation of the copula function itself and on numerical estimation of the copula density (densityCOP), and not on analytical copula densities or hybrid representations thereof. Sufficient “bread crumbs” are left among the code and documentation for users to re-implement if speed is paramount. Numerical comparison to the results of Joe (2014) (see Examples) suggests that the default Monte Carlo sample size should be more than sufficient for general inference, at the expense of considerable CPU time; however, a few repeated calls of kullCOP are advised, with, say, the mean of the resulting sample sizes taken.
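For readers wanting to see the mechanics, the following is a minimal sketch of such a naïve Monte Carlo integration built only from densityCOP, using the same Gumbel--Hougaard and Plackett parameters as in the Examples; it is illustrative only and not the exact internal implementation of kullCOP:

# Minimal sketch of naive Monte Carlo integration of the divergences using
# densityCOP(); illustrative only and not the exact internals of kullCOP().
set.seed(1)
nsim <- 2E4                               # kept small for illustration
U <- runif(nsim); V <- runif(nsim)        # uniform sampling of the unit square
f <- sapply(seq_len(nsim), function(i)
            densityCOP(U[i], V[i], cop=GHcop,       para=2       ))
g <- sapply(seq_len(nsim), function(i)
            densityCOP(U[i], V[i], cop=PLACKETTcop, para=11.40484))
ok <- is.finite(f) & is.finite(g) & f > 0 & g > 0   # guard against zero densities
f <- f[ok]; g <- g[ok]
KLfg <- mean(g*log(g/f)); KLgf <- mean(f*log(f/g))  # the two divergences
varKLfg <- mean(g*log(g/f)^2) - KLfg^2              # variances per definitions above
varKLgf <- mean(f*log(f/g)^2) - KLgf^2
c(KLfg, KLgf, KLfg + KLgf)                          # and Jeffrey divergence J(f,g)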

Usage

kullCOP(cop1=NULL, cop2=NULL, para1=NULL, para2=NULL, alpha=0.05,
           del=0, n=1E5, verbose=TRUE, sobol=FALSE, ...)

kullCOPint(cop1=NULL, cop2=NULL, para1=NULL, para2=NULL, alpha=0.05,
           del=.Machine$double.eps^0.25, verbose=TRUE, ...)

Arguments

cop1

A copula function corresponding to copula \(f\) in Joe (2014);

para1

Vector of parameters or other data structure, if needed, to pass to the copula \(f\);

cop2

A copula function corresponding to copula \(g\) in Joe (2014);

para2

Vector of parameters or other data structure, if needed, to pass to the copula \(g\);

alpha

The \(\alpha\) in the Kullback--Leibler sample size equation;

del

A small value used to denote the lo and hi values of the numerical integration: lo = del and hi = 1 - del. If del == 0, then lo = 0 and hi = 1, which corresponds to the theoretical limits \(\mathcal{I}^2 = [0,1]^2\); this is the default here because the Monte Carlo algorithm is preferred for general application. The end-point control, however, is maintained just in case pathological situations should arise;

n

kullCOP (Monte Carlo integration) only---the Monte Carlo integration simulation size;

verbose

A logical triggering a couple of status lines of output through the message() function in R;

sobol

A logical triggering Sobol sequences for the Monte Carlo integration instead of the bivariate uniform distribution. The Sobol sequences depend on the randtoolbox package and its sobol() function; and

...

Additional arguments to pass to the densityCOP function.

Value

An R list is returned having the following components:

MonteCarlo.sim.size

kullCOP (Monte Carlo integration) only---The simulation size for numerical integration;

divergences

A vector of the Kullback--Leibler divergences and their standard deviations: \(\mathrm{KL}(f|g)\), \(\sigma_{\mathrm{KL}(f|g)}\), \(\mathrm{KL}(g|f)\), and \(\sigma_{\mathrm{KL}(g|f)}\), respectively;

stdev.divergences

kullCOP (Monte Carlo integration) only---The standard deviations of the divergences and of the variances;

Jeffrey.divergence

Jeffrey divergence \(J(f,g)\);

KL.sample.size

Kullback--Leibler sample size \(n_{f\!g}\); and

integrations

kullCOPint (numerical integration) only---An R list of the outer call of the integrate() function for the respective numerical integrals shown in this documentation.

References

Joe, H., 2014, Dependence modeling with copulas: Boca Raton, CRC Press, 462 p.

See Also

densityCOP, vuongCOP

Examples

# See another demonstration under the Note section of statTn().

# Joe (2014, p. 237, table 5.2)
# The Gumbel-Hougaard copula and Plackett copula below each have a Kendall Tau of
# about 0.5. Joe (2014) lists in the table that Jeffrey divergence is about 0.110
# and the Kullback-Leibler sample size is 133. Joe (2014) does not list the
# parameters for either copula, just that Kendall Tau = 0.5.
KL <- kullCOP(cop1=GHcop, para1=2, cop2=PLACKETTcop, para2=11.40484)
# Reports Jeffrey divergence        =   0.1063858
#      Kullback-Leibler sample size = 136 (another run produces 131)

S <- replicate(20, kullCOP(cop1=GHcop, para1=2, cop2=PLACKETTcop,
                           para2=11.40484, verbose=FALSE)$KL.sample.size)
print(as.integer(c(mean(S), sd(S)))) # 134 plus/minus 5

# Joe (2014, p. 237, table 5.3)
# The Gumbel-Hougaard copula and Plackett copula below each have a Spearman Rho of
# about 0.5. Joe (2014) lists in the table that Jeffrey divergence is about 0.063
# and the Kullback-Leibler sample size is 210. Joe (2014) does not list the
# parameters for either copula, just that Spearman Rho = 0.5.
KL <- kullCOP(cop1=GHcop, para1=1.541071, cop2=PLACKETTcop, para2=5.115658)
# Reports Jeffrey divergence        =   0.06381151
#      Kullback-Leibler sample size = 220 (another run produces 203)

S <- replicate(20, kullCOP(cop1=GHcop, para1=1.541071, cop2=PLACKETTcop,
                           para2=5.115658, verbose=FALSE)$KL.sample.size)
print(as.integer(c(mean(S), sd(S))))  # 220 plus/minus 16

# Joe (2014) likely did the numerical integrations using analytical solutions to the
# probability densities and not rectangular approximations as in densityCOP().

# Compare Jeffrey divergence estimates as functions of sample size when computed
# using Sobol sequences or not---Sobol sequences have less sampling variability.
GHpar <- PApar <- 2 # Spearman Rho = 0.6822339
Ns <- as.integer(10^c(seq(2.0, 3.5,by=0.01), seq(3.6, 5,by=0.2)))
JDuni <- sapply(1:length(Ns), function(i) {
                  kullCOP(cop1=GHcop, para1=GHpar, verbose=FALSE,
                          cop2=PARETOcop, para2=PApar, n=Ns[i],
                          sobol=FALSE)$Jeffrey.divergence })
JDsob <- sapply(1:length(Ns), function(i) {
                  kullCOP(cop1=GHcop, para1=GHpar, verbose=FALSE,
                          cop2=PARETOcop, para2=PApar, n=Ns[i],
                          sobol=TRUE )$Jeffrey.divergence })
plot(Ns, JDuni, type="l", log="x", # black line, notice likely outliers too
     xlab="Sample Size", ylab="Jeffery Divergence")
lines(Ns, JDsob, col=2) # red line
print(c(mean(JDuni), sd(JDuni))) # [1] 0.05923462 0.01544651
print(c(mean(JDsob), sd(JDsob))) # [1] 0.05863623 0.01184879
# So we see a slightly smaller variation when the Sobol sequence is used. 