This function chooses between a list of kernel tuning parameters (sigma_list) or a list of K0 matrices (K0_list) for the control functionals method described in Oates et al (2017). The latter requires calculating and storing kernel matrices using K0_fn, but it is more flexible because it can be used to choose the Stein operator order and the kernel function, in addition to the kernel parameters. It is also faster to pre-specify the kernel matrices through K0_list. For estimation with fixed kernel parameters, use CF.
CF_crossval(
integrands,
samples,
derivatives,
steinOrder = NULL,
kernel_function = NULL,
sigma_list = NULL,
K0_list = NULL,
est_inds = NULL,
log_weights = NULL,
one_in_denom = FALSE,
folds = NULL,
diagnostics = FALSE
)
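As a brief illustration, here is a minimal sketch for a standard Gaussian target in two dimensions with the hypothetical integrand \(f(x) = x_1\) (the data-generating choices below are illustrative only):

library(ZVCV)

# Standard Gaussian target, so the gradient of the log target is -x
N <- 1000
d <- 2
samples <- matrix(rnorm(N * d), ncol = d)
derivatives <- -samples
integrands <- matrix(samples[, 1], ncol = 1)  # hypothetical integrand f(x) = x[1]

est <- CF_crossval(integrands, samples, derivatives,
                   steinOrder = 1, kernel_function = "gaussian",
                   sigma_list = list(0.1, 1, 10))
est$expectation  # true value is 0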
A list with the following elements:
expectation: The estimate(s) of the (\(k\)) expectation(s).
mse: A matrix of the cross-validation mean square prediction errors. The number of columns is the number of tuning options given and the number of rows is \(k\), the number of integrands of interest.
optinds: The optimal indices from the list for each expectation.
f_true: (Only if est_inds is not NULL) The integrands for the evaluation set. This should be the same as integrands[setdiff(1:N,est_inds),].
f_hat: (Only if est_inds is not NULL) The fitted values for the integrands in the evaluation set. This can be used to help assess the performance of the Gaussian process model.
a: (Only if diagnostics = TRUE) The value of \(a\) as described in South et al (2020), where predictions are of the form \(f_{hat} = K0 a + 1b\) for held-out \(K0\) and estimators using held-out samples are of the form \(mean(f - f_{hat}) + b\).
b: (Only if diagnostics = TRUE) The value of \(b\) as described in South et al (2020), where predictions are of the form \(f_{hat} = K0 a + 1b\) for held-out \(K0\) and estimators using held-out samples are of the form \(mean(f - f_{hat}) + b\).
ksd: (Only if diagnostics = TRUE) An estimated kernel Stein discrepancy based on the fitted model that can be used for diagnostic purposes. See South et al (2020) for further details.
bound_const: (Only if diagnostics = TRUE and est_inds = NULL) This is such that the absolute error for the estimator should be less than \(ksd \times bound\_const\).
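For example, given a fitted object est (a hypothetical name for the returned list), the selected tuning option and the cross-validation errors for the \(j\)-th integrand can be inspected as:

j <- 1
est$optinds[j]  # index of the chosen entry of sigma_list or K0_list
est$mse[j, ]    # CV mean square prediction errors across the tuning options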
integrands: An \(N\) by \(k\) matrix of integrands (evaluations of the functions of interest).
samples: An \(N\) by \(d\) matrix of samples from the target.
derivatives: An \(N\) by \(d\) matrix of derivatives of the log target with respect to the parameters.
steinOrder: (optional) This is the order of the Stein operator. The default is 1 in the control functionals paper (Oates et al, 2017) and 2 in the semi-exact control functionals paper (South et al, 2020). The following values are currently available: 1 for all kernels and 2 for "gaussian", "matern" and "RQ". See below for further details.
kernel_function: (optional) Choose between "gaussian", "matern", "RQ", "product" or "prodsim". See below for further details.
sigma_list: (optional between this and K0_list) A list of tuning parameters for the specified kernel. This is a list of single length-scale parameters for "gaussian" and "RQ", a list of vectors containing the length-scale and smoothness parameters for "matern", and a list of vectors of the two parameters for "product" and "prodsim". See below for further details. When sigma_list is specified and K0_list is not, the \(K0\) matrix is computed twice for each selected tuning parameter.
K0_list: (optional between this and sigma_list) A list of kernel matrices, which can be calculated using K0_fn (see the sketch after this list).
est_inds: (optional) A vector of indices for the estimation-only samples. The default when est_inds is missing or NULL is to perform both estimation of the control variates and evaluation of the integral using all samples. Otherwise, the samples from est_inds are used in estimating the control variates and the remainder are used in evaluating the integral. Splitting the indices in this way can be used to reduce bias from adaptation and to make computation feasible for very large sample sizes (small est_inds is faster), but in general it will increase the variance of the estimator.
log_weights: (optional) A vector of length \(N\) containing the logged weights of the samples. The default is equal weights. The weights are only used in estimating the cross-validation error. This method is not implemented for the case where est_inds is specified, because specifying est_inds typically indicates a desire for an unbiased estimator and using self-normalised importance weights introduces bias.
one_in_denom: (optional) Whether or not to include a \(1 +\) in the denominator of the control functionals estimator, as in equation 2 on p703 of Oates et al (2017). The \(1 +\) in the denominator is an arbitrary choice, so it is not included by default.
folds: (optional) The number of folds for cross-validation. The default is five.
diagnostics: (optional) A flag for whether to return the necessary outputs for plotting or estimating using the fitted model. The default is FALSE since this requires some additional computation when est_inds is NULL.
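To illustrate the K0_list route, the sketch below pre-computes one kernel matrix per candidate tuning parameter with K0_fn, reusing the samples, derivatives and integrands objects from the earlier sketch. The arguments passed to K0_fn are assumed to mirror those above; see the K0_fn help page for its exact signature.

sigmas <- list(0.1, 1, 10)  # hypothetical candidate length-scales
K0_list <- lapply(sigmas, function(s)
  K0_fn(samples, derivatives, s, steinOrder = 1, kernel_function = "gaussian"))
est <- CF_crossval(integrands, samples, derivatives, K0_list = K0_list)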
Solving the linear system in CF has \(O(N^3)\) complexity and is therefore not suited to large \(N\). Using est_inds will instead have an \(O(N_0^3)\) cost in solving the linear system and an \(O((N-N_0)^2)\) cost in handling the remaining samples, where \(N_0\) is the length of est_inds. This can be much cheaper for large \(N\).
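A sketch of this splitting, reusing the objects from the first sketch with a hypothetical estimation-set size of 250:

N0 <- 250
est_split <- CF_crossval(integrands, samples, derivatives,
                         steinOrder = 1, kernel_function = "gaussian",
                         sigma_list = list(0.1, 1, 10),
                         est_inds = 1:N0)  # O(N0^3) solve, O((N - N0)^2) evaluation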
The kernel in Stein-based kernel methods is \(L_x L_y k(x,y)\) where \(L_x\) is a first or second order Stein operator in \(x\) and \(k(x,y)\) is some generic kernel to be specified.
The Stein operators for distribution \(p(x)\) are defined as:
steinOrder=1: \(L_x g(x) = \nabla_x^T g(x) + \nabla_x \log p(x)^T g(x)\) (see e.g. Oates et al (2017))
steinOrder=2: \(L_x g(x) = \Delta_x g(x) + \nabla_x \log p(x)^T \nabla_x g(x)\) (see e.g. South et al (2020))
Here \(\nabla_x\) is the first order derivative wrt \(x\) and \(\Delta_x = \nabla_x^T \nabla_x\) is the Laplacian operator.
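A useful property is that functions of the form \(L_x g(x)\) integrate to zero under \(p\), which is what makes them effective control variates. A minimal numerical check for steinOrder=1, assuming \(p = N(0,1)\) and the hypothetical choice \(g(x) = x\):

x <- rnorm(1e5)
grad_log_p <- -x           # d/dx log p(x) for the standard Gaussian
L_g <- 1 + grad_log_p * x  # first term: div(g) = 1; second term: -x^2
mean(L_g)                  # approximately 0, since E[1 - X^2] = 0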
The generic kernels which are implemented in this package are listed below. Note that the input parameter sigma defines the kernel parameters \(\sigma\).
"gaussian": A Gaussian kernel,
$$k(x,y) = \exp(-z(x,y)/\sigma^2)$$
"matern": A Matern kernel with \(\sigma = (\lambda,\nu)\),
$$k(x,y) = b c^{\nu} z(x,y)^{\nu/2} K_{\nu}(c z(x,y)^{0.5})$$
where \(b=2^{1-\nu}(\Gamma(\nu))^{-1}\), \(c=(2\nu)^{0.5}\lambda^{-1}\) and \(K_{\nu}(x)\) is the modified Bessel function of the second kind. Note that \(\lambda\) is the length-scale parameter and \(\nu\) is the smoothness parameter (which defaults to 2.5 for steinOrder=1 and 4.5 for steinOrder=2).
"RQ": A rational quadratic kernel,
$$k(x,y) = (1+\sigma^{-2}z(x,y))^{-1}$$
"product": The product kernel that appears in Oates et al (2017) with \(\sigma = (a,b)\),
$$k(x,y) = (1+a z(x) + a z(y))^{-1} \exp(-0.5 b^{-2} z(x,y))$$
"prodsim": A slightly different product kernel with \(\sigma = (a,b)\) (see e.g. https://www.imperial.ac.uk/inference-group/projects/monte-carlo-methods/control-functionals/),
$$k(x,y) = (1+a z(x))^{-1}(1 + a z(y))^{-1} \exp(-0.5 b^{-2} z(x,y))$$
In the above equations, \(z(x) = \sum_j x[j]^2\) and \(z(x,y) = \sum_j (x[j] - y[j])^2\). For the last two kernels, the code only has implementations for steinOrder=1. Each combination of steinOrder and kernel_function above is currently hard-coded, but it may be possible to extend this to other kernels in future versions using autodiff. The calculations for the first three kernels above are detailed in South et al (2020).
Leah F. South
Oates, C. J., Girolami, M. & Chopin, N. (2017). Control functionals for Monte Carlo integration. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(3), 695-718.
South, L. F., Karvonen, T., Nemeth, C., Girolami, M. and Oates, C. J. (2020). Semi-Exact Control Functionals From Sard's Method. https://arxiv.org/abs/2002.00033
See ZVCV for examples and related functions. See CF for a function to perform control functionals with fixed kernel specifications.