Learn R Programming

ZVCV (version 1.0.0)

zvcv: ZV-CV for general expectations

Description

The function zvcv is used to perform (regularised) ZV-CV given a set of samples, derivatives and function evaluations.

Usage

zvcv(integrand, samples, derivatives, log_weight,
  integrand_logged = FALSE, obs_estim_choose, obs_estim,
  options = list(polyorder = 2, regul_reg = TRUE, alpha_elnet = 1, nfolds
  = 10, apriori = (1:NCOL(samples)), intercept = TRUE, polyorder_max =
  Inf), folds_choose = 5)

Arguments

integrand

The integrand, i.e. a set of \(N\) evaluations of the function of interest.

samples

An \(N\) by \(d\) matrix of samples from the target

derivatives

An \(N\) by \(d\) matrix of derivatives of the log target with respect to the parameters

log_weight

(optional) A vector of length \(N\) containing log weights of the samples. The default is equal weights.

integrand_logged

(optional) Sometimes it is better to input the integrand on the logged scale for stability. If the actual integrand is the exponential of integrand, then integrand_logged = TRUE. Otherwise, the default of integrand_logged = FALSE should be used.

obs_estim_choose

(optional) The indices of the samples to be used in estimating the MSE for cross validation. This is not relevant if options is a single list of specifications. If options is a list of multiple control variate specifications, then the default is to use 10 percent of the full data set for estimating the MSE.

obs_estim

(optional) The indices of the samples to be used in evaluating the integrand. If this is missing or NULL then the full data set is used for both estimating the polynomial and evaluating the integrand. Otherwise, the samples from obs_estim are used in evaluating the integral and the remainder are used in estimating the polynomial.

options

A list of control variate specifications. This can be a single list containing the elements below (the defaults are used for elements which are not specified). Alternatively, it can be a list of lists containing any or all of the elements below. Where the latter is used, the function zvcv automatically selects the best performing option based on cross-validation.

  • polyorder: The order of the polynomial, with a default of 2. A value of Inf will get the cross-validation method to choose between orders.

  • regul_reg: A flag for whether regularised regression is to be used. The default is TRUE, i.e. regularised regression is used.

  • alpha_elnet: The alpha parameter for elastic net. The default is 1, which correponds to LASSO. A value of 0 would correspond to ridge regression.

  • nfolds: The number of folds used in cross-validation to select lambda for LASSO or elastic net. The default is 10.

  • apriori: The indices of the parameters to use in the polynomial. The default is to use all parameters \(1:d\) where \(d\) is the dimension of the target. If only the first and third derivatives should be used, then this would be specified by using apriori = c(1,3) (alternatively, this can be done by only using the relevant columns in samples and derivatives).

  • intercept: A flag for whether the intercept should be estimated or fixed to the empirical mean of the integrand in the estimation set. The default is to include an intercept (intercept = TRUE) as this tends to lead to better variance reductions. Note that an intercept = TRUE flag may be changed to intercept = FALSE within the function if integrand_logged = TRUE and a NaN is encountered. See South et al. (2018) for further details.

  • polyorder_max: The maximum allowable polynomial order. This may be used to prevent memory issues in the case that the polynomial order is selected automatically. You will be prompted to see if you wish to continue if the selected polynomial order results in a design matrix with more than ten million elements. A default maximum polynomial order will be selected if the polyorder is infinite and in this case a warning will be given. Recall that setting your default R settings to options(warn=1) will ensure that you receive these warnings in real time. Optimal polynomial order selection may go to at most this maximum value, or it may stop earlier.

folds_choose

The number of folds used in k-fold cross-validation for selecting the optimal control variate. Depending on the options, this may include selection of the optimal polynomial order, regression type and subset of parameters in the polynomial. The default is five.

Value

A list is returned, containing the following components:

  • expectation: The estimate of the expectation.

  • num_select: The number of non-zero coefficients in the polynomial.

  • mse: The mean square error for the evaluation set.

  • integrand_logged: The integrand_logged input stored for reference.

  • obs_estim: The obs_estim input stored for reference.

  • polyorder: The polyorder value used in the final estimate.

  • regul_reg: The regul_reg flag used in the final estimate.

  • alpha_elnet: The alpha_elnet value used in the final estimate.

  • nfolds: The nfolds value used in the final estimate.

  • apriori: The apriori value used in the final estimate.

  • intercept: The intercept flag used in the final estimate.

References

Mira, A., Solgi, R., & Imparato, D. (2013). Zero variance Markov chain Monte Carlo for Bayesian estimators. Statistics and Computing, 23(5), 653-662.

South, L. F., Oates, C. J., Mira, A., & Drovandi, C. (2019). Regularised zero variance control variates for high-dimensional variance reduction. https://arxiv.org/abs/1811.05073

See Also

See evidence for functions which use zvcv to estimate the normalising constant of the posterior. See an example at VDP and see ZVCV for more package details.

Examples

Run this code
# NOT RUN {
# Estimating the mean of theta1 when theta is bivariate normally distributed with:
mymean <- c(1,2)
mycov <- matrix(c(1,0.5,0.5,2),nrow=2)

# Perfect draws from the target distribution (could be replaced with
# approximate draws from e.g. MCMC or SMC)
N <- 50
require(mvtnorm)
set.seed(1)
samples <- rmvnorm(N, mean = mymean, sigma = mycov)
integrand <- samples[,1]

# derivatives of Gaussian wrt x
derivatives <- t( apply(samples,1,function(x) -solve(mycov)%*%(x - mymean)) ) 

# Estimates without ZV-CV (i.e. vanilla Monte Carlo integration)
# Without the ZVCV package
mean(integrand)
# With the ZVCV package
zvcv(integrand,samples,derivatives,options = list(polyorder = 0))$expectation

# ZV-CV with fixed specifications
# 2nd order polynomial with LASSO
sprintf("%.15f",zvcv(integrand, samples, derivatives)$expectation)
# 2nd order polynomial with OLS
sprintf("%.15f",zvcv(integrand, samples, derivatives,
options = list(polyorder = 2, regul_reg = FALSE))$expectation) 

# ZV-CV with cross validation
# Choose between OLS and LASSO, with the order selected using cross validation
myopts <- list(list(polyorder = Inf, regul_reg = FALSE),list(polyorder = Inf)) 
temp <- zvcv(integrand,samples,derivatives,options = myopts) 
temp$polyorder # The chosen control variate order
sprintf("%.15f",temp$expectation) # The expectation based on the minimum MSE control variate
temp$regul_reg # Flag for if the chosen control variate uses regularisation

# }

Run the code above in your browser using DataLab