The function zvcv
is used to perform (regularised) ZV-CV given a set of samples, derivatives and function evaluations.
zvcv(integrand, samples, derivatives, log_weight,
integrand_logged = FALSE, obs_estim_choose, obs_estim,
options = list(polyorder = 2, regul_reg = TRUE, alpha_elnet = 1, nfolds
= 10, apriori = (1:NCOL(samples)), intercept = TRUE, polyorder_max =
Inf), folds_choose = 5)
The integrand, i.e. a set of
An
An
(optional) A vector of length
(optional) Sometimes it is better to input the integrand on the logged scale for stability. If the actual integrand is the exponential of integrand
, then integrand_logged = TRUE
. Otherwise, the default of integrand_logged = FALSE
should be used.
(optional) The indices of the samples to be used in estimating the MSE for cross validation. This is not relevant if options
is a single list of specifications. If options
is a list of multiple control variate specifications, then the default is to use 10 percent of the full data set for estimating the MSE.
(optional) The indices of the samples to be used in evaluating the integrand. If this is missing or NULL
then the full data set is used for both estimating the polynomial and evaluating the integrand. Otherwise, the samples from obs_estim
are used in evaluating the integral and the remainder are used in estimating the polynomial.
A list of control variate specifications. This can be a single list containing the elements below (the defaults are used for elements which are not specified). Alternatively, it can be a list of lists containing any or all of the elements below. Where the latter is used, the function zvcv
automatically selects the best performing option based on cross-validation.
polyorder
: The order of the polynomial, with a default of 2. A value of Inf
will get the cross-validation method to choose between orders.
regul_reg
: A flag for whether regularised regression is to be used. The default is TRUE, i.e. regularised regression is used.
alpha_elnet
: The alpha parameter for elastic net. The default is 1, which correponds to LASSO. A value of 0 would correspond to ridge regression.
nfolds
: The number of folds used in cross-validation to select lambda for LASSO or elastic net. The default is 10.
apriori
: The indices of the parameters to use in the polynomial. The default is to use all parameters apriori
= c(1,3) (alternatively, this can be done by only using the relevant columns in samples
and derivatives
).
intercept
: A flag for whether the intercept should be estimated or fixed to the empirical mean of the integrand in the estimation set. The default is to include an intercept (intercept = TRUE
) as this tends to lead to better variance reductions. Note that an intercept = TRUE
flag may be changed to intercept = FALSE
within the function if integrand_logged = TRUE
and a NaN
is encountered. See South et al. (2018) for further details.
polyorder_max
: The maximum allowable polynomial order. This may be used to prevent memory issues in the case that the polynomial order is selected automatically. You will be prompted to see if you wish to continue if the selected polynomial order results in a design matrix with more than ten million elements. A default maximum polynomial order will be selected if the polyorder
is infinite and in this case a warning will be given. Recall that setting your default R settings to options(warn=1)
will ensure that you receive these warnings in real time. Optimal polynomial order selection may go to at most this maximum value, or it may stop earlier.
The number of folds used in k-fold cross-validation for selecting the optimal control variate. Depending on the options
, this may include selection of the optimal polynomial order, regression type and subset of parameters in the polynomial. The default is five.
A list is returned, containing the following components:
expectation
: The estimate of the expectation.
num_select
: The number of non-zero coefficients in the polynomial.
mse
: The mean square error for the evaluation set.
integrand_logged
: The integrand_logged
input stored for reference.
obs_estim
: The obs_estim
input stored for reference.
polyorder
: The polyorder
value used in the final estimate.
regul_reg
: The regul_reg
flag used in the final estimate.
alpha_elnet
: The alpha_elnet
value used in the final estimate.
nfolds
: The nfolds
value used in the final estimate.
apriori
: The apriori
value used in the final estimate.
intercept
: The intercept
flag used in the final estimate.
Mira, A., Solgi, R., & Imparato, D. (2013). Zero variance Markov chain Monte Carlo for Bayesian estimators. Statistics and Computing, 23(5), 653-662.
South, L. F., Oates, C. J., Mira, A., & Drovandi, C. (2019). Regularised zero variance control variates for high-dimensional variance reduction. https://arxiv.org/abs/1811.05073
See evidence
for functions which use zvcv
to estimate the normalising constant of the posterior. See an example at VDP
and see ZVCV for more package details.
# NOT RUN {
# Estimating the mean of theta1 when theta is bivariate normally distributed with:
mymean <- c(1,2)
mycov <- matrix(c(1,0.5,0.5,2),nrow=2)
# Perfect draws from the target distribution (could be replaced with
# approximate draws from e.g. MCMC or SMC)
N <- 50
require(mvtnorm)
set.seed(1)
samples <- rmvnorm(N, mean = mymean, sigma = mycov)
integrand <- samples[,1]
# derivatives of Gaussian wrt x
derivatives <- t( apply(samples,1,function(x) -solve(mycov)%*%(x - mymean)) )
# Estimates without ZV-CV (i.e. vanilla Monte Carlo integration)
# Without the ZVCV package
mean(integrand)
# With the ZVCV package
zvcv(integrand,samples,derivatives,options = list(polyorder = 0))$expectation
# ZV-CV with fixed specifications
# 2nd order polynomial with LASSO
sprintf("%.15f",zvcv(integrand, samples, derivatives)$expectation)
# 2nd order polynomial with OLS
sprintf("%.15f",zvcv(integrand, samples, derivatives,
options = list(polyorder = 2, regul_reg = FALSE))$expectation)
# ZV-CV with cross validation
# Choose between OLS and LASSO, with the order selected using cross validation
myopts <- list(list(polyorder = Inf, regul_reg = FALSE),list(polyorder = Inf))
temp <- zvcv(integrand,samples,derivatives,options = myopts)
temp$polyorder # The chosen control variate order
sprintf("%.15f",temp$expectation) # The expectation based on the minimum MSE control variate
temp$regul_reg # Flag for if the chosen control variate uses regularisation
# }
Run the code above in your browser using DataLab