
dapper (version 1.1.0)

dapper_sample: Private Posterior Sampler

Description

Generates samples from the private posterior using a data augmentation framework.

Usage

dapper_sample(
  data_model = NULL,
  sdp = NULL,
  init_par = NULL,
  seed = NULL,
  niter = 2000,
  warmup = floor(niter/2),
  chains = 1
)

Value

A dpout object which contains:

  • chain: a draws_matrix object containing niter - warmup draws from the private posterior.

  • mean_accept: a (niter - warmup) row matrix containing the average acceptance rate over all latent records for each iteration. Each column corresponds to a parameter.

  • comp_accept: a matrix containing n rows, where n is the number of latent records. Each row gives the mean acceptance rate over all iterations for an individual record.

Arguments

data_model

a data model represented by a privacy class object.

sdp

the observed privatized data. Must be a vector or matrix.

init_par

starting value of the parameters for the chain.

seed

random seed, set for reproducibility.

niter

total number of iterations, including warmup.

warmup

number of iterations to discard as warmup. Default is half of niter.

chains

number of MCMC chains to run. Chains can be run in parallel or sequentially.
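As a quick check of how niter and warmup interact, the base R sketch below reproduces the default warmup = floor(niter/2) from the Usage section and counts the draws that remain once warmup is discarded. The helper name retained_draws is hypothetical and is not part of dapper.

```r
# Hypothetical helper (not part of dapper): number of draws kept after warmup,
# using the default warmup = floor(niter / 2) shown in the Usage section.
retained_draws <- function(niter = 2000, warmup = floor(niter / 2)) {
  stopifnot(niter >= 1, warmup >= 0, warmup < niter)
  niter - warmup
}

retained_draws()              # defaults: 2000 - 1000
retained_draws(niter = 500)   # as in the Examples: 500 - 250
retained_draws(niter = 501)   # odd niter: warmup is floor(501 / 2) = 250
```

With an odd niter the floor() in the default means one more draw is kept than discarded.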

Details

Generates samples from the private posterior implied by data_model. The data_model input must be an object of class privacy, which is created using the new_privacy() constructor. MCMC chains can be run in parallel via furrr::future_map() once a parallel plan has been set, as in the Examples. See the furrr package documentation for specifics. Long computations can be monitored with the progressr package.
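To make the data augmentation idea concrete, here is a minimal base R sketch of a sampler in the spirit of the cited reference, written for the same toy model used in the Examples (normal data, privatized sample mean). It is an illustration under stated assumptions, not dapper's internal code: the names draw_theta, accept, mean_accept and comp_accept are chosen here only to mirror the terms used on this page, and the package's actual update scheme may differ. Each iteration draws the parameter given the latent data, then proposes a replacement for each latent record from the model and accepts it with probability given by the ratio of mechanism densities.

```r
# Illustrative data augmentation sampler in base R (NOT dapper's implementation).
set.seed(1)
n   <- 100
eps <- 3
y   <- rnorm(n, mean = -2, sd = 1)        # confidential data
sdp <- mean(y) + rnorm(1, 0, 1 / eps)     # privatized sample mean

# Conjugate draw of theta: N(0, 4) prior, N(theta, 1) likelihood.
draw_theta <- function(x) {
  ps_s2 <- 1 / (1 / 4 + length(x))
  rnorm(1, mean = ps_s2 * sum(x), sd = sqrt(ps_s2))
}

niter  <- 400
warmup <- floor(niter / 2)
theta  <- numeric(niter)
accept <- matrix(FALSE, nrow = niter, ncol = n)

cur_theta <- -2                            # starting value
x <- rnorm(n, mean = cur_theta, sd = 1)    # initial latent records

for (t in seq_len(niter)) {
  # Step 1: update the parameter given the current latent data.
  cur_theta <- draw_theta(x)

  # Step 2: update each latent record with a proposal from the model.
  sx     <- sum(x)
  cur_ll <- dnorm(sdp, mean = sx / n, sd = 1 / eps, log = TRUE)
  for (i in seq_len(n)) {
    prop   <- rnorm(1, mean = cur_theta, sd = 1)
    new_sx <- sx - x[i] + prop
    new_ll <- dnorm(sdp, mean = new_sx / n, sd = 1 / eps, log = TRUE)
    # Proposing from the model leaves only the mechanism density ratio.
    if (log(runif(1)) < new_ll - cur_ll) {
      x[i]   <- prop
      sx     <- new_sx
      cur_ll <- new_ll
      accept[t, i] <- TRUE
    }
  }
  theta[t] <- cur_theta
}

keep        <- (warmup + 1):niter
draws       <- theta[keep]                            # niter - warmup draws
mean_accept <- rowMeans(accept[keep, , drop = FALSE]) # per iteration
comp_accept <- colMeans(accept[keep, , drop = FALSE]) # per latent record
```

Because each proposal changes the sum of the latent records by a single record, the mechanism density can be updated incrementally rather than recomputed from all n records.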

References

Ju, N., Awan, J. A., Gong, R., & Rao, V. A. (2022). Data Augmentation MCMC for Bayesian Inference from Privatized Data. arXiv. https://doi.org/10.48550/ARXIV.2206.00710

See Also

new_privacy()

Examples

# simulate confidential data
# the privacy mechanism releases the sample mean plus Gaussian noise
set.seed(1)
n <- 100
eps <- 3
y <- rnorm(n, mean = -2, sd = 1)
sdp <- mean(y) + rnorm(1, 0, 1/eps)

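# draw theta from its conjugate posterior given the latent data:
# N(0, 4) prior on theta, N(theta, 1) likelihood
posterior_f <- function(dmat, theta) {
    x <- c(dmat)
    xbar <- mean(x)
    n <- length(x)
    pr_m <- 0      # prior mean
    pr_s2 <- 4     # prior variance
    ps_s2 <- 1/(1/pr_s2 + n)
    ps_m <- ps_s2 * ((1/pr_s2)*pr_m + n * xbar)
    rnorm(1, mean = ps_m, sd = sqrt(ps_s2))
}
# simulate a latent data set of n records given theta
latent_f <- function(theta) {
    matrix(rnorm(n, mean = theta, sd = 1), ncol = 1)
}
# per-record statistic; here each record is used as-is
statistic_f <- function(xi, sdp, i) {
    xi
}
# log-density of sdp given sx, the statistic summed over records
mechanism_f <- function(sdp, sx) {
  sum(dnorm(sdp - sx/n, 0, 1/eps, log = TRUE))
}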
dmod <- new_privacy(posterior_f = posterior_f,
  latent_f = latent_f,
  mechanism_f = mechanism_f,
  statistic_f = statistic_f,
  npar = 1)

out <- dapper_sample(dmod,
                    sdp = sdp,
                    init_par = -2,
                    niter = 500)
summary(out)

# for parallel computing we 'plan' a session
# the code below uses 2 CPU cores for parallel computing
library(furrr)
plan(multisession, workers = 2)
out <- dapper_sample(dmod,
                    sdp = sdp,
                    init_par = -2,
                    niter = 500,
                    chains = 2)

# to go back to sequential computing we use
plan(sequential)
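The mechanism_f in the example reads n and eps from the global environment. One way to make that dependency explicit is to build the function with a small factory that captures both values in a closure. This is a sketch only; make_mechanism_f is a hypothetical helper and not part of dapper.

```r
# Hypothetical factory (not part of dapper): returns a mechanism log-density
# that carries its own n and eps instead of looking them up globally.
make_mechanism_f <- function(n, eps) {
  force(n)
  force(eps)
  function(sdp, sx) {
    sum(dnorm(sdp - sx / n, mean = 0, sd = 1 / eps, log = TRUE))
  }
}

mechanism_f <- make_mechanism_f(n = 100, eps = 3)

# Log-density of observing sdp = -2 when the latent records sum to -200,
# i.e. when the latent sample mean equals the released value.
mechanism_f(sdp = -2, sx = -200)
```

The returned function has the same (sdp, sx) signature as the one in the example, so it could be passed to new_privacy() in the same way.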
