mpost.euc: Median Posterior for Subset Posterior Samples in Euclidean Space

Description

mpost.euc is a general framework to merge multiple empirical measures $Q_1,Q_2,\ldots,Q_M \subset R^p$ from independent subset of data by finding a median $$\hat{Q} = \textrm{argmin}_Q \sum_{m=1}^M d(Q,Q_m)$$ where $Q$ is a weighted combination and $d(P_1,P_2)$ is distance in RKHS between two empirical measures $P_1$ and $P_2$. As in the references, we use RBF kernel with bandwidth parameter $\sigma$.

Usage

mpost.euc(
  splist,
  sigma = 0.1,
  maxiter = 121,
  abstol = 1e-06,
  show.progress = FALSE
)

Value

a named list containing:

med.atoms: a vector or matrix of all atoms aggregated.
med.weights: a weight vector that sums to 1 corresponding to med.atoms.
weiszfeld.weights: a weight for $M$ subset posteriors.
weiszfeld.history: updated parameter values. Each row is for iteration, while columns are weights corresponding to weiszfeld.weights.

Arguments

splist: a list of length $M$ containing vectors or matrices of univariate or multivariate subset posterior samples respectively.
sigma: bandwidth parameter for RBF kernel.
maxiter: maximum number of iterations for Weiszfeld algorithm.
abstol: stopping criterion for Weiszfeld algorithm.
show.progress: a logical; TRUE to show iteration mark, FALSE otherwise.

References

minsker_scalable_2014SBmedian

minsker_robust_2017SBmedian

Examples

Run this code

## Median Posteior from 2-D Gaussian Samples
#  Step 1. let's build a list of atoms whose numbers differ
set.seed(8128)                   # for reproducible results
mydata = list()
mydata[[1]] = cbind(rnorm(96, mean= 1), rnorm(96, mean= 1))
mydata[[2]] = cbind(rnorm(78, mean=-1), rnorm(78, mean= 0))
mydata[[3]] = cbind(rnorm(65, mean=-1), rnorm(65, mean= 1))
mydata[[4]] = cbind(rnorm(77, mean= 2), rnorm(77, mean=-1))

#  Step 2. Let's run the algorithm
myrun = mpost.euc(mydata, show.progress=TRUE)

#  Step 3. Visualize
#  3-1. show subset posterior samples
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,3), no.readonly=TRUE)
for (i in 1:4){
  plot(mydata[[i]], cex=0.5, col=(i+1), pch=19, xlab="", ylab="", 
       main=paste("subset",i), xlim=c(-4,4), ylim=c(-3,3))
}

#  3-2. 250 median posterior samples via importance sampling
id250 = base::sample(1:nrow(myrun$med.atoms), 250, prob=myrun$med.weights, replace=TRUE)
sp250 = myrun$med.atoms[id250,]
plot(sp250, cex=0.5, pch=19, xlab="", ylab="", 
     xlim=c(-4,4), ylim=c(-3,3), main="median samples")

#  3-3. convergence over iterations
matplot(myrun$weiszfeld.history, xlab="iteration", ylab="value",
        type="b", main="convergence of weights")
par(opar)

Run the code above in your browser using DataLab