
bayesm (version 2.2-1)

rDPGibbs: Density Estimation with Dirichlet Process Prior and Normal Base

Description

rDPGibbs implements a Gibbs Sampler to draw from the posterior for a normal mixture problem with a Dirichlet Process prior. A natural conjugate base prior is used along with priors on the hyper parameters of this distribution. One interpretation of this model is as a normal mixture with a random number of components that can grow with the sample size.
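To make the "random number of components" interpretation concrete, here is an illustrative base-R sketch that generates data from a truncated stick-breaking representation of a DP normal mixture. This shows the kind of model rDPGibbs fits; it is not the sampler itself, and the base measure below (means drawn N(0,9), unit variances) is a simplified stand-in for the conjugate G_0 used by rDPGibbs.

```r
## Illustrative sketch only (base R): a truncated stick-breaking draw from a
## DP normal mixture.  Not rDPGibbs's algorithm; simplified base measure.
set.seed(1)
alpha <- 1    # DP tightness: larger alpha => more components a priori
K     <- 50   # truncation level for the stick-breaking weights
v     <- rbeta(K, 1, alpha)
w     <- v * cumprod(c(1, 1 - v[-K]))     # stick-breaking weights
mu    <- rnorm(K, mean = 0, sd = 3)       # component means from base measure
z     <- sample(K, size = 200, replace = TRUE, prob = w)  # component labels
y     <- rnorm(200, mean = mu[z], sd = 1) # observations
```

With a small alpha, a few components carry almost all of the weight, so the simulated data resemble a finite normal mixture whose effective number of components grows with the sample size.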

Usage

rDPGibbs(Prior, Data, Mcmc)

Arguments

Prior
list(Prioralpha,lambda_hyper)
Data
list(y)
Mcmc
list(R,keep,maxuniq,SCALE,gridsize)

Value

  • nmix: a list containing probdraw, zdraw, and compdraw
  • alphadraw: vector of draws of the DP process tightness parameter alpha
  • nudraw: vector of draws of the base prior hyperparameter nu
  • adraw: vector of draws of the base prior hyperparameter a
  • vdraw: vector of draws of the base prior hyperparameter v

concept

  • bayes
  • MCMC
  • normal mixtures
  • Dirichlet Process
  • Gibbs Sampling

Argument details

  • Prioralpha:
    • Istarmin: expected number of components at the lower bound of the support of alpha
    • Istarmax: expected number of components at the upper bound of the support of alpha
    • power: power parameter for the alpha prior
  • lambda_hyper:
    • alim: defines the support of the a distribution, def: c(.01,10)
    • nulim: defines the support of the nu distribution, def: c(.01,3)
    • vlim: defines the support of the v distribution, def: c(.1,4)
  • Mcmc:
    • R: number of MCMC draws
    • keep: thinning parameter -- keep every keepth draw
    • maxuniq: storage constraint on the number of unique components
    • SCALE: should the data be scaled by mean and standard deviation before posterior draws, def: TRUE
    • gridsize: number of discrete points for hyperparameter priors, def: 20

The basic output is a set of draws from the predictive distribution of the data, stored in the object nmix. The average of these draws is the Bayesian analogue of a density estimate. nmix contains:
  • probdraw: R/keep x 1 matrix of 1s
  • zdraw: R/keep x N matrix of draws of the indicators of which component each observation is assigned to
  • compdraw: list of R/keep draws of normal components

compdraw[[i]] is the ith draw -- a list of lists. compdraw[[i]][[1]] is the list of parameters for a draw from the predictive distribution: compdraw[[i]][[1]][[1]] is the mean vector and compdraw[[i]][[1]][[2]] is the inverse of the Cholesky root R, where $Sigma$ = t(R)%*%R and $R^{-1}$ = compdraw[[i]][[1]][[2]].
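Given the storage convention above (Sigma = t(R)%*%R, with the inverse root R^{-1} stored), a component's covariance matrix can be recovered in base R. The toy `rooti` below is a stand-in for `compdraw[[i]][[1]][[2]]` from a real run:

```r
## Sketch: recover Sigma from a stored inverse Cholesky root.
## 'rooti' stands in for out$nmix$compdraw[[i]][[1]][[2]].
Sigma0 <- matrix(c(2, 0.5, 0.5, 1), nrow = 2)  # example covariance
R0     <- chol(Sigma0)                         # upper Cholesky root: Sigma0 = t(R0) %*% R0
rooti  <- backsolve(R0, diag(2))               # R0^{-1}, the stored quantity
Sigma  <- crossprod(solve(rooti))              # t(R0) %*% R0 recovers Sigma0
```

Inverting `rooti` back to the root and forming `crossprod` reverses the storage convention exactly.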

Details

Model:
  $y_i$ $\sim$ $N(mu_i, Sigma_i)$

Priors:
  $theta_i = (mu_i, Sigma_i)$ $\sim$ $DP(G_0(lambda), alpha)$

  $G_0(lambda)$:
    $mu_i | Sigma_i$ $\sim$ $N(0, Sigma_i (x) a^{-1})$
    $Sigma_i$ $\sim$ $IW(nu, nu*v*I)$

  $lambda(a, nu, v)$:
    $a$ $\sim$ uniform on grid[alim[1], alim[2]]
    $nu$ $\sim$ uniform on grid[dim(data)-1 + exp(nulim[1]), dim(data)-1 + exp(nulim[2])]
    $v$ $\sim$ uniform on grid[vlim[1], vlim[2]]

  $alpha$ $\sim$ $(1-(alpha-alphamin)/(alphamax-alphamin))^{power}$
  alphamin and alphamax are chosen so that the expected number of components is Istarmin when alpha = alphamin and Istarmax when alpha = alphamax.

List arguments: Data contains
  • y: N x k matrix of observations on k-dimensional data
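The link between Istarmin/Istarmax and the support of alpha can be sketched in base R: under a DP(alpha) prior with N observations, the expected number of unique components is sum over i of alpha/(alpha+i-1), and inverting that relation (up to bayesm's internal details, which are an assumption here) pins down alphamin and alphamax. Since exactly 1 expected component corresponds to the alpha -> 0 limit, the sketch uses Istar values of 2 and 10 for illustration:

```r
## Expected number of unique components under DP(alpha) with N observations.
expectedComp <- function(alpha, N) sum(alpha / (alpha + 0:(N - 1)))

## Invert for alpha given a target Istar (illustrative helper, not bayesm's).
alphaFor <- function(Istar, N)
  uniroot(function(a) expectedComp(a, N) - Istar,
          interval = c(1e-6, 100), tol = 1e-9)$root

N <- 200
alphamin <- alphaFor(2, N)   # alpha giving 2 expected components
alphamax <- alphaFor(10, N)  # alpha giving 10 expected components
```

Larger Istar targets map monotonically to larger alpha values, which is why the prior on alpha can be parameterized through expected component counts.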

See Also

rnmixGibbs, rmixture, rmixGibbs, eMixMargDen, momMix, mixDen, mixDenBi

Examples

if(nchar(Sys.getenv("LONG_TEST")) != 0) {R=2000} else {R=10}

## simulate univariate data from Chi-Sq

set.seed(66)
N=200
chisqdf=8; y1=as.matrix(rchisq(N,df=chisqdf))

## set arguments for rDPGibbs

Data1=list(y=y1)
Prioralpha=list(Istarmin=1,Istarmax=10,power=.8)
Prior1=list(Prioralpha=Prioralpha)

Mcmc=list(R=R,keep=1,maxuniq=200)

out1=rDPGibbs(Prior=Prior1,Data=Data1,Mcmc)

if(0){
## plotting examples
rgi=c(0,20); grid=matrix(seq(from=rgi[1],to=rgi[2],length.out=50),ncol=1)
deltax=(rgi[2]-rgi[1])/nrow(grid)
plot(out1$nmix,Grid=grid,Data=y1)
## plot true density with histogram
plot(range(grid[,1]),1.5*range(dchisq(grid[,1],df=chisqdf)),type="n",xlab=paste("Chisq ; ",N,"obs",sep=""), ylab="")
hist(y1,xlim=rgi,freq=FALSE,col="yellow",breaks=20,add=TRUE)
lines(grid[,1],dchisq(grid[,1],df=chisqdf)/(sum(dchisq(grid[,1],df=chisqdf))*deltax),col="blue",lwd=2)
}
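The density estimate produced by plot(out1$nmix, ...) is, per the Value section, the average over MCMC draws of the normal density implied by each compdraw. The base-R sketch below mimics that averaging with a mock draw list; `mockdraws` is a stand-in for out1$nmix$compdraw from a real univariate run, where the mean is compdraw[[i]][[1]][[1]] and the inverse Cholesky root is compdraw[[i]][[1]][[2]]:

```r
## Sketch (base R, univariate): average normal densities over mock MCMC draws.
## 'mockdraws' stands in for out1$nmix$compdraw; not output from a real run.
set.seed(99)
grid <- seq(from = 0, to = 20, length.out = 50)
mockdraws <- lapply(1:100, function(i) {
  sigma <- runif(1, 1.5, 2.5)
  list(list(rnorm(1, mean = 8, sd = 1),  # [[1]]: mean vector (scalar here)
            1 / sigma))                  # [[2]]: inverse Cholesky root
})
dens <- rowMeans(sapply(mockdraws, function(d) {
  mu    <- d[[1]][[1]]
  rooti <- d[[1]][[2]]
  dnorm(grid, mean = mu, sd = 1 / rooti)  # univariate case: sd = 1/rooti
}))
```

Averaging the per-draw densities pointwise on the grid is exactly the "Bayesian analogue of a density estimate" described above.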


## simulate bivariate data from the  "Banana" distribution (Meng and Barnard) 
banana=function(A,B,C1,C2,N,keep=10,init=10)
{
  R=init*keep+N*keep
  x1=x2=0
  bimat=matrix(double(2*N),ncol=2)
  for (r in 1:R) {
    x1=rnorm(1,mean=(B*x2+C1)/(A*(x2^2)+1),sd=sqrt(1/(A*(x2^2)+1)))
    x2=rnorm(1,mean=(B*x1+C2)/(A*(x1^2)+1),sd=sqrt(1/(A*(x1^2)+1)))
    if (r>init*keep && r%%keep==0) {mkeep=r/keep; bimat[mkeep-init,]=c(x1,x2)}
  }
  return(bimat)
}


set.seed(66)
nvar2=2
A=0.5; B=0; C1=C2=3
y2=banana(A=A,B=B,C1=C1,C2=C2,1000)

Data2=list(y=y2)
Prioralpha=list(Istarmin=1,Istarmax=10,power=.8)
Prior2=list(Prioralpha=Prioralpha)
Mcmc=list(R=R,keep=1,maxuniq=200)

out2=rDPGibbs(Prior=Prior2,Data=Data2,Mcmc)


if(0){
## plotting examples

rx1=range(y2[,1]); rx2=range(y2[,2])
x1=seq(from=rx1[1],to=rx1[2],length.out=50)
x2=seq(from=rx2[1],to=rx2[2],length.out=50)
grid=cbind(x1,x2)

plot(out2$nmix,Grid=grid,Data=y2)

## plot true bivariate density
tden=matrix(double(50*50),ncol=50)
for (i in 1:50){ for (j in 1:50) 
      {tden[i,j]=exp(-0.5*(A*(x1[i]^2)*(x2[j]^2)+(x1[i]^2)+(x2[j]^2)-2*B*x1[i]*x2[j]-2*C1*x1[i]-2*C2*x2[j]))}
}
tden=tden/sum(tden)
image(x1,x2,tden,col=terrain.colors(100),xlab="",ylab="")
contour(x1,x2,tden,add=TRUE,drawlabels=FALSE)
title("True Density")
}
