bayesm (version 3.0-2)

rDPGibbs: Density Estimation with Dirichlet Process Prior and Normal Base

Description

rDPGibbs implements a Gibbs Sampler to draw from the posterior for a normal mixture problem with a Dirichlet Process prior. A natural conjugate base prior is used along with priors on the hyperparameters of this distribution. One interpretation of this model is as a normal mixture with a random number of components that can grow with the sample size.

Usage

rDPGibbs(Prior, Data, Mcmc)

Arguments

Prior

list(Prioralpha,lambda_hyper)

Data

list(y)

Mcmc

list(R,keep,nprint,maxuniq,SCALE,gridsize)

Value

nmix

a list containing: probdraw, zdraw, compdraw

alphadraw

vector of draws of DP process tightness parameter

nudraw

vector of draws of the base prior hyperparameter \(\nu\)

adraw

vector of draws of the base prior hyperparameter \(a\)

vdraw

vector of draws of the base prior hyperparameter \(v\)

Details

Model: \(y_i \sim N(\mu_i, \Sigma_i)\)

Priors: \(\theta_i = (\mu_i, \Sigma_i) \sim DP(G_0(\lambda), \alpha)\)

\(G_0(\lambda)\): \(\mu_i | \Sigma_i \sim N(0, \Sigma_i \otimes a^{-1})\) and \(\Sigma_i \sim IW(\nu, \nu v I)\)

\(\lambda(a,\nu,v)\): \(a \sim\) uniform on grid [alim[1], alim[2]]; \(\nu \sim\) uniform on grid [dim(data)-1 + exp(nulim[1]), dim(data)-1 + exp(nulim[2])]; \(v \sim\) uniform on grid [vlim[1], vlim[2]]

\(\alpha \sim (1-(\alpha-\alpha_{min})/(\alpha_{max}-\alpha_{min}))^{power}\), where \(\alpha_{min}\) and \(\alpha_{max}\) are chosen so that the expected number of components is Istarmin when \(\alpha = \alpha_{min}\) and Istarmax when \(\alpha = \alpha_{max}\).
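This tightness prior concentrates mass toward \(\alpha_{min}\) as power increases. Below is a minimal R sketch (not bayesm's internals) of its unnormalized shape, assuming illustrative bounds alphamin=.1 and alphamax=5; bayesm determines these bounds from Istarmin and Istarmax.

## unnormalized prior density for the DP tightness parameter alpha;
## alphamin and alphamax are illustrative values, not bayesm defaults
alphaPriorDen=function(alpha,alphamin,alphamax,power=.8)
{ (1-(alpha-alphamin)/(alphamax-alphamin))^power }
agrid=seq(from=.1,to=5,length.out=20)   # gridsize=20, as in the Mcmc default
plot(agrid,alphaPriorDen(agrid,alphamin=.1,alphamax=5),type="l",
  xlab="alpha",ylab="unnormalized prior")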

List arguments contain the following elements (a fully specified call is sketched after these lists):

Data:

  • y N x k matrix of observations on k-dimensional data

Prioralpha:

  • Istarmin expected number of components at lower bound of support of alpha (def: 1)

  • Istarmax expected number of components at upper bound of support of alpha

  • power power parameter for alpha prior (def: .8)

lambda_hyper:

  • alim defines support of the distribution of a (def: (.01,10))

  • nulim defines support of the distribution of nu (def: (.01,3))

  • vlim defines support of the distribution of v (def: (.1,4))

Mcmc:

  • R number of MCMC draws

  • keep MCMC thinning parameter: keep every keep-th draw

  • nprint print the estimated time remaining for every nprint'th draw (def: 100)

  • maxuniq storage constraint on the number of unique components (def: 200)

  • SCALE should the data be scaled by its mean and standard deviation before posterior draws (def: TRUE)

  • gridsize number of discrete points for hyperparameter priors (def: 20)
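For concreteness, here is a minimal sketch of a fully specified call with every optional element written out at the defaults listed above. Istarmax has no default and is set to 10 as in the Examples below; R=100 is an illustrative short run, and the simulated y is a stand-in for any N x k data matrix.

set.seed(66)
y=matrix(rnorm(200*2),ncol=2)   # stand-in data: N=200 obs, k=2 dimensions
Prioralpha=list(Istarmin=1,Istarmax=10,power=.8)
lambda_hyper=list(alim=c(.01,10),nulim=c(.01,3),vlim=c(.1,4))
Prior=list(Prioralpha=Prioralpha,lambda_hyper=lambda_hyper)
Mcmc=list(R=100,keep=1,nprint=100,maxuniq=200,SCALE=TRUE,gridsize=20)
out=rDPGibbs(Prior=Prior,Data=list(y=y),Mcmc=Mcmc)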

The basic output is a set of draws from the predictive distribution of the data, stored in the object nmix. The average of these draws is the Bayesian analogue of a density estimate.

nmix:

  • probdraw R/keep x 1 matrix of 1s

  • zdraw R/keep x N matrix of draws of indicators of which component each obs is assigned to

  • compdraw list of R/keep draws of normal component parameters

Output of the components is in the form of a list of lists: compdraw[[i]] is the ith draw, itself a list. compdraw[[i]][[1]] is the list of parameters for a draw from the predictive. compdraw[[i]][[1]][[1]] is the mean vector and compdraw[[i]][[1]][[2]] is the inverse of the Cholesky root of \(\Sigma\), i.e. \(\Sigma\) = t(R)%*%R and \(R^{-1}\) = compdraw[[i]][[1]][[2]].
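As a short sketch, the mean and covariance of draw i can be recovered from this structure as follows (out stands for the return value of any rDPGibbs call, e.g. out1 in the Examples below):

i=1                                   # any kept draw
comp=out$nmix$compdraw[[i]][[1]]      # parms for a draw from the predictive
mu=comp[[1]]                          # mean vector
Rinv=comp[[2]]                        # inverse of the Cholesky root R
Rroot=solve(Rinv)                     # recover R itself
Sigma=t(Rroot)%*%Rroot                # Sigma = t(R)%*%R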

See Also

rnmixGibbs, rmixture, rmixGibbs, eMixMargDen, momMix, mixDen, mixDenBi

Examples

if(nchar(Sys.getenv("LONG_TEST")) != 0) {R=2000} else {R=10}

## simulate univariate data from Chi-Sq

set.seed(66)
N=200
chisqdf=8; y1=as.matrix(rchisq(N,df=chisqdf))

## set arguments for rDPGibbs

Data1=list(y=y1)
Prioralpha=list(Istarmin=1,Istarmax=10,power=.8)
Prior1=list(Prioralpha=Prioralpha)

Mcmc=list(R=R,keep=1,maxuniq=200)

out1=rDPGibbs(Prior=Prior1,Data=Data1,Mcmc=Mcmc)

if(0){
## plotting examples
rgi=c(0,20); grid=matrix(seq(from=rgi[1],to=rgi[2],length.out=50),ncol=1)
deltax=(rgi[2]-rgi[1])/nrow(grid)
plot(out1$nmix,Grid=grid,Data=y1)
## plot true density with histogram
plot(range(grid[,1]),1.5*range(dchisq(grid[,1],df=chisqdf)),
  type="n",xlab=paste("Chisq ; ",N," obs",sep=""), ylab="")
hist(y1,xlim=rgi,freq=FALSE,col="yellow",breaks=20,add=TRUE)
lines(grid[,1],dchisq(grid[,1],df=chisqdf)/
  (sum(dchisq(grid[,1],df=chisqdf))*deltax),col="blue",lwd=2)
}


## simulate bivariate data from the  "Banana" distribution (Meng and Barnard) 
banana=function(A,B,C1,C2,N,keep=10,init=10)
{
  ## Gibbs sampler: the full conditionals of x1|x2 and x2|x1 are normal
  R=init*keep+N*keep
  x1=x2=0
  bimat=matrix(double(2*N),ncol=2)
  for (r in 1:R)
  {
    x1=rnorm(1,mean=(B*x2+C1)/(A*(x2^2)+1),sd=sqrt(1/(A*(x2^2)+1)))
    x2=rnorm(1,mean=(B*x1+C2)/(A*(x1^2)+1),sd=sqrt(1/(A*(x1^2)+1)))
    ## drop init*keep burn-in iterations, then keep every keep-th draw
    if (r>init*keep && r%%keep==0) {mkeep=r/keep; bimat[mkeep-init,]=c(x1,x2)}
  }
  return(bimat)
}


set.seed(66)
nvar2=2
A=0.5; B=0; C1=C2=3
y2=banana(A=A,B=B,C1=C1,C2=C2,N=1000)

Data2=list(y=y2)
Prioralpha=list(Istarmin=1,Istarmax=10,power=.8)
Prior2=list(Prioralpha=Prioralpha)
Mcmc=list(R=R,keep=1,maxuniq=200)

out2=rDPGibbs(Prior=Prior2,Data=Data2,Mcmc=Mcmc)


if(0){
## plotting examples

rx1=range(y2[,1]); rx2=range(y2[,2])
x1=seq(from=rx1[1],to=rx1[2],length.out=50)
x2=seq(from=rx2[1],to=rx2[2],length.out=50)
grid=cbind(x1,x2)

plot(out2$nmix,Grid=grid,Data=y2)

## plot true bivariate density
tden=matrix(double(50*50),ncol=50)
for (i in 1:50) {
  for (j in 1:50) {
    tden[i,j]=exp(-0.5*(A*(x1[i]^2)*(x2[j]^2)+(x1[i]^2)+(x2[j]^2)
                  -2*B*x1[i]*x2[j]-2*C1*x1[i]-2*C2*x2[j]))
  }
}
tden=tden/sum(tden)
image(x1,x2,tden,col=terrain.colors(100),xlab="",ylab="")
contour(x1,x2,tden,add=TRUE,drawlabels=FALSE)
title("True Density")
}
