HDPMdensity: Bayesian analysis for a hierarchical Dirichlet Process mixture of normals model for marginal density estimation

Description

This function generates a posterior density sample for a DP mixture of normals model for related random probability measures. Support provided by the NIH/NCI R01CA75981 grant.

Usage

HDPMdensity(y,study,ngrid=100,prior,mcmc,state,
            status,data=sys.frame(sys.parent()),
            na.action=na.fail,work.dir=NULL)

Arguments

a matrix of responses.

study

a (1 by nrec) vector of study indicators. The i-th index is the study i that response j belongs to.

ngrid

integer giving the number of grid points where the density estimate is evaluated. The default is 100.

prior

a list giving the prior information. The list includes the following parameters: pe1 and pe0 giving the prior weights for the point mass at $\epsilon=1$ and at $\epsilon=1$, respectively, ae and be giving the prior parameters for a Beta prior on $\epsilon$, eps giving the value of $\epsilon$ (it must be specified if pe1 is missing), a0 and b0 vectors giving the hyperparameters for prior distribution of the precision parameter of the Dirichlet process prior, alpha giving the vector of precision parameters (it must be specified if a0 is missing), m0 and S0 giving the hyperparameters of the normal prior distribution for the mean of the normal baseline distribution, mub giving the mean of the normal baseline distribution (is must be specified if m0 is missing), nub and tbinv giving the hyperparameters of the inverse Wishart prior distribution for the variance of the normal baseline distribution, sigmab giving the variance of the normal baseline distribution (is must be specified if nub is missing), nu and tinv giving the hyperparameters of the inverse Wishart prior distribution for the variance of the normal kernel, and sigma giving the covariance matrix of the normal kernel (is must be specified if nu is missing).

mcmc

a list giving the MCMC parameters. The list must include the following integers: nburn giving the number of burn-in scans, nskip giving the thinning interval, nsave giving the total number of scans to be saved, ndisplay giving the number of saved scans to be displayed on screen.

state

a list giving the current value of the parameters. This list is used if the current analysis is the continuation of a previous analysis (not available yet).

status

a logical variable indicating whether this run is new (TRUE) or the continuation of a previous analysis (FALSE). In the latter case the current value of the parameters must be specified in the object state (not available yet).

data

data frame.

na.action

a function that indicates what should happen when the data contain NAs. The default action (na.fail) causes HDPdensity to print an error message and terminate if there are any incomplete observations.

work.dir

working directory.

Value

An object of class HDPMdensity representing the hierarchical DPM of normals model. Generic functions such as print, plot, and summary have methods to show the results of the fit. The results include sigma, eps, the vector of precision parameters alpha, mub and sigmab.

The list state in the output object contains the current value of the parameters necessary to restart the analysis. If you want to specify different starting values to run multiple chains set status=TRUE and create the list state based on this starting values. In this case the list state must include the following objects:

ncluster

an integer giving the number of clusters.

an interger vector defining to which of the ncluster clusters each subject belongs.

an integer vector defining to which DP each cluster belongs. Note that length(sc)=nrec (only the first ncluster elements are considered to start the chain.

alpha

giving the vector of dimension nsuties+1 of precision parameters.

muclus

a matrix of dimension (number of subject + 100) times the number of variables, giving the means for each cluster (only the first ncluster rows are considered to start the chain).

mub

giving the mean of the normal baseline distributions.

sigmab

giving the covariance matrix the normal baseline distributions.

sigma

giving the normal kernel covariance matrix.

eps

giving the value of eps.

Details

This generic function fits a hierarchical mixture of DPM of normals model for density estimation (Mueller, Quintana and Rosner, 2004): $$y_{ij} | F_i \sim F_i$$ where, $y_{ij}$ denote the j-th observation in the i-th study, i=1,...,I, and $F_i$ is assumed to arise as a mixture $F_i = \epsilon H_0 + (1-\epsilon) H_i$ of one common distribution $H_0$ and a distribution $H_i$ that is specific or idiosyncratic to the i-th study.

The random probability measures $H_i$ in turn are given a Dirichlet process mixture of normal prior. We assume $$ H_i(y) = \int N(\mu,\Sigma) d G_i(\mu),~ i=0,1,\ldots,I $$ with $$G_i | \alpha_i, G_0 \sim DP(\alpha G_0)$$ where, the baseline distribution is $$G_0 = N(\mu| \mu_b,\Sigma_b)$$.

To complete the model specification, independent hyperpriors are assumed (optional), $$\Sigma | \nu, T \sim IW(\nu,T)$$ $$\alpha_i | a_{0i}, b_{0i} \sim Gamma(a_{0i},b_{0i})$$ $$\mu_b | m_0, S_0 \sim N(m_0,S_0)$$ $$\Sigma_b | \nu_b, Tb \sim IW(\nu_b,Tb)$$ and $$ p(\epsilon) = \pi_0 \delta_0+ \pi_1 \delta_1+(1- \pi_0 -\ pi_1) Be(a_\epsilon,b_\epsilon)$$

Note that the inverted-Wishart prior is parametrized such that if $A \sim IW_q(\nu, \psi)$ then $E(A)= \psi^{-1}/(\nu-q-1)$.

References

Mueller, P., Quintana, F. and Rosner, G. (2004). A Method for Combining Inference over Related Nonparametric Bayesian Models. Journal of the Royal Statistical Society, Series B, 66: 735-749.

Examples

Run this code

# NOT RUN {
    # Data
      data(calgb)
      attach(calgb)
      y <- cbind(Z1,Z2,Z3,T1,T2,B0,B1)

    # Prior information
      prior <- list(pe1=0.1,
                    pe0=0.1,
                    ae=1,
                    be=1,
                    a0=rep(1,3),
                    b0=rep(1,3),
                    nu=9,
                    tinv=0.25*var(y),
 		    m0=apply(y,2,mean),
                    S0=var(y),
 		    nub=9,
                    tbinv=var(y))		


    # Initial state
      state <- NULL

    # MCMC parameters

      mcmc <- list(nburn=5000,
                   nsave=5000,
                   nskip=3,
                   ndisplay=100)

    # Fitting the model
      fit1 <- HDPMdensity(y=y,
                          study=study,
                          prior=prior,
                          mcmc=mcmc,
                          state=state,
                          status=TRUE)

    # Posterior inference
      fit1
      summary(fit1)
       
    # Plot the parameters
    # (to see the plots gradually set ask=TRUE)
      plot(fit1,ask=FALSE)

    # Plot the a specific parameters 
    # (to see the plots gradually set ask=TRUE)
      plot(fit1,ask=FALSE,param="eps",nfigr=1,nfigc=2)

   # Plot the measure for each study
      predict(fit1,i=1,r=1) # study 1
      predict(fit1,i=2,r=1) # study 2

    # Plot the idiosyncratic measure for each study
      predict(fit1,i=1,r=0) # study 1
      predict(fit1,i=2,r=0) # study 2

    # Plot the common measure
      predict(fit1,i=0)
# }

Run the code above in your browser using DataLab