DPM.HODC: Hierarchical Ordered Density Clustering (HODC) for Dirichlet Process Mixture Model Fitting

Description

This function applies the HODC algorithm to posterior density samples from a Dirichlet process mixture (DPM) of normals model, which is fitted by the function DPdensity in the R package DPpackage.

Usage

DPM.HODC(v, pvalue,
         DPM.mcmc=list(nburn=2000,nsave=1,
                       nskip=0,ndisplay=10),
         DPM.prior=list(a0=2,b0=1,m2=rep(0,1),
                        s2=diag(100000,1),
                        psiinv2=solve(diag(0.5,1)),
                        nu1=4,nu2=4,tau1=1,tau2=100))

Arguments

v

the number of posterior samples saved

pvalue

a vector of p-values obtained from large-scale statistical hypothesis testing

DPM.mcmc

a list giving the MCMC parameters for DPM fitting; see the argument mcmc of function DPdensity() in DPpackage for details; the default setting is DPM.mcmc=list(nburn=2000,nsave=1,nskip=0,ndisplay=10)

DPM.prior

a list giving the prior information; see the argument prior of function DPdensity() in DPpackage for details; the default setting is the list shown under Usage
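
To illustrate how the two list arguments might be customized, the sketch below copies the defaults shown under Usage and overrides single entries with modifyList() from base R (package utils). The names default.mcmc, default.prior, my.mcmc and my.prior, and the values nburn=5000 and tau2=10, are hypothetical choices for illustration only, not recommendations. The call to DPM.HODC is left as a comment because it requires BANFF and DPpackage to be installed.

```r
# Default settings copied from the Usage section.
default.mcmc  <- list(nburn=2000, nsave=1, nskip=0, ndisplay=10)
default.prior <- list(a0=2, b0=1, m2=rep(0,1),
                      s2=diag(100000,1),
                      psiinv2=solve(diag(0.5,1)),
                      nu1=4, nu2=4, tau1=1, tau2=100)

# Hypothetical overrides: a longer burn-in and a different tau2.
my.mcmc  <- modifyList(default.mcmc,  list(nburn=5000))
my.prior <- modifyList(default.prior, list(tau2=10))

# With BANFF installed, the lists would be passed like this (not run):
# fit <- DPM.HODC(v=5, pvalue, DPM.mcmc=my.mcmc, DPM.prior=my.prior)
```

Entries that are not overridden keep their default values, so only the settings that differ from the defaults need to be written out.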

Value

a list of density clustering results by the HODC algorithm

mean

a list containing posterior samples for the mean of unimportant and important clusters

mu0: a vector of length "v" containing posterior samples for the mean of the unimportant cluster
mu1: a vector of length "v" containing posterior samples for the mean of the important cluster

variance

a list containing posterior samples of the variance of the unimportant and the important clusters

var0: a vector of length "v" containing posterior samples for the variance of the unimportant cluster
var1: a vector of length "v" containing posterior samples for the variance of the important cluster

probability

a list containing the probabilities of unimportant and important clusters

pro0: a vector of length "v" containing posterior samples for the probability of the unimportant cluster
pro1: a vector of length "v" containing posterior samples for the probability of the important cluster

classification

a binary (0/1) matrix of dimension "v" by length(pvalue) containing posterior samples for two cluster classification results
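
As a sketch of how the returned list might be summarized, the code below builds a mock object with the structure documented above and averages over the "v" saved samples. The object res, the sizes v=5 and n=8, and all numbers in it are made up for illustration; a real result would come from DPM.HODC. Which of the labels 0 and 1 marks the important cluster is not stated here, so the summary is described only in terms of the label 1.

```r
set.seed(1)
v <- 5   # hypothetical number of saved posterior samples
n <- 8   # hypothetical number of p-values

# Mock object mimicking the documented return value (not real output).
res <- list(
  mean           = list(mu0=rnorm(v, 1), mu1=rnorm(v, 6)),
  variance       = list(var0=rep(1, v),  var1=rep(2, v)),
  probability    = list(pro0=rep(0.4, v), pro1=rep(0.6, v)),
  classification = matrix(rbinom(v*n, 1, 0.5), nrow=v, ncol=n)
)

# Average each cluster mean over the v saved samples.
post.mean <- sapply(res$mean, mean)

# For each p-value, the fraction of saved samples in which it is labeled 1.
freq1 <- colMeans(res$classification)
```

Rows of classification index saved samples and columns index p-values, so column-wise averages give one summary per tested hypothesis.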

Details

This function calls DPdensity to estimate the marginal density of the test statistics r, which are converted from the p-values, using a mixture of normal densities; network information is not incorporated at this stage. It then applies the HODC algorithm to classify the density components into two clusters, referred to as the unimportant cluster and the important cluster, where the important cluster has a larger mean than the unimportant cluster.
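
The exact conversion from p-values to test statistics is not spelled out here. The Examples section generates p-values with pvalue=pnorm(-rstat), so a minimal sketch, assuming that same one-sided normal convention, inverts the transformation with qnorm(). The helper names r2p and p2r are hypothetical and are not part of the package.

```r
# Assumed convention, taken from the Examples section: p = pnorm(-r).
r2p <- function(r) pnorm(-r)   # test statistic -> p-value
p2r <- function(p) -qnorm(p)   # p-value -> test statistic

p <- c(0.5, 0.05, 0.001)
r <- p2r(p)

# Under this convention, smaller p-values map to larger test statistics,
# which matches the important cluster having the larger mean.
```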

References

Yize Zhao, Jian Kang, Tianwei Yu (2014) A Bayesian nonparametric model for selecting gene and gene sub-network, Annals of Applied Statistics, in press.

Zhou Lan, Jian Kang, Tianwei Yu, Yize Zhao, BANFF: an R package for network identifications via Bayesian nonparametric mixture models, working paper.

Examples

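library(BANFF)

### randomly generate test statistics from a mixture of normals
rstat=c(rnorm(50,mean=1),rnorm(50,mean=2),rnorm(100,mean=4),
        rnorm(100,mean=8))
### transform the test statistics into p-values
pvalue=pnorm(-rstat)
### fit the DPM and run HODC, saving v=5 posterior samples
DPMHODC=DPM.HODC(v=5,pvalue)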
