Learn R Programming

BANFF (version 1.1)

Networks.STD: Bayesian Network Discovery using a Standard MCMC Algorithm

Description

This function implements the standard Markov chain Monte Carlo (MCMC) algorithm to perform feature selection and sub-network discovery using a Bayesian nonparametric model based on the Dirichlet process mixture (DPM) model and the Ising model.

Usage

Networks.STD(pvalue,net,iter=5000, nburns=2000, piall=c(0.75, 0.8, 0.85, 0.9),rhoall=c(0.5, 1, 5, 10, 15), status=FALSE,fit,show.steps=1,showlikelihood=FALSE,likelihood.frequency=100)

Arguments

pvalue
a vector of p-values obtained from large scale statistical hypothesis testing
net
an "n" by "n" binary adjacent matrix (0/1) for the network configuration with n=length(pvalue)
iter
the number of iterations; the default is 5000
nburns
the number of burn-in; the default is 2000
piall
a vector of possible choices for "pi0" in an increasing order; the default value is c(0.75, 0.8, 0.85, 0.9)
rhoall
a vector of possible choices of "rho0" and "rho1" in an increasing order; the default value is c(0.5, 1, 5, 10, 15)
status
a logical variable indicating whether the you are initial an MCMC chain or resume a chain
fit
a list provide the current values of parameters which are used for resume the chain
show.steps
integer representing the frequency of the results of iterations presented, the default value is 1.The setting is invalid when trace=FALSE. The setting would not affect the data saved, only for printing
showlikelihood
a logical variable indicating whether to show the log-likelihood value simultaneously.Set TRUE if show the log-likelihood value simultaneously, FALSE, otherwise. FALSE is the default setting
likelihood.frequency
a number representing the frequency showing the log-likelihood value simultaneously. For example, setting likelihood.frequency=100 means showing the log-likelihood value every 100 iterations. The default setting is 100 and it is recommended that do not set a small frequency because it slow down the MCMC chain updating

Value

An object of class "Networks.STD" containing Bayesian information of the MCMC chain
trace
a matrix of dimension (iter-nburns) by length(pvalue) containing posterior samples of classification label "g_i"
convergence
MCMC Heidelberger and Welch convergence diagnostic
graph
An igraph graph object of full network
model
a list containing parameters of the MCMC chain for resuming the chain

Details

This function implements a Bayesian nonparametric mixture model for feature selection incorporating network information (Zhao et al., 2014):
  • r_i| g_i, theta ~ N(mu_g_i, sigma_g_i),
  • g_i | z_i=k, q_k ~ Discrete(a_k, q_k),
  • theta ~ G_0k, for g in a_k,
  • q_k ~ Dirichlet(tau_k 1_{L_k}/L_k),
  • theta={theta_g}_{g in a_0 and a_1}
  • theta_g=(mu_g, sigma_g)

where we define

Index
a_0=(-L_0+1,-L_0+2,...,0) , a_1=(1,2,...,L_1) and the correspondent probability q_0=(q_-L_0+1, q_-L_0+2, ...,q_0), q_1=(q_1, q_2, ..., q_L_1), according to the definition of Discrete(a_k, b_k), for example, Pr(g_i=L_0+2)=q_-L_0+2.

Assumption
In this algorithm, we assume that "important" features should have larger statistics comparing to "unimportant" ones without the loss of generality. In this regard, we set the restriction mu_g

This function implements the NET-DPM-1 Zhao et al.(2014). Please refer to the Appendix B.1 for more details.

References

Zhao, Y., Kang, J., Yu, T. A Bayesian nonparametric mixture model for selecting gene and gene-sub network, Annals of Applied Statistics, In press: 2014.

Zhou Lan, Jian Kang, Tianwei Yu, Yize Zhao, BANFF: an R package for network identifications via Bayesian nonparametric mixture models, working paper.

Examples

Run this code
library(igraph)
###Creating the network of 10X10 image
g <- graph.lattice(length=10,dim=2)
net=as(get.adjacency(g,attr=NULL),"matrix")##this is the input of argument 'net'
##Assign the signal elements with signal intention 
##as normal distribution N(1,0.2). While noise is set as N(0,0.2)
newz=rep(0,100)
for (i in 3:7)
{
 newz[(i*10+3):(i*10+7)]=1
}
testcov<-0
for(i in 1:100){
 if(newz[i]==0){
   testcov[i]<-rnorm(1,mean=0,sd=0.2)

 }else{
  testcov[i]<-rnorm(1,mean=1,sd=0.2)

 }
}
##The profile of the image
image(matrix(testcov,10,10),col=gray(seq(0,1,length=255)))
##Transform the signals into pvalue form and begin identification
pvalue=pnorm(-testcov)
total=Networks.STD(pvalue,net,iter=3,nburns=1,
piall=c(0.8, 0.85, 0.9, 0.95),rhoall=c(0.5,1,5,10,15))

Run the code above in your browser using DataLab