
BANFF (version 1.1)

Networks.Fast: Bayesian Network Discovery using a Hybrid Fast Algorithm

Description

This function implements a hybrid fast algorithm to perform feature selection and sub-network discovery using a Bayesian nonparametric model based on Dirichlet process mixture models, finite mixture of normals model and the Ising model.

Usage

Networks.Fast(pvalue, net, iter = 5000, nburns = 2000,
  algorithms = c("EM", "DPM"), v = 20,
  DPM.mcmc = list(nburn = 2000, nsave = 1, nskip = 0, ndisplay = 10),
  DPM.prior = list(a0 = 2, b0 = 1, m2 = rep(0, 1), s2 = diag(100000, 1),
    psiinv2 = solve(diag(0.5, 1)), nu1 = 4, nu2 = 4, tau1 = 1, tau2 = 100),
  DPparallel = FALSE, n.cores = 1, piall = c(0.8, 0.85, 0.9, 0.95),
  rhoall = c(1, 2, 5, 10, 15), show.steps = 10, showlikelihood = FALSE,
  likelihood.frequency = 100)

Arguments

pvalue
a vector of p-values obtained from large-scale statistical hypothesis testing
net
an n by n binary (0/1) adjacency matrix of the network configuration, where n = length(pvalue)
iter
number of iterations; the default is 5000
nburns
number of burn-in iterations; the default is 2000
algorithms
a character string taking value "EM" or "DPM", indicating the algorithm used to obtain the Finite Gaussian Mixture (FGM) estimates. It is recommended to choose "DPM" when the dimension of the data is large, and "EM" when it is small.
v
number of iterations set for the DPM fitting; v is only valid when algorithms="DPM"
DPM.mcmc
a list giving the MCMC parameters for the DPM fitting; see the argument mcmc of function DPdensity() in DPpackage for details; the default setting is DPM.mcmc=list(nburn=2000, nsave=1, nskip=0, ndisplay=10)
DPM.prior
a list giving the prior information; see the argument prior of function DPdensity() in DPpackage for details; the default setting is DPM.prior=list(a0=2, b0=1, m2=rep(0,1), s2=diag(100000,1), psiinv2=solve(diag(0.5,1)), nu1=4, nu2=4, tau1=1, tau2=100)
piall
a vector of possible choices for "pi0" in increasing order; the default value is c(0.8, 0.85, 0.9, 0.95)
rhoall
a vector of possible choices for "rho0" and "rho1" in increasing order; the default value is c(1, 2, 5, 10, 15)
DPparallel
a logical variable indicating whether to apply parallel computing when algorithms="DPM"; the default setting is FALSE
n.cores
number of CPU cores for parallel computing; this argument is only valid when algorithms="DPM"; the default setting is 1
show.steps
an integer giving how often iteration results are printed; the default setting is 10. This setting affects only what is printed, not the data saved
showlikelihood
a logical variable indicating whether to print the log-likelihood value during the iterations; FALSE is the default setting
likelihood.frequency
a number giving how often the log-likelihood value is printed. For example, setting likelihood.frequency=100 prints the log-likelihood value every 100 iterations. The default setting is 100; a small frequency is not recommended because frequent printing slows down the MCMC chain updating
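Before calling the function, the input requirements documented above (a vector of valid p-values and a matching n by n binary adjacency matrix) can be verified with a few lines of base R. The sketch below is a hypothetical helper, not part of BANFF:

```r
# Hypothetical helper (not part of BANFF): check that pvalue and net satisfy
# the input requirements documented above before calling Networks.Fast().
check_inputs <- function(pvalue, net) {
  n <- length(pvalue)
  stopifnot(is.numeric(pvalue), all(pvalue >= 0 & pvalue <= 1))  # valid p-values
  stopifnot(is.matrix(net), nrow(net) == n, ncol(net) == n)      # n by n
  stopifnot(all(net %in% c(0, 1)))                               # binary (0/1)
  stopifnot(isTRUE(all.equal(net, t(net))))                      # symmetric adjacency
  invisible(TRUE)
}

pv <- runif(5)
adj <- matrix(0, 5, 5)
adj[1, 2] <- adj[2, 1] <- 1
check_inputs(pv, adj)
```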

Value

trace
a length(pvalue) by (iter-nburns) matrix containing the MCMC trace
convergence
MCMC Heidelberger and Welch convergence diagnostic
graph
an igraph graph object of the full network
statistics
a list of summary statistics characterizing the posterior distribution of "z_i"
mean
posterior mean for each feature
median
posterior median for each feature
var
posterior variance for each feature
quantile
posterior quantiles for each feature
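To illustrate how the summary statistics relate to the trace, the sketch below (a standalone illustration on simulated numbers, not BANFF code) computes a posterior mean, median, variance, and quantiles feature by feature from a trace matrix laid out as described above, with features in rows and post-burn-in samples in columns:

```r
# Standalone illustration (not BANFF code): summarise a trace matrix whose
# rows are features and whose columns are post-burn-in MCMC samples.
set.seed(1)
n.features <- 4
n.samples  <- 300
trace <- matrix(rnorm(n.features * n.samples), nrow = n.features)

statistics <- list(
  mean     = apply(trace, 1, mean),    # posterior mean per feature
  median   = apply(trace, 1, median),  # posterior median per feature
  var      = apply(trace, 1, var),     # posterior variance per feature
  quantile = apply(trace, 1, quantile, probs = c(0.025, 0.5, 0.975))
)
```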

Details

This function implements a Bayesian nonparametric mixture model for feature selection incorporating network information (Zhao et al., 2014):
  • r_i | g_i, theta ~ N(mu_{g_i}, sigma_{g_i}),
  • g_i | z_i = k, q_k ~ Discrete(a_k, q_k),
  • theta_g ~ G_{0k}, for g in a_k,
  • q_k ~ Dirichlet(tau_k 1_{L_k} / L_k),
  • theta = {theta_g}_{g in a_0 and a_1},
  • theta_g = (mu_g, sigma_g),

where we define

Index
a_0 = (-L_0+1, -L_0+2, ..., 0) and a_1 = (1, 2, ..., L_1), with the corresponding probabilities q_0 = (q_{-L_0+1}, q_{-L_0+2}, ..., q_0) and q_1 = (q_1, q_2, ..., q_{L_1}). By the definition of Discrete(a_k, q_k), for example, Pr(g_i = -L_0+2) = q_{-L_0+2}.
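This indexing convention can be made concrete with a small base-R sketch (illustrative values only; the component counts and probabilities below are arbitrary choices, not BANFF defaults). With L_0 = 2 and L_1 = 3, the supports a_0 and a_1 and a draw of g_i from Discrete(a_k, q_k) look like:

```r
# Illustrative values only: L_0 = 2 null components, L_1 = 3 alternative components.
L0 <- 2
L1 <- 3
a0 <- seq(-L0 + 1, 0)   # (-1, 0): labels of the null-side components
a1 <- seq(1, L1)        # (1, 2, 3): labels of the alternative-side components
q0 <- c(0.3, 0.7)       # probabilities over a0 (sums to 1)
q1 <- c(0.2, 0.5, 0.3)  # probabilities over a1 (sums to 1)

# Discrete(a_k, q_k) means Pr(g_i = a_k[j]) = q_k[j]; here, for instance,
# Pr(g_i = -L0 + 2) = Pr(g_i = 0) = q0[2].
draw_g <- function(k) {
  if (k == 0) sample(a0, 1, prob = q0) else sample(a1, 1, prob = q1)
}
set.seed(42)
g_i <- draw_g(0)        # a draw given z_i = 0, so g_i lies in a0
```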

Assumption
In this algorithm, we assume, without loss of generality, that "important" features have larger statistics than "unimportant" ones. In this regard, a restriction is placed on the means mu_g.

This function implements NET-DPM-3 of Zhao et al. (2014); please refer to Appendix B.3 of that paper for more details.

References

Zhao, Y., Kang, J. and Yu, T. (2014). A Bayesian nonparametric mixture model for selecting genes and gene subnetworks. Annals of Applied Statistics, in press.

Lan, Z., Kang, J., Yu, T. and Zhao, Y. BANFF: an R package for network identifications via Bayesian nonparametric mixture models. Working paper.

Examples

#### Gene network discovery
## Generate a scale-free gene network
library(igraph)
g <- barabasi.game(50, power = 1, zero.appeal = 1.5, directed = FALSE)
net <- as(get.adjacency(g, attr = NULL), "matrix")

## Randomly assign selected genes and draw their signal intensities
## from a Gaussian mixture
newz <- rep(c(1, 0, 0, 1, 0), 10)
Simnorm <- function(n) {
  weight <- c(0.4, 0.6)
  mu <- c(8, 6)
  sigma <- c(1, 0.5)
  z <- sample(c(1, 2), size = n, prob = weight, replace = TRUE)
  rnorm(n, mean = mu[z], sd = sigma[z])
}
testcov <- numeric(50)
for (i in 1:50) {
  if (newz[i] == 0) {
    testcov[i] <- rnorm(1, mean = 0, sd = 1)
  } else {
    testcov[i] <- Simnorm(1)
  }
}
pvalue <- pnorm(-testcov)
total1 <- Networks.Fast(pvalue, net, iter = 5, nburns = 2,
                        v = 20, algorithms = "DPM", DPparallel = FALSE,
                        piall = c(0.8, 0.85, 0.9, 0.95),
                        rhoall = c(1, 2, 5, 10, 15))
