
BANFF (version 1.1)

Networks.Fast: Bayesian Network Discovery using a Hybrid Fast Algorithm

Description

This function implements a hybrid fast algorithm to perform feature selection and sub-network discovery using a Bayesian nonparametric model based on Dirichlet process mixture models, finite mixture of normals model and the Ising model.

Usage

Networks.Fast(pvalue, net, iter = 5000, nburns = 2000,
  algorithms = c("EM", "DPM"), v = 20,
  DPM.mcmc = list(nburn = 2000, nsave = 1, nskip = 0, ndisplay = 10),
  DPM.prior = list(a0 = 2, b0 = 1, m2 = rep(0, 1), s2 = diag(100000, 1),
    psiinv2 = solve(diag(0.5, 1)), nu1 = 4, nu2 = 4, tau1 = 1, tau2 = 100),
  DPparallel = FALSE, n.cores = 1, piall = c(0.8, 0.85, 0.9, 0.95),
  rhoall = c(1, 2, 5, 10, 15), show.steps = 10, showlikelihood = FALSE,
  likelihood.frequency = 100)

Arguments

pvalue
a vector of p-values obtained from large-scale statistical hypothesis testing
net
an n by n binary (0/1) adjacency matrix of the network configuration, where n = length(pvalue)
iter
number of iterations; the default is 5000
nburns
number of burn-in iterations; the default is 2000
algorithms
a character string taking value "EM" or "DPM", indicating the algorithm used to obtain the Finite Gaussian Mixture (FGM) estimates. It is recommended to choose "DPM" when the dimension of the data is large, and "EM" when it is small.
v
number of iterations set for the DPM fitting; v is only valid when algorithms="DPM"
DPM.mcmc
a list giving the MCMC parameters for the DPM fitting; see the argument mcmc of function DPdensity() in DPpackage for details; the default setting is DPM.mcmc=list(nburn=2000, nsave=1, nskip=0, ndisplay=10)
DPM.prior
a list giving the prior information; see the argument prior of function DPdensity() in DPpackage for details; the default setting is DPM.prior=list(a0=2, b0=1, m2=rep(0,1), s2=diag(100000,1), psiinv2=solve(diag(0.5,1)), nu1=4, nu2=4, tau1=1, tau2=100)
piall
a vector of possible choices for "pi0" in increasing order; the default value is c(0.8, 0.85, 0.9, 0.95)
rhoall
a vector of possible choices for "rho0" and "rho1" in increasing order; the default value is c(1, 2, 5, 10, 15)
DPparallel
a logical variable indicating whether to apply parallel computing when algorithms="DPM"; the default setting is FALSE
n.cores
number of CPU cores for parallel computing; this argument is only valid when algorithms="DPM"; the default setting is 1
show.steps
an integer giving how often iteration results are printed; the default setting is 10. This setting affects only what is printed, not the data saved
showlikelihood
a logical variable indicating whether to print the log-likelihood value during the iterations; FALSE is the default setting
likelihood.frequency
a number giving how often the log-likelihood value is printed. For example, setting likelihood.frequency=100 prints the log-likelihood value every 100 iterations. The default setting is 100; a small frequency is not recommended because frequent printing slows down the MCMC chain updating
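Before calling the function, the input requirements documented above (a vector of valid p-values and a matching n by n binary adjacency matrix) can be verified with a few lines of base R. The sketch below is a hypothetical helper, not part of BANFF:

```r
# Hypothetical helper (not part of BANFF): check that pvalue and net satisfy
# the input requirements documented above before calling Networks.Fast().
check_inputs <- function(pvalue, net) {
  n <- length(pvalue)
  stopifnot(is.numeric(pvalue), all(pvalue >= 0 & pvalue <= 1))  # valid p-values
  stopifnot(is.matrix(net), nrow(net) == n, ncol(net) == n)      # n by n
  stopifnot(all(net %in% c(0, 1)))                               # binary (0/1)
  stopifnot(isTRUE(all.equal(net, t(net))))                      # symmetric adjacency
  invisible(TRUE)
}

pv <- runif(5)
adj <- matrix(0, 5, 5)
adj[1, 2] <- adj[2, 1] <- 1
check_inputs(pv, adj)
```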

Value

trace
a length(pvalue) by (iter-nburns) matrix containing the MCMC trace
convergence
MCMC Heidelberger and Welch convergence diagnostic
graph
an igraph graph object of the full network
statistics
a list of summary statistics characterizing the posterior distribution of "z_i"
mean
posterior mean for each feature
median
posterior median for each feature
var
posterior variance for each feature
quantile
posterior quantiles for each feature
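To illustrate how the summary statistics relate to the trace, the sketch below (a standalone illustration on simulated numbers, not BANFF code) computes a posterior mean, median, variance, and quantiles feature by feature from a trace matrix laid out as described above, with features in rows and post-burn-in samples in columns:

```r
# Standalone illustration (not BANFF code): summarise a trace matrix whose
# rows are features and whose columns are post-burn-in MCMC samples.
set.seed(1)
n.features <- 4
n.samples  <- 300
trace <- matrix(rnorm(n.features * n.samples), nrow = n.features)

statistics <- list(
  mean     = apply(trace, 1, mean),    # posterior mean per feature
  median   = apply(trace, 1, median),  # posterior median per feature
  var      = apply(trace, 1, var),     # posterior variance per feature
  quantile = apply(trace, 1, quantile, probs = c(0.025, 0.5, 0.975))
)
```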

Details

This function implements a Bayesian nonparametric mixture model for feature selection incorporating network information (Zhao et al., 2014):
  • r_i | g_i, theta ~ N(mu_{g_i}, sigma_{g_i}),
  • g_i | z_i = k, q_k ~ Discrete(a_k, q_k),
  • theta_g ~ G_{0k}, for g in a_k,
  • q_k ~ Dirichlet(tau_k 1_{L_k} / L_k),
  • theta = {theta_g}_{g in a_0 and a_1},
  • theta_g = (mu_g, sigma_g),

where we define

Index
a_0 = (-L_0+1, -L_0+2, ..., 0) and a_1 = (1, 2, ..., L_1), with the corresponding probabilities q_0 = (q_{-L_0+1}, q_{-L_0+2}, ..., q_0) and q_1 = (q_1, q_2, ..., q_{L_1}). By the definition of Discrete(a_k, q_k), for example, Pr(g_i = -L_0+2) = q_{-L_0+2}.
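This indexing convention can be made concrete with a small base-R sketch (illustrative values only; the component counts and probabilities below are arbitrary choices, not BANFF defaults). With L_0 = 2 and L_1 = 3, the supports a_0 and a_1 and a draw of g_i from Discrete(a_k, q_k) look like:

```r
# Illustrative values only: L_0 = 2 null components, L_1 = 3 alternative components.
L0 <- 2
L1 <- 3
a0 <- seq(-L0 + 1, 0)   # (-1, 0): labels of the null-side components
a1 <- seq(1, L1)        # (1, 2, 3): labels of the alternative-side components
q0 <- c(0.3, 0.7)       # probabilities over a0 (sums to 1)
q1 <- c(0.2, 0.5, 0.3)  # probabilities over a1 (sums to 1)

# Discrete(a_k, q_k) means Pr(g_i = a_k[j]) = q_k[j]; here, for instance,
# Pr(g_i = -L0 + 2) = Pr(g_i = 0) = q0[2].
draw_g <- function(k) {
  if (k == 0) sample(a0, 1, prob = q0) else sample(a1, 1, prob = q1)
}
set.seed(42)
g_i <- draw_g(0)        # a draw given z_i = 0, so g_i lies in a0
```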

Assumption
In this algorithm, we assume, without loss of generality, that "important" features have larger statistics than "unimportant" ones. In this regard, a restriction is placed on the means mu_g.

This function implements NET-DPM-3 of Zhao et al. (2014); please refer to Appendix B.3 of that paper for more details.

References

Zhao, Y., Kang, J. and Yu, T. (2014). A Bayesian nonparametric mixture model for selecting genes and gene subnetworks. Annals of Applied Statistics, in press.

Lan, Z., Kang, J., Yu, T. and Zhao, Y. BANFF: an R package for network identifications via Bayesian nonparametric mixture models. Working paper.

Examples

#### Gene network discovery
## Generate a scale-free gene network
library(igraph)
g <- barabasi.game(50, power = 1, zero.appeal = 1.5, directed = FALSE)
net <- as(get.adjacency(g, attr = NULL), "matrix")

## Randomly assign selected genes and draw their signal intensities
## from a Gaussian mixture
newz <- rep(c(1, 0, 0, 1, 0), 10)
Simnorm <- function(n) {
  weight <- c(0.4, 0.6)
  mu <- c(8, 6)
  sigma <- c(1, 0.5)
  z <- sample(c(1, 2), size = n, prob = weight, replace = TRUE)
  rnorm(n, mean = mu[z], sd = sigma[z])
}
testcov <- numeric(50)
for (i in 1:50) {
  if (newz[i] == 0) {
    testcov[i] <- rnorm(1, mean = 0, sd = 1)
  } else {
    testcov[i] <- Simnorm(1)
  }
}
pvalue <- pnorm(-testcov)
total1 <- Networks.Fast(pvalue, net, iter = 5, nburns = 2,
                        v = 20, algorithms = "DPM", DPparallel = FALSE,
                        piall = c(0.8, 0.85, 0.9, 0.95),
                        rhoall = c(1, 2, 5, 10, 15))
