ARTIVAnet: Function to run the ARTIVA procedure for Auto Regressive TIme-VArying network inference on several target genes

Description

This function runs function ARTIVAsubnet for all target genes in targetData successively. This function generates Reversible Jump MCMC (RJ-MCMC) sampling for approximating the posterior distribution of a time varying regulatory network, under the Auto Regressive TIme VArying (ARTIVA) model (for a detailed description of the algorithm, see ARTIVAsubnet and see Lebre et al. BMC Systems Biology, 2010). A network representing the interactions between the factor genes and the target genes is estimated and plotted.

Usage

ARTIVAnet(targetData, parentData, targetNames = NULL, parentNames =
NULL, dataDescription=NULL, saveEstimations=TRUE, saveIterations=FALSE,
savePictures = TRUE, outputPath=NULL, dyn=1, segMinLength=2, maxCP=NULL,
maxPred=NULL, nbCPinit=NULL, CPinit=NULL, niter=50000, burn_in=NULL,
PSRFactor=FALSE,PSRF_thres=1.05, segmentAnalysis=TRUE,
edgesThreshold=0.5, layout = "fruchterman.reingold", cCP= 0.5,
cEdges=0.5, alphaCP=1, betaCP=0.5, alphaEdges=1, betaEdges=0.5, v0=1,
gamma0=0.1, alphad2=2, betad2=0.2, silent=FALSE)

Arguments

targetData

A vector with the temporal gene expression measurements for the target gene (i.e. the gene whose regulation factors are looked for).

parentData

A matrix (or a vector if only 1 parent gene) with the temporal gene expression measurements for the proposed parent genes (i.e. potential regulation factors). Parent genes are shown in row and expression values in column. For computational reasons, we advise not to test simultaneously more than 10 parent genes.

targetNames

A vector with the names for target gene(s) (optional, default: targetNames=NULL

parentNames

A vector with the names for parent gene(s) (optional, default: parentNames=NULL).

dataDescription

(Required only when the gene expression measurements contain repeated values for the same time points). A vector indicating the ordering of the time measurements in the data. For example dataDescription=rep(1:n, each=2), if there are two measurements for each time point AND the repetitions for each time point are next to each other. Note that temporal gene expression measurements have to be organized identically in arguments targetData and parentData (optional, default: dataDescription=NULL).

saveEstimations

Boolean, if TRUE all estimated posterior distributions are saved as text files either in a new sub folder named "ARTIVA_Results" created by default in the current folder or in a folder specified with argument outputPath (see below) (optional, default: saveEstimations=TRUE).

saveIterations

Boolean, if TRUE the configuration for all iterations is saved as text files either in a new sub folder named "ARTIVA_Results" created by default in the current folder or in a folder specified with argument outputPath (see below) (optional, default: saveIterations=FALSE).

savePictures

Boolean, if TRUE all estimated posterior distributions and networks are plotted in a pdf file either in a new sub folder named "ARTIVA_Results" created by default in the current folder or in a folder specified with argument outputPath (see below) (optional, default: savePictures=TRUE).

outputPath

File path to a folder in which the output results have to be saved, either a complete path or the name of a folder to be created in the current directory (optional, default: outputPath=NULL).

dyn

Time delay to be considered in the auto-regressive process, between the temporal expression measurements of the analyzed target gene and the ones of the parent genes (optional, default: dyn=1).

segMinLength

Minimal length (number of time points) to define a temporal segment. Must be - strictly - greater than 1 if there is no repeated measurements for each time point in arguments targetData and parentData (optional, default: segMinLength=2).

maxCP

Maximal number of CPs to be considered by the ARTIVA inference procedure. Note that for a temporal course with n time points the maximal number of CPs is (n-1-dyn) (see before for a description of argument dyn). For long temporal courses (more than 20 time points), we advise - for computational reasons - to limit the maximal number of CP to 15 (optional, default: maxCP=NULL, if maxCP=NULL then the maximal number of CPs is set to maxCP=min(round((n-1-dyn)/segMinLength)-1, 15) where n is the number of time points).

maxPred

Maximal number of simultaneous incoming edges for each segment of the regulatory network estimated for the target gene (default: maxPred=NULL, if maxPred=NULL then the maximal number of incoming edges is set to maxPred=min(dim(parentData)[1],15))

nbCPinit

Number of CPs to be considered at the algorithm initialization (optional, default: nbCPinit=NULL, if nbCPinit=NULL then the initial number of CPs is randomly set to a value between 0 and maxCP/2.).

CPinit

A vector with the initial positions for potential CPs. (optional, default: CPinit = NULL, if CPinit = NULL then the initial vector is chosen randomly according to priors, with nbCPinit changepoints (see previous argument nbCPinit) ).

niter

Number of iterations to be performed in the RJ-MCMC sampling (optional, default: niter = 50000).

burn_in

Number of initial iterations that are discarded for the estimation of the model distribution (posterior distribution). The ARTIVAnet function is a RJ-MCMC algorithm which, at each iteration, randomly samples a new configuration of the time-varying regulatory network from probability distributions based on constructing a Markov chain that has the network model distribution as its equilibrium distribution (The equilibrium distribution is obtained when the Markov Chain converges, which requires a large number of iterations). Typically, initial iterations are notconfident because the Markov Chain has not stabilized. The burn-in samples allow to not consider these initial iterations in the final analysis (optional, default: burn_in=NULL, if burn_in=NULL then the first 25% of the iterations is left for burn_in).

PSRFactor

Boolean, if TRUE the RJ-MCMC procedure is stopped when the Potential Scale Ratio Factor or PSRF (which is a usual convergence criterion) is below a specified threshold (see documentation for argument PSRF_thres below) or when the maximal number of iterations is reached (see documentation for argument niter previously) (optional, default: PSRFactor=FALSE). For more details about the PSRF criterion, see Gelman and Rubin (1992).

PSRF_thres

(Only when PSRFactor=TRUE) RJ-MCMC stopping threshold: the RJ-MCMC procedure is stopped when the Potential Scale Ratio Factor (PSRF) is below this threshold (see documentation for argument PSRFactor previously). Values of the PSRF below 1.1 are usually taken as indication of sufficient convergence (optional, default: PSRF_threshold=1.05).

segmentAnalysis

Boolean, if TRUE the posterior distribution for the edges in each temporal segment delimited by the estimated changepoints (CPs) is computed. (The estimated CPs are the k CPs with maximal posterior probability, where k is the number of CPs having the maximal porsterior probability (such that each temporal segment is larger than segMinLength) (see documentation below for output values CPpostDist, nbSegs, CPposition. If segmentAnalysis=FALSE only the posterior distributions of the CPs number and CPs positions are computed (optional, default: segmentAnalysis=TRUE).

edgesThreshold

Probability threshold for the selection of the edges of the time-varying regulatory network when segmentAnalysis=TRUE (optional, default: edgesThreshold=0.5).

layout

Name of the function determining the placement of the vertices for drawing a graph, possible values among others: "random", "circle", "sphere", "fruchterman.reingold", "kamada.kawai", "spring",

"reingold.tilford", "fruchterman.reingold.grid", see package igraph0 for more details (default: layout="fruchterman.reingold").

cCP

Maximal probability to propose the birth (resp. death) of a changepoint (CP) during the RJ-MCMC iterations (optional, default: cCP=0.5).

cEdges

Maximal probability - when a move update of the network topology is chosen - to propose ecah edge move (birth or death of an edge) within a temporal segment (optional, default: cEdges=0.5).

alphaCP

Hyperparameter for sampling the number k of CPs. k follows a Gamma distribution: Gamma(alphaCP, betaCP (see below)) (optional, default: alphaCP=1). Note that function choosePriors can be used to choose alphaCP (or betaCP) according to the desired dimension penalisation.

betaCP

Hyperparameter for sampling the number k of CPs. k follows a Gamma distribution: Gamma(alphaCP (see before), betaCP) (optional, default: betaCP=0.5). Note that function choosePriors can be used to choose betaCP (or alphaCP) according to the desired dimension penalisation.

alphaEdges

Hyperparameter for sampling the number l of regulatory interactions between the target gene and the parent genes. l follows a Gamma distribution: rgamma(1, shape=alphaEdges, rate=betaEdges) (see betaEdges below), (default: alphaEdges=1). Function choosePriors can be used to choose alphaEdges (or betaEdges) according to the desired dimension penalisation.

betaEdges

Hyperparameter for sampling the number l of regulatory interactions between the target gene and the parent genes. l follows a Gamma distribution: rgamma(1, shape=alphaEdges, rate=betaEdges) (see alphaEdges upper) (optional, default: betaEdges=0.5). Function choosePriors can be used to choose betaEdges (or alphaEdges) according to the desired dimension penalisation.

Hyperparameter for sampling the noise variance (denoted by Sig2) in the auto-regressive model defining the regulatory time vrarying network. The prior distribution of the noise variance is an Inverse Gamma distribution with shape parameter v0/2 and scale parameter gamma0/2: rinvgamma(1,shape=v0/2, scale = gamma0), (optional, default: v0=1).

gamma0

Hyperparameter for sampling the noise variance (denoted by Sig2) in the ARTIVA package) in the auto-regressive model defining the regulatory time vrarying network. The prior distribution of the noise variance is an Inverse Gamma distribution with shape parameter v0/2 and scale parameter gamma0/2: rinvgamma(1, shape=v0/2, scale = gamma0), (optional, default: gamma0=0.1).

alphad2

Hyperparameter for sampling a parameter that represents the expected signal-to-noise ratio (denoted by delta2 in the ARTIVA package). It is sampled according to an Inverse Gamma distribution: rinvgamma(1, shape=alphad2, scale=betad2), (optional, default: alphad2=2).

betad2

silent

Boolean, if TRUE messages are printed along the ARTIVA procedure (optional, default: silent=FALSE).

Value

A table containing the information to plot (see function traceNetworks) the global network estimated by the ARTIVAnet procedure. All results are plotted in a pdf file (when choosing

savePictures =
TRUE

) in folder outputPath. All numerical results (see ARTIVAsubnet output values documentation)) are saved in text files (when choosing saveEstimations=TRUE and/or saveIterations=TRUE) in folder outputPath.

References

Statistical inference of the time-varying structure of gene-regulation networks S. Lebre, J. Becq, F. Devaux, M. P. H. Stumpf, G. Lelandais, BMC Systems Biology 2010, 4:130.

Inference from iterative simulation using multiple sequences. Gelman, A. and D. Rubin, Statistical science 7 (4), 457-472, 1992.

Examples

Run this code

library(ARTIVA)

# Load the dataset with simulated gene expression profiles
data(simulatedProfiles)

# List of target genes to be analyzed independantly with ARTIVA 
targetGenes = c(1, 10, 20, "TF3", 45, 50)

# Names of the parent genes (typically transcription factors) 
parentGenes = c("TF1", "TF2", "TF3", "TF4", "TF5")

###
# ARTIVA analysis searching for potential interactions between each target 
# genes and a predefined list of parent genes. 
###

# Note that the number of iterations in the RJ-MCMC sampling is reduced 
# to 'niter=20000' in this example, but it should be increased (e.g. up to
# 50000) for a better estimation.

## Not run: 
# ARTIVAtest1=ARTIVAnet(targetData = simulatedProfiles[targetGenes,],
#   parentData = simulatedProfiles[parentGenes,], 
#   targetNames= targetGenes,
#   parentNames = parentGenes,
#   niter = 20000,
#   savePictures=FALSE)
# 
# # Note that function ARTIVAnet calls fonction ARTIVAsubnet for each 
# # target gene successively and provides a global estimated regulatory 
# # network entitled "ARTIVA_FinalNetwork.pdf" in addition to the output 
# # results given by function ARTIVAsubnet.
# 
# ## Gene names for the target and the parent genes, minimum segment length, 
# ## threshold for the edges selection can be specified as follow:
# ARTIVAtest2=ARTIVAnet(targetData = simulatedProfiles[targetGenes,],
#   parentData = simulatedProfiles[parentGenes,], 
#   targetNames= targetGenes,
#   parentNames = parentGenes,
#   segMinLength = 2,
#   edgesThreshold = 0.5, 
#   niter = 20000,
#   outputPath = "ARTIVA-test")
# ## End(Not run)

# By default, the output results (pictures and estimation values) are
# saved in a folder named "ARTIVAnet" created in the current directory
# In order to save the results in a specific folder, for example a
# folder entitled "ARTIVA-test" in the current directory:

Run the code above in your browser using DataLab