Starting from time-course gene expression measurements for a gene of interest (referred to as "target gene") and a set of genes (referred to as "parent genes") which may explain the expression of the target gene, the ARTIVA procedure identifies temporal segments for which a set of interactions occur between the "parent genes" and the "target gene". The time points that delimit the different temporal segments are referred to as changepoints (CP).
If the measurements time delay is short enough so that the expression of
the target gene depends more likely on the expression of the parent
genes at some previous time points, then the time delay for the
interactions can be chosen with argument dyn
. In that case the set of
parent genes may contain the target gene (auto-regulation). Otherwise,
contemporaneous measurements of the parent genes are used to explain the
expression of the target gene and argument dyn
is set to 0. In the
latter case, the target gene must be kept out of the set of possible
parent genes.
The ARTIVA algorithm uses a combination of efficient and robust methods: (1) dynamical Bayesian networks (DBN) to model directed regulatory interactions between the parent genes and the analyzed target gene (2) RJ-MCMC sampling for inferring - simultaneously - (a) the time position (changepoints) at which the regulatory interactions between the parent genes and the target gene changes and (b) the regulatory network topologies (interactions between parent and target genes) associated with the temporal segments delimited by the changepoints.
If available, repeated measurements can be used, the design of
experiments must be specified with argument dataDescription
.
ARTIVAsubnet(targetData, parentData, targetName="Target",parentNames=NULL,
dataDescription=NULL, saveEstimations=TRUE, saveIterations=FALSE,
savePictures = TRUE, outputPath=NULL, dyn=1, segMinLength=2, maxCP=NULL,
maxPred=NULL, nbCPinit=NULL, CPinit=NULL, niter=50000, burn_in=NULL,
PSRFactor=FALSE, PSRF_thres=1.05, segmentAnalysis=TRUE, edgesThreshold=0.5,
layout = "fruchterman.reingold", cCP= 0.5, cEdges=0.5, alphaCP=1,
betaCP=0.5, alphaEdges=1, betaEdges=0.5, v0=1, gamma0=0.1, alphad2=2,
betad2=0.2, silent=FALSE)
parentNames=NULL
).
targetData
and
parentData
(optional, default: dataDescription=NULL
).
TRUE
all estimated posterior distributions are saved
as text files either in a new sub folder named "ARTIVA_Results" created
by default in the current folder or in a folder specified with argument
outputPath
(see below) (optional, default:
saveEstimations=TRUE
).
TRUE
the configuration for all iterations is saved as
text files either in a new sub folder named "ARTIVA_Results" created
by default in the current folder or in a folder specified with argument
outputPath
(see below) (optional, default:
saveIterations=FALSE
).
TRUE
all estimated posterior distributions and
networks are plotted in a pdf file either in a new sub folder named
"ARTIVA_Results" created by default in the current folder or in a
folder specified with argument outputPath
(see below)
(optional, default: savePictures=TRUE
).
outputPath=NULL
).
dyn=1
).
targetData
and parentData
(optional, default: segMinLength=2
).
n
time points
the maximal number of CPs is (n-1-dyn)
(see before for a
description of argument dyn
). For long temporal courses (more
than 20 time points), we advise - for computational reasons - to limit
the maximal number of CP to 15 (optional, default: maxCP=NULL
,
if maxCP=NULL
then the maximal number of CPs is set to
maxCP=min(round((n-1-dyn)/segMinLength)-1, 15)
where n
is the
number of time points).
maxPred=NULL
, if maxPred=NULL
then the maximal number of
incoming edges is set to
maxPred=min(dim(parentData)[1],15)
)
nbCPinit=NULL
, if nbCPinit=NULL
then the
initial number of CPs is randomly set to a value between 0 and maxCP/2
.).
CPinit = NULL
, if CPinit = NULL
then the initial vector is chosen randomly according to priors, with
nbCPinit
changepoints (see previous argument nbCPinit
) ).
niter = 50000
).
ARTIVAsubnet
function is a RJ-MCMC algorithm which, at
each iteration, randomly samples a new configuration of the
time-varying regulatory network from probability distributions based
on constructing a Markov chain that has the network model distribution
as its equilibrium distribution (The equilibrium distribution is
obtained when the Markov Chain converges, which requires a large
number of iterations).
Typically, initial iterations are notconfident because the Markov Chain
has not stabilized. The burn-in samples allow to not consider these
initial iterations in the final analysis (optional, default:
burn_in=NULL
, if burn_in=NULL
then the first 25% of the
iterations is left for burn_in
).
TRUE
the RJ-MCMC procedure is stopped when the Potential
Scale Ratio Factor or PSRF (which is a usual convergence criterion) is
below a specified threshold (see documentation for argument
PSRF_thres
below) or when the maximal number of iterations is
reached (see documentation for argument niter
previously) (optional, default: PSRFactor=FALSE
).
For more details about the PSRF criterion, see Gelman and Rubin (1992).
PSRFactor=TRUE
) RJ-MCMC stopping threshold: the
RJ-MCMC procedure is stopped when the Potential Scale Ratio Factor
(PSRF) is below this threshold (see documentation for argument
PSRFactor
previously). Values of the PSRF below 1.1 are usually
taken as indication of sufficient convergence (optional, default:
PSRF_threshold=1.05
).
TRUE
the posterior distribution for the edges in each
temporal segment delimited by the estimated changepoints (CPs) is
computed. (The estimated CPs are the k
CPs with maximal posterior
probability, where k
is the number of CPs having the maximal
porsterior probability (such that each temporal segment is larger than
segMinLength
) (see documentation below for output values
CPpostDist
, nbSegs
, CPposition
. If segmentAnalysis=FALSE
only the posterior distributions of the CPs number and CPs positions are computed (optional, default: segmentAnalysis=TRUE
).
segmentAnalysis=TRUE
(optional, default: edgesThreshold=0.5
).
"random",
"circle",
"sphere",
"fruchterman.reingold",
"kamada.kawai",
"spring",
"reingold.tilford",
"fruchterman.reingold.grid"
,
see package igraph0
for more details (default:
layout="fruchterman.reingold"
).
cCP=0.5
).
cEdges=0.5
).
k
of CPs. k
follows
a Gamma distribution: Gamma(alphaCP, betaCP (see below)) (optional,
default: alphaCP=1
). Note that function
choosePriors
can be used to choose alphaCP
(or betaCP
) according to the desired dimension penalisation.
k
of CPs. k
follows
a Gamma distribution: Gamma(alphaCP (see before), betaCP
)
(optional, default: betaCP=0.5
). Note that function
choosePriors
can be used to choose betaCP
(or
alphaCP
) according to the desired dimension penalisation.
l
of regulatory
interactions between the target gene and the parent genes. l
follows a Gamma distribution: rgamma(1,
shape=alphaEdges, rate=betaEdges)
(see betaEdges
below), (default:
alphaEdges=1
). Function choosePriors
can be used to
choose alphaEdges
(or betaEdges
) according to
the desired dimension penalisation.
l
of regulatory
interactions between the target gene and the parent genes. l
follows a Gamma
distribution: rgamma(1,
shape=alphaEdges, rate=betaEdges)
(see alphaEdges
upper) (optional, default:
betaEdges=0.5
). Function choosePriors
can be used
to choose betaEdges
(or alphaEdges
) according to the desired
dimension penalisation.
Sig2
)
in the auto-regressive model defining the regulatory time vrarying network. The prior
distribution of the noise variance is an Inverse Gamma distribution with
shape parameter v0/2
and scale parameter gamma0/2
: rinvgamma(1,shape=v0/2,
scale = gamma0)
, (optional, default: v0=1
).
Sig2
) in the ARTIVA package) in the auto-regressive
model defining the regulatory time vrarying network. The prior
distribution of the noise variance is an Inverse Gamma distribution with
shape parameter v0/2
and scale parameter gamma0/2
: rinvgamma(1,
shape=v0/2, scale = gamma0)
, (optional, default: gamma0=0.1
).
delta2
in the ARTIVA
package). It is sampled according to an Inverse Gamma distribution:
rinvgamma(1, shape=alphad2,
scale=betad2)
, (optional, default:
alphad2=2
).
delta2
in the ARTIVA
package). It is sampled according to an Inverse Gamma distribution:
rinvgamma(1, shape=alphad2,
scale=betad2)
, (optional, default: betad2=0.2
).
TRUE
messages are printed along the ARTIVA procedure (optional, default: silent=FALSE
).
Samples
is list composed of the following elements:1) Samples$CP
: a matrix with in row the different iterations
performed with the ARTIVAsubnet
function and in column the identified positions for CPs (if the parameter dyn=1, first CP=2 and final CP=n+1, with n the number of time points);2) Samples$Edges
: a matrix with in row the different ARTIVA iterations and in column the
number of regulatory interactions identified in each temporal phases (-1 values are default
values, when no temporal phase exist);3) Samples$coeff
: a matrix with in row the different ARTIVA iterations and in column the
coefficient values corresponding to the identified regulatory interactions.4) Samples$variance
: a matrix with in row the different ARTIVA iterations and in column the
variance values modelling data noise for each identified temporal phase.
Counters
is list composed of the following elements:1) Counters$CPMovesCount
: Number of modifications PROPOSED during ARTIVA iterations in term
of CPs (i.e. birth of a new CP, death of an existing CP, move of an existing
CP or upate of regulatory models in each temporal phases).2) Counters$CPMovesAcceptationPrct
: Percentage of modifications ACCEPTED during ARTIVA iterations in term
of CPs (i.e. birth of a new CP, death of an existing CP, move of an existing
CP or upate of regulatory models in each temporal phases).3) Counters$EdgesMoveCount
: Number of modifications PROPOSED during
ARTIVA iterations in term of regulatory models (i.e. birth of a new edge between parent and target genes, death of an
existing edge or update of the regression coefficient for existing edges).4) Counters$EdgesMovesAcceptationPrct
: Percentage of modifications ACCEPTED
during ARTIVA iterations in term of regulatory models (i.e. birth of a
new edge between parent and target genes, death of an existing edge or update of the regression coefficient for existing edges).5) Counters$iterations
: Total number of iterations generated by the
ARTIVA procedure.6) Counters$rcvgce
: (only if PSRFactor=TRUE
) Number of iterations before
the stopping criterion is reached, i.e. before the PSRF factor is below
the threshold specified with argument PSRF_thres
. The returned value
is NULL if the stopping criterion is not reached.CPpostDist$CPnumberPostDist
: A table containing the approximated distribution for the number of CPs.2)CPpostDist$CPpositionPostDist
: A table containing the approximated distribution for the position of the CPs.
CPnumberPostDist
).
segmentAnalysis=TRUE
) A list of tables:1) SegmentPostDist$CPpos
: A table containing the most
significant CP positions that delimit nbSegs
temporal segments,
according to CPpostDist$CPnumber
, CPpostDist$CPposition
and
segMinLength
(if parameter dyn=1
, first CP is 2
and final CP is n+1
, where n
is the number of time
points).2) SegmentPostDist$edgesPostDist
: A table containing the approximate posterior distribution for the incoming edges (regulatory
interaction between parent and target genes) for each temporal segment
delimited by the CP given in SegmentPostDist$CPpos
(see previously). Each raw corresponds to a segment, ordered by time.3) SegmentPostDist$edgesCoeff
A table containing the estimated
coefficients for the incoming edges (regulatory interaction between
parent and target genes) for each temporal segment delimited by the CP given in SegmentPostDist$CPpos
(see previously). Each raw corresponds to a segment, ordered by time.
traceNetworks
) the network estimated with the ARTIVAsubnet
procedure.
ARTIVAsubnet
procedure
ARTIVAsubnet
procedure
ARTIVAsubnet
procedure.targetData
vector given as input of the ARTIVAsubnet
procedure.
parentData
matrix given as input of the ARTIVAsubnet
procedure.
Gelman, A. and D. Rubin (1992) Inference from iterative simulation using multiple sequences. Statistical science 7 (4), 457-472.
ARTIVAnet
,ARTIVAsubnetAnalysis
,
choosePriors
, CP.postDist
, plotCP.postDist
,# Load the ARTIVA R package
library(ARTIVA)
# Load the dataset with simulated gene expression profiles
data(simulatedProfiles)
# Name of the target gene to be analyzed with ARTIVA
targetGene = 1
# Names of the parent genes (typically transcription factors)
parentGenes = c("TF1", "TF2", "TF3", "TF4", "TF5")
###
# ARTIVA analysis searching for potential interactions between the target
# genes and a predefined list of parent genes.
###
# Note that the number of iterations in the RJ-MCMC sampling is reduced
# to in this example to 'niter=20000' (in order obtain a quick overview
# of the ARTIVAnet fonction, but it should be increased (e.g. up to
# 50000) for a better parameter estimation.
# Without saving the output pictures "savePictures=FALSE"
## Not run:
# ARTIVAtest = ARTIVAsubnet(targetData = simulatedProfiles[targetGene,],
# parentData = simulatedProfiles[parentGenes,],
# targetName = targetGene,
# parentNames = parentGenes,
# segMinLength = 2,
# edgesThreshold = 0.5,
# niter = 20000,
# savePictures=FALSE)
#
# # By default, the output results (pictures and estimation values) are
# # saved in a folder named "ARTIVAsubnet" created in the current directory
# ARTIVAtest = ARTIVAsubnet(targetData = simulatedProfiles[targetGene,],
# parentData = simulatedProfiles[parentGenes,],
# targetName = targetGene,
# parentNames = parentGenes,
# segMinLength = 2,
# edgesThreshold = 0.5,
# niter = 20000)
#
# # In order to save the results in a specific folder, for example a
# # folder entitled "ARTIVA-test" in the current directory:
#
# ARTIVAtest2 = ARTIVAsubnet(targetData = simulatedProfiles[targetGene,],
# parentData = simulatedProfiles[parentGenes,],
# targetName = targetGene,
# parentNames = parentGenes,
# segMinLength = 2,
# edgesThreshold = 0.5,
# niter = 20000,
# outputPath = "ARTIVA-test")
# ## End(Not run)
Run the code above in your browser using DataLab