Performs parametric and non-parametric estimation and simulation of drifting semi-Markov processes. The definition of parametric and non-parametric model specifications is also possible. Furthermore, three different types of drifting semi-Markov models are considered. These models differ in the number of transition matrices and sojourn time distributions used for the computation of a number of semi-Markov kernels, which in turn characterize the drifting semi-Markov kernel.
Third parties wishing to contribute to the software, or to report issues with it, can do so directly through the development GitHub page of the package.
Automated tests are in place to help the user catch invalid input and to ensure that the functions return the expected output. These strict automated checks also allow users to properly define their own dsmm objects and to use them with the generic functions of the package.
Maintainer: Ioannis Mavrogiannis mavrogiannis.ioa@gmail.com
Authors:
Vlad Stefan Barbu
Ioannis Mavrogiannis
Nicolas Vergne
Introduction
Markov models and semi-Markov models differ in how they model the sojourn time distributions. In a discrete-time Markov model, the sojourn time follows a Geometric distribution, whereas a semi-Markov model allows a sojourn time distribution of arbitrary shape. A drifting semi-Markov model goes one step further: it has \(d + 1\) (arbitrary) sojourn time distributions and \(d + 1\) transition matrices (Model 1), where \(d\) is the polynomial degree. From these, we compute \(d + 1\) semi-Markov kernels. In this work, we also consider obtaining these semi-Markov kernels from \(d + 1\) transition matrices and \(1\) sojourn time distribution (Model 2), or from \(d + 1\) sojourn time distributions and \(1\) transition matrix (Model 3).
Definition
Drifting semi-Markov processes are particular non-homogeneous semi-Markov chains for which the drifting semi-Markov kernel \(q_{\frac{t}{n}}(u,v,l)\) is defined as the probability that, given that the previous state at the instant \(t\) is \(u\), the next state \(v\) will be reached with a sojourn time of \(l\): $$q_{\frac{t}{n}}(u,v,l) = P(J_{t}=v,X_{t}=l|J_{t-1}=u),$$ where \(n\) is the model size, defined as the length of the embedded Markov chain \((J_{t})_{t\in \{0,\dots,n\}}\) minus the last state, \(J_{t}\) is the state at the instant \(t\), and \(X_{t}=S_{t}-S_{t-1}\) is the sojourn time of the state \(J_{t-1}\).
The drifting semi-Markov kernel \(q_{\frac{t}{n}}\) is a linear combination of \(d + 1\) semi-Markov kernels \(q_{\frac{i}{d}}\), where every semi-Markov kernel is the product of a transition matrix \(p\) and a sojourn time distribution \(f\). We define the situation when both \(p\) and \(f\) are "drifting" between \(d + 1\) fixed points of the model as Model 1, and we use the superscript \((1)\) to refer to the drifting semi-Markov kernel \(q_{\frac{t}{n}}^{\ (1)}\) and the corresponding semi-Markov kernels \(q_{\frac{i}{d}}^{\ (1)}\) in this case. For Model 2, we allow the transition matrix \(p\) to drift but not the sojourn time distribution \(f\), and for Model 3 we allow the sojourn time distribution \(f\) to drift but not the transition matrix \(p\). The superscript \((2)\) or \((3)\) signifies Model 2 or Model 3, respectively. In the general case, no superscript is used.
Model 1
Both \(p\) and \(f\) are drifting in this case. Thus, the drifting semi-Markov kernel \(q_{\frac{t}{n}}^{\ (1)}\) is a linear combination of \(d + 1\) semi-Markov kernels \(q_{\frac{i}{d}}^{\ (1)}\), which are given by: $$q_{\frac{i}{d}}^{\ (1)}(u,v,l)= {p_{\frac{i}{d}}(u,v)}{f_{\frac{i}{d}}(u,v,l)},$$ where for \(i = 0,\dots,d\) we have \(d + 1\) Markov transition matrices \(p_{\frac{i}{d}}(u,v)\) of the embedded Markov chain \((J_{t})_{t\in \{0,\dots,n\}}\), and \(d + 1\) sojourn time distributions \(f_{\frac{i}{d}}(u,v,l)\). Therefore, the drifting semi-Markov kernel is described as: $$q_{\frac{t}{n}}^{\ (1)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ q_{\frac{i}{d}}^{\ (1)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ p_{\frac{i}{d}}(u,v)f_{\frac{i}{d}}(u,v,l),$$ where \(A_i, i = 0, \dots, d\) are \(d + 1\) polynomials of degree \(d\), which satisfy the conditions: $$\sum_{i=0}^{d}A_{i}(t) = 1,$$ $$A_i \left(\frac{nj}{d} \right)= 1_{\{i=j\}},$$ where the indicator function \(1_{\{i=j\}}\) equals \(1\) if \(i = j\) and \(0\) otherwise.
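The two conditions on \(A_i\) can be checked numerically. The sketch below assumes that the \(A_i\) are the Lagrange basis polynomials on the \(d + 1\) equally spaced nodes \(\frac{nj}{d}\), which satisfy both conditions; the function name drift_polynomials is hypothetical and is not part of the package. For \(d = 1\) this reduces to \(A_0(t) = 1 - \frac{t}{n}\) and \(A_1(t) = \frac{t}{n}\).

```python
from fractions import Fraction


def drift_polynomials(t, n, d):
    """Return [A_0(t), ..., A_d(t)] as exact fractions.

    Assumption: the A_i are the Lagrange basis polynomials on the
    d + 1 equally spaced nodes n*j/d, j = 0, ..., d (requires d >= 1).
    """
    nodes = [Fraction(n * j, d) for j in range(d + 1)]
    values = []
    for i in range(d + 1):
        a = Fraction(1)
        for j in range(d + 1):
            if j != i:
                a *= (t - nodes[j]) / (nodes[i] - nodes[j])
        values.append(a)
    return values


n, d = 12, 2
# First condition: the weights sum to 1 at every instant t.
assert all(sum(drift_polynomials(t, n, d)) == 1 for t in range(n + 1))
# Second condition: at the node n*j/d only A_j is non-zero (here j = 1).
print(drift_polynomials(6, n, d))
```

Exact fractions are used so that the two conditions can be tested with equality rather than with a floating-point tolerance.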
Model 2
In this case, \(p\) is drifting and \(f\) is not drifting. Therefore, the drifting semi-Markov kernel is now described as: $$q_{\frac{t}{n}}^{\ (2)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ q_{\frac{i}{d}}^{\ (2)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ p_{\frac{i}{d}}(u,v)f(u,v,l).$$
Model 3
In this case, \(f\) is drifting and \(p\) is not drifting. Therefore, the drifting semi-Markov kernel is now described as: $$q_{\frac{t}{n}}^{\ (3)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ q_{\frac{i}{d}}^{\ (3)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ p(u,v)f_{\frac{i}{d}}(u,v,l).$$
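The three formulas differ only in which of \(p\) and \(f\) carries the index \(i\). A minimal sketch of this, assuming Lagrange basis polynomials for \(A_i\) and a nested-dictionary layout for \(p\) and \(f\), is given below. All names and numbers are hypothetical and do not reflect the package's internal representation or the implementation of get_kernel().

```python
def lagrange_weights(t, n, d):
    """A_i(t) for i = 0..d (assumed Lagrange basis on nodes n*j/d, d >= 1)."""
    nodes = [n * j / d for j in range(d + 1)]
    weights = []
    for i in range(d + 1):
        w = 1.0
        for j in range(d + 1):
            if j != i:
                w *= (t - nodes[j]) / (nodes[i] - nodes[j])
        weights.append(w)
    return weights


def drifting_kernel(t, n, p_list, f_list, u, v, l):
    """q_{t/n}(u, v, l) as a weighted sum of d + 1 semi-Markov kernels.

    p_list and f_list hold d + 1 entries when drifting, or a single
    entry when not drifting (Model 2: one f, Model 3: one p).
    """
    d = max(len(p_list), len(f_list)) - 1
    total = 0.0
    for i, a_i in enumerate(lagrange_weights(t, n, d)):
        p = p_list[i] if len(p_list) > 1 else p_list[0]
        f = f_list[i] if len(f_list) > 1 else f_list[0]
        total += a_i * p[u][v] * f[u][v][l - 1]
    return total


# Hypothetical three-state example with degree d = 1.
states = ["a", "b", "c"]
p0 = {"a": {"a": 0.0, "b": 0.9, "c": 0.1},
      "b": {"a": 0.5, "b": 0.0, "c": 0.5},
      "c": {"a": 0.3, "b": 0.7, "c": 0.0}}
p1 = {"a": {"a": 0.0, "b": 0.2, "c": 0.8},
      "b": {"a": 0.6, "b": 0.0, "c": 0.4},
      "c": {"a": 0.5, "b": 0.5, "c": 0.0}}
# Sojourn time distributions on l = 1, 2, shared by all pairs (u, v).
f0 = {u: {v: [0.5, 0.5] for v in states} for u in states}
f1 = {u: {v: [0.2, 0.8] for v in states} for u in states}

n = 10
q1 = drifting_kernel(5, n, [p0, p1], [f0, f1], "a", "b", 1)  # Model 1
q2 = drifting_kernel(5, n, [p0, p1], [f0], "a", "b", 1)      # Model 2
q3 = drifting_kernel(5, n, [p0], [f0, f1], "a", "b", 1)      # Model 3
```

At \(t = 5\) with \(n = 10\) both weights equal \(0.5\), so each value is the average of the two underlying semi-Markov kernels.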
Parametric and non-parametric model specifications
In this package, we can define parametric and non-parametric drifting semi-Markov models.
For the parametric case, several discrete distributions are considered for the modelling of the sojourn times: Uniform, Geometric, Poisson, Discrete Weibull and Negative Binomial. This is done through the function parametric_dsmm(), which returns an object of the S3 class (dsmm_parametric, dsmm).
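As an illustration of sojourn time distributions on \(\{1, 2, \dots\}\), the sketch below writes out two of the listed families. The parameterisations are assumptions made for this example (a Geometric distribution with success probability p, and a Discrete Weibull of the Nakagawa and Osaki type shifted to start at 1); the parameterisation used by parametric_dsmm() may differ, so its documentation should be checked.

```python
def geometric_pmf(l, p):
    """P(X = l) = p * (1 - p)**(l - 1) for l = 1, 2, ... (assumed form)."""
    return p * (1 - p) ** (l - 1)


def discrete_weibull_pmf(l, q, beta):
    """P(X = l) = q**((l - 1)**beta) - q**(l**beta) for l = 1, 2, ...

    Assumed form: Nakagawa-Osaki discrete Weibull shifted so that the
    support starts at 1, with 0 < q < 1 and beta > 0.
    """
    return q ** ((l - 1) ** beta) - q ** (l ** beta)


# Both mass functions sum to 1 over l = 1, 2, ... (truncated here at 200).
total_geom = sum(geometric_pmf(l, 0.3) for l in range(1, 201))
total_weib = sum(discrete_weibull_pmf(l, 0.6, 1.5) for l in range(1, 201))
```

With beta = 1 the Discrete Weibull form above reduces to the Geometric form with p = 1 - q, which gives a quick consistency check between the two functions.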
The non-parametric model specification concerns the sojourn time distributions when no assumptions are made about the shape of the distributions. This is done through the function nonparametric_dsmm(), which returns an object of the S3 class (dsmm_nonparametric, dsmm).
It is also possible to carry out a parametric or non-parametric estimation of a model from an existing sequence through the function fit_dsmm(), which returns an object of the S3 class (dsmm_fit_parametric, dsmm) or (dsmm_fit_nonparametric, dsmm), depending on whether the argument estimation = "parametric" or estimation = "nonparametric" is given.
Therefore, the dsmm class acts as a wrapper class for drifting semi-Markov model specifications, while the classes dsmm_fit_parametric, dsmm_fit_nonparametric, dsmm_parametric and dsmm_nonparametric are exclusive to the functions that create the corresponding models, and inherit methods from the dsmm class.
In summary, based on a dsmm object it is possible to use the following methods:
Simulate a sequence through the function simulate.dsmm().
Get the drifting semi-Markov kernel \(q_{\frac{t}{n}}(u,v,l)\), for any choice of \(u,v,l\) or \(t\), through the function get_kernel().
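To make the idea of simulating from a drifting kernel concrete, the following sketch draws the pairs \((J_t, X_t)\) one instant at a time from a degree-1 kernel. It is not the implementation of simulate.dsmm(); the function name, the data layout and the numbers are hypothetical, and the sojourn time distributions are shared by all pairs \((u,v)\) to keep the example short.

```python
import random


def simulate_sequence(n, p0, p1, f0, f1, first_state, seed=None):
    """Draw (J_t, X_t) for t = 1..n from a degree-1 drifting kernel.

    Assumed kernel: q_{t/n}(u, v, l) =
        (1 - t/n) * p0[u][v] * f0[l - 1] + (t/n) * p1[u][v] * f1[l - 1].
    """
    rng = random.Random(seed)
    states, sojourns = [first_state], []
    for t in range(1, n + 1):
        u = states[-1]
        outcomes, weights = [], []
        for v in p0[u]:
            for l in range(1, len(f0) + 1):
                outcomes.append((v, l))
                weights.append((1 - t / n) * p0[u][v] * f0[l - 1]
                               + (t / n) * p1[u][v] * f1[l - 1])
        # Sample the next state and the sojourn time jointly.
        v, l = rng.choices(outcomes, weights=weights)[0]
        states.append(v)
        sojourns.append(l)
    return states, sojourns


# Hypothetical three-state example; zero diagonals forbid self-transitions.
p0 = {"a": {"a": 0.0, "b": 0.9, "c": 0.1},
      "b": {"a": 0.5, "b": 0.0, "c": 0.5},
      "c": {"a": 0.3, "b": 0.7, "c": 0.0}}
p1 = {"a": {"a": 0.0, "b": 0.2, "c": 0.8},
      "b": {"a": 0.6, "b": 0.0, "c": 0.4},
      "c": {"a": 0.5, "b": 0.5, "c": 0.0}}
f0 = [0.5, 0.5]  # probabilities of sojourn times 1 and 2
f1 = [0.2, 0.8]

states, sojourns = simulate_sequence(20, p0, p1, f0, f1, "a", seed=1)
# Repeat each visited state for its sojourn time to get the full sequence;
# the last state of the embedded chain has no sojourn time and is dropped.
sequence = [s for s, x in zip(states, sojourns) for _ in range(x)]
```

The embedded chain has \(n + 1\) states and \(n\) sojourn times, matching the definition of the model size \(n\) as the length of the embedded Markov chain minus the last state.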
Restrictions
The following restrictions must be satisfied for every drifting semi-Markov model:
The drifting semi-Markov kernel \(q_{\frac{t}{n}}(u,v,l)\), for every \(t \in \{ 0, \dots, n \}\) and \(u \in E\), has its sum over \(v\) and \(l\) equal to \(1\):
$$ \sum_{v \in E}\sum_{l = 1}^{+\infty}q_{\frac{t}{n}}(u,v,l) = \sum_{v \in E}\sum_{l = 1}^{+\infty}\sum_{i = 0}^{d}A_{i}(t)\ q_{\frac{i}{d}}(u,v,l) = 1.$$
Therefore, we also get that for every \(i \in \{0, \dots, d\}\) and \(u \in E\), the semi-Markov kernel \(q_{\frac{i}{d}}(u,v,l)\) has its sums over \(v\) and \(l\) equal to \(1\): $$\sum_{v \in E}\sum_{l = 1}^{+\infty}q_{\frac{i}{d}}(u,v,l) = 1.$$
Lastly, like in semi-Markov models, we do not allow sojourn times equal to \(0\) or passing into the same state: $$q_{\frac{t}{n}}(u,v,0) = 0, \forall u,v \in E,$$ $$q_{\frac{t}{n}}(u,u,l) = 0, \forall u\in E,l\in\{1,\dots,+\infty\}.$$
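These restrictions can be verified mechanically for a kernel evaluated at a single instant \(t\). The sketch below assumes the kernel is stored as a dictionary mapping \((u, v, l)\) to a probability, with missing keys counting as \(0\); the function name check_kernel and this layout are hypothetical and unrelated to the checks performed by the package itself.

```python
import math


def check_kernel(q, states, tol=1e-9):
    """Return a list of violated restrictions for a kernel at one instant t.

    q maps (u, v, l) -> probability; missing keys count as 0.
    """
    problems = []
    # The sum over v and l must equal 1 for every previous state u.
    for u in states:
        row_total = sum(prob for (a, _, _), prob in q.items() if a == u)
        if not math.isclose(row_total, 1.0, abs_tol=tol):
            problems.append(f"sum over v and l for u={u!r} is {row_total}")
    for (u, v, l), prob in q.items():
        # Sojourn times must be at least 1.
        if prob > 0 and l < 1:
            problems.append(f"positive mass on sojourn time {l}")
        # Passing into the same state is not allowed.
        if prob > 0 and u == v:
            problems.append(f"self-transition for state {u!r}")
    return problems


valid = {("a", "b", 1): 0.4, ("a", "b", 2): 0.6, ("b", "a", 1): 1.0}
print(check_kernel(valid, ["a", "b"]))
```

Returning a list of messages instead of raising on the first failure lets all violations be reported at once.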
Model specification restrictions
When we define a drifting semi-Markov model specification through the functions parametric_dsmm() or nonparametric_dsmm(), the following restrictions need to be satisfied.
Model 1
The semi-Markov kernels are equal to \(q_{\frac{i}{d}}^{\ (1)}(u,v,l) = p_{\frac{i}{d}}(u,v)f_{\frac{i}{d}}(u,v,l)\). Therefore, \(\forall u \in E\) the sums of \(p_{\frac{i}{d}}(u,v)\) over \(v\) and the sums of \(f_{\frac{i}{d}}(u,v,l)\) over \(l\) must be equal to 1: $$\sum_{v \in E} p_{\frac{i}{d}}(u,v) = 1,$$ $$\sum_{l = 1}^{+\infty }f_{\frac{i}{d}}(u,v,l) = 1.$$
Model 2
The semi-Markov kernels are equal to \(q_{\frac{i}{d}}^{\ (2)}(u,v,l) = p_{\frac{i}{d}}(u,v)f(u,v,l)\). Therefore, \(\forall u \in E\) the sums of \(p_{\frac{i}{d}}(u,v)\) over \(v\) and the sums of \(f(u,v,l)\) over \(l\) must be equal to 1: $$\sum_{v \in E} p_{\frac{i}{d}}(u,v) = 1,$$ $$\sum_{l = 1}^{+\infty }f(u,v,l) = 1.$$
Model 3
The semi-Markov kernels are equal to \(q_{\frac{i}{d}}^{\ (3)}(u,v,l) = p(u,v)f_{\frac{i}{d}}(u,v,l)\). Therefore, \(\forall u \in E\) the sums of \(p(u,v)\) over \(v\) and the sums of \(f_{\frac{i}{d}}(u,v,l)\) over \(l\) must be equal to 1: $$\sum_{v \in E}p(u,v) = 1,$$ $$\sum_{l = 1}^{+\infty }f_{\frac{i}{d}}(u,v,l) = 1.$$
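As a short worked check, the restrictions on \(p\) and \(f\) are sufficient for the kernel restrictions stated earlier. For Model 1, for every \(i \in \{0, \dots, d\}\) and \(u \in E\): $$\sum_{v \in E}\sum_{l = 1}^{+\infty}q_{\frac{i}{d}}^{\ (1)}(u,v,l) = \sum_{v \in E}p_{\frac{i}{d}}(u,v)\sum_{l = 1}^{+\infty}f_{\frac{i}{d}}(u,v,l) = \sum_{v \in E}p_{\frac{i}{d}}(u,v) = 1.$$ Combining this with \(\sum_{i=0}^{d}A_{i}(t) = 1\) gives: $$\sum_{v \in E}\sum_{l = 1}^{+\infty}q_{\frac{t}{n}}^{\ (1)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\sum_{v \in E}\sum_{l = 1}^{+\infty}q_{\frac{i}{d}}^{\ (1)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t) = 1.$$ The same argument applies to Model 2 with \(f\) in place of \(f_{\frac{i}{d}}\), and to Model 3 with \(p\) in place of \(p_{\frac{i}{d}}\).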
References
Barbu, V. S., Limnios, N. (2008). Semi-Markov Chains and Hidden Semi-Markov Models Toward Applications - Their Use in Reliability and DNA Analysis. New York: Lecture Notes in Statistics, vol. 191, Springer.
Vergne, N. (2008). Drifting Markov models with Polynomial Drift and Applications to DNA Sequences. Statistical Applications in Genetics and Molecular Biology, 7(1).
Barbu, V. S., Vergne, N. (2019). Reliability and survival analysis for drifting Markov models: modelling and estimation. Methodology and Computing in Applied Probability, 21(4), 1407-1429.
Nakagawa, T., Osaki, S. (1975). The discrete Weibull distribution. IEEE Transactions on Reliability, R-24, 300-301.
Sanger, F., Coulson, A. R., Hong, G. F., Hill, D. F., Petersen, G. B. (1982). Nucleotide sequence of bacteriophage \(\lambda\) DNA. Journal of Molecular Biology, 162(4), 729-773.
See Also
For the estimation of a drifting semi-Markov model given a sequence: fit_dsmm.
For drifting semi-Markov model specifications: parametric_dsmm, nonparametric_dsmm.
For the simulation of sequences: simulate.dsmm, create_sequence.
For the retrieval of the drifting semi-Markov kernel through a dsmm object: get_kernel.