dsmmR-package: dsmmR : Estimation and Simulation of Drifting Semi-Markov Models

Description

Performs parametric and non-parametric estimation and simulation of drifting semi-Markov processes. The definition of parametric and non-parametric model specifications is also possible. Furthermore, three different types of drifting semi-Markov models are considered. These models differ in the number of transition matrices and sojourn time distributions used for the computation of a number of semi-Markov kernels, which in turn characterize the drifting semi-Markov kernel.

Community Guidelines

Third parties wishing to contribute to the software, or to report issues or problems with it, can do so directly through the development GitHub page of the package.

Notes

Automated tests are in place to help the user catch invalid input and to ensure that the functions return the expected output. These strict tests also make it possible for users to properly define their own dsmm objects and use them with the generic functions of the package.

Author

Maintainer: Ioannis Mavrogiannis mavrogiannis.ioa@gmail.com

Authors:

Vlad Stefan Barbu
Ioannis Mavrogiannis
Nicolas Vergne

Details

Introduction

The difference between Markov models and semi-Markov models concerns the modelling of the sojourn time distributions. In a Markov model (in discrete time), the sojourn time follows a Geometric distribution, whereas in a semi-Markov model the sojourn time distribution can have an arbitrary shape. A drifting semi-Markov model goes further: it has $d + 1$ (arbitrary) sojourn time distributions and $d + 1$ transition matrices (Model 1), where $d$ is the polynomial degree. From these, we compute $d + 1$ semi-Markov kernels. In this work, we also consider obtaining these semi-Markov kernels from $d + 1$ transition matrices and $1$ sojourn time distribution (Model 2), or from $d + 1$ sojourn time distributions and $1$ transition matrix (Model 3).
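To make the Geometric claim concrete, here is a short worked equation. The notation $\tilde{p}(u,u)$ for the self-transition probability of an ordinary discrete-time Markov chain is an assumption introduced here for illustration and is not used elsewhere in this page. Staying in state $u$ for exactly $l$ steps means remaining $l - 1$ times and then leaving, so: $$P(X = l) = \tilde{p}(u,u)^{\,l-1}\,\bigl(1 - \tilde{p}(u,u)\bigr), \quad l = 1, 2, \dots,$$ which is the Geometric distribution with parameter $1 - \tilde{p}(u,u)$. A semi-Markov model replaces this fixed shape with an arbitrary distribution $f(u,v,l)$.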

Definition

Drifting semi-Markov processes are particular non-homogeneous semi-Markov chains for which the drifting semi-Markov kernel $q_{\frac{t}{n}}(u,v,l)$ is defined as the probability that, given that at the instant $t$ the previous state is $u$, the next state $v$ will be reached with a sojourn time of $l$: $$q_{\frac{t}{n}}(u,v,l) = P(J_{t}=v,X_{t}=l|J_{t-1}=u),$$ where $n$ is the model size, defined as the length of the embedded Markov chain $(J_{t})_{t\in \{0,\dots,n\}}$ minus the last state, where $J_{t}$ is the state at the instant $t$ and $X_{t}=S_{t}-S_{t-1}$ is the sojourn time of the state $J_{t-1}$.

The drifting semi-Markov kernel $q_{\frac{t}{n}}$ is a linear combination of $d + 1$ semi-Markov kernels $q_{\frac{i}{d}}$, where every semi-Markov kernel is the product of a transition matrix $p$ and a sojourn time distribution $f$. We define the situation when both $p$ and $f$ are "drifting" between $d + 1$ fixed points of the model as Model 1, and thus we will use the superscript $(1)$ to refer to the drifting semi-Markov kernel $q_{\frac{t}{n}}^{\ (1)}$ and the corresponding semi-Markov kernels $q_{\frac{i}{d}}^{\ (1)}$ in this case. For Model 2, we allow the transition matrix $p$ to drift but not the sojourn time distribution $f$, and for Model 3 we allow the sojourn time distribution $f$ to drift but not the transition matrix $p$. The superscript $(2)$ or $(3)$ will be used to signify Model 2 or Model 3, respectively. In the general case no superscript will be used.

Model 1

Both $p$ and $f$ are drifting in this case. Thus, the drifting semi-Markov kernel $q_{\frac{t}{n}}^{\ (1)}$ is a linear combination of $d + 1$ semi-Markov kernels $q_{\frac{i}{d}}^{\ (1)}$, which are given by: $$q_{\frac{i}{d}}^{\ (1)}(u,v,l)= {p_{\frac{i}{d}}(u,v)}{f_{\frac{i}{d}}(u,v,l)},$$ where for $i = 0,\dots,d$ we have $d + 1$ Markov transition matrices $p_{\frac{i}{d}}(u,v)$ of the embedded Markov chain $(J_{t})_{t\in \{0,\dots,n\}}$, and $d + 1$ sojourn time distributions $f_{\frac{i}{d}}(u,v,l)$. Therefore, the drifting semi-Markov kernel is described as: $$q_{\frac{t}{n}}^{\ (1)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ q_{\frac{i}{d}}^{\ (1)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ p_{\frac{i}{d}}(u,v)f_{\frac{i}{d}}(u,v,l),$$ where $A_i, i = 0, \dots, d$ are $d + 1$ polynomials of degree $d$, which satisfy the conditions: $$\sum_{i=0}^{d}A_{i}(t) = 1,$$ $$A_i \left(\frac{nj}{d} \right)= 1_{\{i=j\}},$$ where the indicator function $1_{\{i=j\}}$ equals $1$ if $i = j$ and $0$ otherwise.
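The two conditions on $A_i$ can be explored numerically. The sketch below is written in Python for illustration only and is not part of the package. It assumes that $A_i$ is the Lagrange basis polynomial on the nodes $nj/d$, $j = 0, \dots, d$, which is the unique polynomial of degree at most $d$ taking the value $1_{\{i=j\}}$ at those nodes. The function name `drift_polynomials` is hypothetical.

```python
from fractions import Fraction


def drift_polynomials(t, n, d):
    """Return [A_0(t), ..., A_d(t)] for model size n and polynomial degree d.

    Assumption: A_i is the Lagrange basis polynomial on the nodes n*j/d,
    j = 0..d, so that A_i(n*j/d) = 1 if i == j and 0 otherwise.
    Exact rational arithmetic is used to avoid rounding noise.
    """
    nodes = [Fraction(n * j, d) for j in range(d + 1)]
    values = []
    for i in range(d + 1):
        a = Fraction(1)
        for j in range(d + 1):
            if j != i:
                a *= (Fraction(t) - nodes[j]) / (nodes[i] - nodes[j])
        values.append(a)
    return values


# Hypothetical example: n = 10, d = 2, so the nodes are 0, 5 and 10.
weights_at_node = drift_polynomials(5, 10, 2)   # t sits on the middle node
weights_between = drift_polynomials(3, 10, 2)   # t lies between two nodes
total_between = sum(weights_between)            # should equal 1
```

At a node such as $t = 5$ only one weight is non-zero, so the drifting kernel coincides with the corresponding $q_{\frac{i}{d}}^{\ (1)}$; between nodes the weights blend the kernels while still summing to $1$.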

Model 2

In this case, $p$ is drifting and $f$ is not drifting. Therefore, the drifting semi-Markov kernel is now described as: $$q_{\frac{t}{n}}^{\ (2)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ q_{\frac{i}{d}}^{\ (2)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ p_{\frac{i}{d}}(u,v)f(u,v,l).$$

Model 3

In this case, $f$ is drifting and $p$ is not drifting. Therefore, the drifting semi-Markov kernel is now described as: $$q_{\frac{t}{n}}^{\ (3)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ q_{\frac{i}{d}}^{\ (3)}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\ p(u,v)f_{\frac{i}{d}}(u,v,l).$$
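The three kernel formulas differ only in which component carries the index $i$. The following Python sketch, given for illustration and not taken from the package, evaluates the kernel for the simplest case $d = 1$, where the conditions on $A_i$ give $A_0(t) = 1 - t/n$ and $A_1(t) = t/n$. The function name `drifting_kernel`, the list-based data layout and all numbers are assumptions made for this example.

```python
def drifting_kernel(t, n, p, f, u, v, l):
    """Evaluate q_{t/n}(u, v, l) for polynomial degree d = 1.

    p is a list of transition matrices and f a list of sojourn time
    distributions, indexed as p[i][u][v] and f[i][u][v][l - 1].
    A list of length 2 means the component drifts; a list of length 1
    means it is fixed (f fixed: Model 2, p fixed: Model 3).
    """
    weights = [1 - t / n, t / n]  # A_0(t), A_1(t) for d = 1
    total = 0.0
    for i, a in enumerate(weights):
        p_i = p[i] if len(p) > 1 else p[0]
        f_i = f[i] if len(f) > 1 else f[0]
        total += a * p_i[u][v] * f_i[u][v][l - 1]
    return total


# Toy data (hypothetical): 3 states, sojourn times l in {1, 2}.
p0 = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
p1 = [[0.0, 0.9, 0.1], [0.2, 0.0, 0.8], [0.6, 0.4, 0.0]]
f0 = [[[0.5, 0.5] for _ in range(3)] for _ in range(3)]
f1 = [[[0.2, 0.8] for _ in range(3)] for _ in range(3)]

q1 = drifting_kernel(5, 10, [p0, p1], [f0, f1], u=0, v=1, l=1)  # Model 1
q2 = drifting_kernel(5, 10, [p0, p1], [f0], u=0, v=1, l=1)      # Model 2
q3 = drifting_kernel(5, 10, [p0], [f0, f1], u=0, v=1, l=1)      # Model 3
```

Passing a single-element list for the non-drifting component keeps one function for all three models, which mirrors how the formulas above share the same weights $A_i(t)$.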

Parametric and non-parametric model specifications

In this package, we can define parametric and non-parametric drifting semi-Markov models.

For the parametric case, several discrete distributions are considered for the modelling of the sojourn times: Uniform, Geometric, Poisson, Discrete Weibull and Negative Binomial. This is done through the function parametric_dsmm(), which returns an object of the S3 class (dsmm_parametric, dsmm).

The non-parametric model specification concerns the sojourn time distributions when no assumptions are made about the shape of the distributions. This is done through the function nonparametric_dsmm(), which returns an object of the S3 class (dsmm_nonparametric, dsmm).

It is also possible to carry out a parametric or non-parametric estimation of a model on an existing sequence through the function fit_dsmm(), which returns an object of the S3 class (dsmm_fit_parametric, dsmm) or (dsmm_fit_nonparametric, dsmm), depending on the given argument estimation = "parametric" or estimation = "nonparametric", respectively.

Therefore, the dsmm class acts like a wrapper class for drifting semi-Markov model specifications, while the classes dsmm_fit_parametric, dsmm_fit_nonparametric, dsmm_parametric and dsmm_nonparametric are exclusive to the functions that create the corresponding models, and inherit methods from the dsmm class.

In summary, based on a dsmm object it is possible to use the following methods:

Simulate a sequence through the function simulate.dsmm().
Get the drifting semi-Markov kernel $q_{\frac{t}{n}}(u,v,l)$, for any choice of $u,v,l$ or $t$, through the function get_kernel().

Restrictions

The following restrictions must be satisfied for every drifting semi-Markov model:

The drifting semi-Markov kernel $q_{\frac{t}{n}}(u,v,l)$, for every $t \in \{ 0, \dots, n \}$ and $u \in E$, has its sums over $v$ and $l$ equal to $1$:

$$ \sum_{v \in E}\sum_{l = 1}^{+\infty}q_{\frac{t}{n}}(u,v,l) = \sum_{v \in E}\sum_{l = 1}^{+\infty}\sum_{i = 0}^{d}A_{i}(t)\ q_{\frac{i}{d}}(u,v,l) = 1.$$
Therefore, we also get that for every $i \in \{0, \dots, d\}$ and $u \in E$, the semi-Markov kernel $q_{\frac{i}{d}}(u,v,l)$ has its sums over $v$ and $l$ equal to $1$: $$\sum_{v \in E}\sum_{l = 1}^{+\infty}q_{\frac{i}{d}}(u,v,l) = 1.$$
Lastly, like in semi-Markov models, we do not allow sojourn times equal to $0$ or passing into the same state: $$q_{\frac{t}{n}}(u,v,0) = 0, \forall u,v \in E,$$ $$q_{\frac{t}{n}}(u,u,l) = 0, \forall u\in E,l\in\{1,\dots,+\infty\}.$$
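These restrictions can be checked mechanically for a kernel whose sojourn times have finite support. The Python sketch below is illustrative only and does not reproduce the package's own validation. The names `check_kernel` and `toy_kernel`, the truncation at `max_sojourn` and the toy numbers are assumptions.

```python
def check_kernel(q, n, states, max_sojourn, tol=1e-9):
    """Check the kernel restrictions for a callable q(t, u, v, l).

    Assumption: sojourn times are supported on 1..max_sojourn, so the
    infinite sum over l is replaced by a finite one.
    """
    sojourns = range(1, max_sojourn + 1)
    for t in range(n + 1):
        for u in states:
            # The sum over v and l must equal 1 for every t and u.
            total = sum(q(t, u, v, l) for v in states for l in sojourns)
            if abs(total - 1) > tol:
                return False
            # No transition into the same state.
            if any(q(t, u, u, l) != 0 for l in sojourns):
                return False
            # No sojourn time equal to 0.
            if any(q(t, u, v, 0) != 0 for v in states):
                return False
    return True


def toy_kernel(t, u, v, l, n=10):
    """Hypothetical two-state kernel with d = 1, fixed p and drifting f."""
    if u == v or l not in (1, 2):
        return 0.0
    f0 = {1: 0.5, 2: 0.5}  # sojourn time distribution at t = 0
    f1 = {1: 0.2, 2: 0.8}  # sojourn time distribution at t = n
    # With two states and no self-transitions, p(u, v) = 1 for u != v.
    return (1 - t / n) * f0[l] + (t / n) * f1[l]


valid = check_kernel(toy_kernel, 10, ["a", "b"], 2)
invalid = check_kernel(lambda t, u, v, l: 0.3, 10, ["a", "b"], 2)
```

Here `valid` is expected to be true because the toy kernel is built from proper distributions, while the constant kernel fails the sum condition.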

Model specification restrictions

When we define a drifting semi-Markov model specification through the functions parametric_dsmm or nonparametric_dsmm, the following restrictions need to be satisfied.

Model 1

The semi-Markov kernels are equal to $q_{\frac{i}{d}}^{\ (1)}(u,v,l) = p_{\frac{i}{d}}(u,v)f_{\frac{i}{d}}(u,v,l)$. Therefore, $\forall u \in E$ the sums of $p_{\frac{i}{d}}(u,v)$ over $v$ and the sums of $f_{\frac{i}{d}}(u,v,l)$ over $l$ must be equal to 1: $$\sum_{v \in E} p_{\frac{i}{d}}(u,v) = 1,$$ $$\sum_{l = 1}^{+\infty }f_{\frac{i}{d}}(u,v,l) = 1.$$

Model 2

The semi-Markov kernels are equal to $q_{\frac{i}{d}}^{\ (2)}(u,v,l) = p_{\frac{i}{d}}(u,v)f(u,v,l)$. Therefore, $\forall u \in E$ the sums of $p_{\frac{i}{d}}(u,v)$ over $v$ and the sums of $f(u,v,l)$ over $l$ must be equal to 1: $$\sum_{v \in E} p_{\frac{i}{d}}(u,v) = 1,$$ $$\sum_{l = 1}^{+\infty }f(u,v,l) = 1.$$

Model 3

The semi-Markov kernels are equal to $q_{\frac{i}{d}}^{\ (3)}(u,v,l) = p(u,v)f_{\frac{i}{d}}(u,v,l)$. Therefore, $\forall u \in E$ the sums of $p(u,v)$ over $v$ and the sums of $f_{\frac{i}{d}}(u,v,l)$ over $l$ must be equal to 1: $$\sum_{v \in E}p(u,v) = 1,$$ $$\sum_{l = 1}^{+\infty }f_{\frac{i}{d}}(u,v,l) = 1.$$
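As a short check of why these conditions are sufficient, shown here for the general notation $q_{\frac{i}{d}}(u,v,l) = p_{\frac{i}{d}}(u,v)f_{\frac{i}{d}}(u,v,l)$ (for Model 2 drop the index on $f$, for Model 3 drop it on $p$), summing first over $l$ and then over $v$ gives: $$\sum_{v \in E}\sum_{l = 1}^{+\infty}p_{\frac{i}{d}}(u,v)f_{\frac{i}{d}}(u,v,l) = \sum_{v \in E}p_{\frac{i}{d}}(u,v)\sum_{l = 1}^{+\infty}f_{\frac{i}{d}}(u,v,l) = \sum_{v \in E}p_{\frac{i}{d}}(u,v) = 1.$$ Combining this with $\sum_{i=0}^{d}A_{i}(t) = 1$ yields the restriction on the drifting kernel: $$\sum_{v \in E}\sum_{l = 1}^{+\infty}q_{\frac{t}{n}}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t)\sum_{v \in E}\sum_{l = 1}^{+\infty}q_{\frac{i}{d}}(u,v,l) = \sum_{i = 0}^{d}A_{i}(t) = 1.$$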

References

Barbu, V. S., Limnios, N. (2008). Semi-Markov Chains and Hidden Semi-Markov Models Toward Applications - Their Use in Reliability and DNA Analysis. New York: Lecture Notes in Statistics, vol. 191, Springer.

Vergne, N. (2008). Drifting Markov models with Polynomial Drift and Applications to DNA Sequences. Statistical Applications in Genetics and Molecular Biology, 7(1).

Barbu, V. S., Vergne, N. (2019). Reliability and survival analysis for drifting Markov models: modelling and estimation. Methodology and Computing in Applied Probability, 21(4), 1407-1429.

Nakagawa, T., Osaki, S. (1975). The discrete Weibull distribution. IEEE Transactions on Reliability, R-24, 300-301.

Sanger, F., Coulson, A. R., Hong, G. F., Hill, D. F., & Petersen, G. B. (1982). Nucleotide sequence of bacteriophage $\lambda$ DNA. Journal of molecular biology, 162(4), 729-773.