Learn R Programming

dsmmR (version 1.0.7)

parametric_dsmm: Parametric Drifting semi-Markov model specification

Description

Creates a parametric model specification for a drifting semi-Markov model. Returns an object of class (dsmm_parametric, dsmm).

Usage

parametric_dsmm(
  model_size,
  states,
  initial_dist,
  degree,
  f_is_drifting,
  p_is_drifting,
  p_dist,
  f_dist,
  f_dist_pars
)

Value

Returns an object of the S3 class dsmm_parametric, dsmm. It has the following attributes:

  • dist : List. Contains 3 arrays, passing down from the arguments:

    • p_drift or p_notdrift, corresponding to whether the defined \(p\) transition matrix is drifting or not.

    • f_drift_parametric or f_notdrift_parametric, corresponding to whether the defined \(f\) sojourn time distribution is drifting or not.

    • f_drift_parameters or f_notdrift_parameters, which are the defined \(f\) sojourn time distribution parameters, depending on whether \(f\) is drifting or not.

  • initial_dist : Numerical vector. Passing down from the arguments. It contains the initial distribution of the drifting semi-Markov model.

  • states : Character vector. Passing down from the arguments. It contains the state space \(E\).

  • s : Positive integer. It contains the number of states in the state space, \(s = |E|\), which is given in the attribute states.

  • degree : Positive integer. Passing down from the arguments. It contains the polynomial degree \(d\) considered for the drifting of the model.

  • model_size : Positive integer. Passing down from the arguments. It contains the size of the drifting semi-Markov model \(n\), which represents the length of the embedded Markov chain \((J_{t})_{t\in \{0,\dots,n\}}\), without the last state.

  • f_is_drifting : Logical. Passing down from the arguments. Specifies if \(f\) is drifting or not.

  • p_is_drifting : Logical. Passing down from the arguments. Specifies if \(p\) is drifting or not.

  • Model : Character. Possible values:

    • "Model_1" : Both \(p\) and \(f\) are drifting.

    • "Model_2" : \(p\) is drifting and \(f\) is not drifting.

    • "Model_3" : \(f\) is drifting and \(p\) is not drifting.

  • A_i : Numerical matrix. Represents the polynomials \(A_i(t)\) with degree \(d\) that are used for solving the system \(MJ = P\). Used for the methods defined for the object. Not printed when viewing the object.

Arguments

model_size

Positive integer that represents the size of the drifting semi-Markov model \(n\). It is equal to the length of a theoretical embedded Markov chain \((J_{t})_{t\in \{0,\dots,n\}}\), without the last state.

states

Character vector that represents the state space \(E\). It has length equal to \(s = |E|\).

initial_dist

Numerical vector of \(s\) probabilities, that represents the initial distribution for each state in the state space \(E\).

degree

Positive integer that represents the polynomial degree \(d\) for the drifting semi-Markov model.

f_is_drifting

Logical. Specifies if \(f\) is drifting or not.

p_is_drifting

Logical. Specifies if \(p\) is drifting or not.

p_dist

Numerical array, that represents the probabilities of the transition matrix \(p\) of the embedded Markov chain \((J_{t})_{t\in \{0,\dots,n\}}\) (it is defined the same way in the nonparametric_dsmm function). It can be defined in two ways:

  • If \(p\) is not drifting, it has dimensions of \(s \times s\).

  • If \(p\) is drifting, it has dimensions of \(s \times s \times (d+1)\) (see more in Details, Defined Arguments.)

f_dist

Character array, that represents the discrete sojourn time distribution \(f\) of our choice. NA is allowed for state transitions that we do not wish to have a sojourn time distribution (e.g. all state transition to the same state should have NA as their value). The list of possible values is: ["unif", "geom", "pois", "dweibull", "nbinom", NA]. It can be defined in two ways:

  • If \(f\) is not drifting, it has dimensions of \(s \times s\).

  • If \(f\) is drifting, it has dimensions of \(s \times s \times (d+1)\) (see more in Details, Defined Arguments.)

f_dist_pars

Numerical array, that represents the parameters of the sojourn time distributions given in f_dist. NA is allowed, in the case that the distribution of our choice does not require a parameter. It can be defined in two ways:

  • If \(f\) is not drifting, it has dimensions of \(s \times s \times 2\), specifying two possible parameters required for the discrete distributions.

  • If \(f\) is drifting, it has dimensions of \(s \times s \times 2 \times (d+1)\), specifying two possible parameters required for the discrete distributions, but for every single one of the \(i = 0, \dots, d\) sojourn time distributions \(f_{\frac{i}{d}}\) that are required. (see more in Details, Defined Arguments.)

Details

Defined Arguments

For the parametric case, we explicitly define:

  1. The transition matrix of the embedded Markov chain \((J_{t})_{t\in \{0,\dots,n\}}\), given in the attribute p_dist:

    • If \(p\) is not drifting, it contains the values: $$p(u,v), \forall u, v \in E,$$ given in an array with dimensions of \(s \times s\), where the first dimension corresponds to the previous state \(u\) and the second dimension corresponds to the current state \(v\).

    • If \(p\) is drifting, for \(i \in \{ 0,\dots,d \}\), it contains the values: $$p_{\frac{i}{d}}(u,v), \forall u, v \in E,$$ given in an array with dimensions of \(s \times s \times (d + 1)\), where the first and second dimensions are defined as in the non-drifting case, and the third dimension corresponds to the \(d+1\) different matrices \(p_{\frac{i}{d}}.\)

  2. The conditional sojourn time distribution, given in the attribute f_dist:

    • If \(f\) is not drifting, it contains the discrete distribution names (as characters or NA), given in an array with dimensions of \(s \times s\), where the first dimension corresponds to the previous state \(u\), the second dimension corresponds to the current state \(v\).

    • If \(f\) is drifting, it contains the discrete distribution names (as characters or NA) given in an array with dimensions of \(s \times s \times (d + 1)\), where the first and second dimensions are defined as in the non-drifting case, and the third dimension corresponds to the \(d+1\) different arrays \(f_{\frac{i}{d}}.\)

  3. The conditional sojourn time distribution parameters, given in the attribute f_dist_pars:

    • If \(f\) is not drifting, it contains the numerical values (or NA) of the corresponding distributions defined in f_dist, given in an array with dimensions of \(s \times s\), where the first dimension corresponds to the previous state \(u\), the second dimension corresponds to the current state \(v\).

    • If \(f\) is drifting, it contains the numerical values (or NA) of the corresponding distributions defined in f_dist, given in an array with dimensions of \(s \times s \times (d + 1)\), where the first and second dimensions are defined as in the non-drifting case, and the third dimension corresponds to the \(d+1\) different arrays \(f_{\frac{i}{d}}.\)

Sojourn time distributions

In this package, the available distributions for the modeling of the conditional sojourn times, of the drifting semi-Markov model, used through the argument f_dist, are the following:

  • Uniform \((n)\):

    \(f(x) = 1/n\), for \(x = 1, 2, \dots, n\), where \(n\) is a positive integer. This can be specified through the following:

    • f_dist = "unif"

    • f_dist_pars = (\(n\), NA) (\(n\) as defined here).

  • Geometric \((p)\):

    \(f(x) = p (1-p)^{x-1}\), for \(x = 1, 2, \dots,\) where \(p \in (0, 1)\) is the probability of success. This can be specified through the following:

    • f_dist = "geom"

    • f_dist_pars = (\(p\), NA) (\(p\) as defined here).

  • Poisson \((\lambda)\):

    \(f(x) = \frac{\lambda^{x-1} exp(-\lambda)}{(x-1)!}\), for \(x = 1, 2, \dots,\) where \(\lambda > 0\). This can be specified through the following:

    • f_dist = "pois"

    • f_dist_pars = (\(\lambda\), NA)

  • Negative binomial \((\alpha, p)\):

    \(f(x)=\frac{\Gamma(x+\alpha-1)}{\Gamma(\alpha)(x-1)!} p^{\alpha}(1-p)^{x-1}\), for \(x = 1, 2,\dots,\) where \(\Gamma\) is the Gamma function, \(\alpha \in (0, +\infty) \) is the parameter describing the target for number of successful trials, or the dispersion parameter (the shape parameter of the gamma mixing distribution). \(p\) is the probability of success, \(0 < p < 1\).

    • f_dist = "nbinom"

    • f_dist_pars = (\(\alpha, p\)) (\(p\) as defined here)

  • Discrete Weibull of type 1 \((q, \beta)\):

    \(f(x)=q^{(x-1)^{\beta}}-q^{x^{\beta}}\), for \(x=1,2,\dots,\) with \(q \in (0, 1)\) is the first parameter (probability) and \(\beta \in (0, +\infty)\) is the second parameter. This can be specified through the following:

    • f_dist = "dweibull"

    • f_dist_pars = (\(q, \beta\)) (\(q\) as defined here)

From these discrete distributions, by using "dweibull", "nbinom" we require two parameters. It's for this reason that the attribute f_dist_pars is an array of dimensions \(s \times s \times 2\) if \(f\) is not drifting or \(s \times s \times 2 \times (d+1)\) if \(f\) is drifting.

References

V. S. Barbu, N. Limnios. (2008). semi-Markov Chains and Hidden semi-Markov Models Toward Applications - Their Use in Reliability and DNA Analysis. New York: Lecture Notes in Statistics, vol. 191, Springer.

Vergne, N. (2008). Drifting Markov models with Polynomial Drift and Applications to DNA Sequences. Statistical Applications in Genetics Molecular Biology 7 (1).

Barbu V. S., Vergne, N. (2019). Reliability and survival analysis for drifting Markov models: modeling and estimation. Methodology and Computing in Applied Probability, 21(4), 1407-1429.

T. Nakagawa and S. Osaki. (1975). The discrete Weibull distribution. IEEE Transactions on Reliability, R-24, 300-301.

See Also

Methods applied to this object: simulate.dsmm, get_kernel.

For the non-parametric drifting semi-Markov model specification: nonparametric_dsmm.

For the theoretical background of drifting semi-Markov models: dsmmR.

Examples

Run this code
# We can also define states in a flexible way, including spaces.
states <- c("Dollar $", " /1'2'3/ ", " Z E T A ", "O_M_E_G_A")
s <- length(states)
d <- 1

# ===========================================================================
# Defining parametric drifting semi-Markov models.
# ===========================================================================

# ---------------------------------------------------------------------------
# Defining the drifting distributions for Model 1.
# ---------------------------------------------------------------------------

# `p_dist` has dimensions of: (s, s, d + 1).
# Sums over v must be 1 for all u and i = 0, ..., d.

# First matrix.
p_dist_1 <- matrix(c(0,   0.1, 0.4, 0.5,
                     0.5, 0,   0.3, 0.2,
                     0.3, 0.4, 0,   0.3,
                     0.8, 0.1, 0.1, 0),
                   ncol = s, byrow = TRUE)

# Second matrix.
p_dist_2 <- matrix(c(0,   0.3, 0.6, 0.1,
                     0.3, 0,   0.4, 0.3,
                     0.5, 0.3, 0,   0.2,
                     0.2, 0.3, 0.5, 0),
                   ncol = s, byrow = TRUE)

# get `p_dist` as an array of p_dist_1 and p_dist_2.
p_dist_model_1 <- array(c(p_dist_1, p_dist_2), dim = c(s, s, d + 1))

# `f_dist` has dimensions of: (s, s, d + 1).
# First matrix.
f_dist_1 <- matrix(c(NA,         "unif", "dweibull", "nbinom",
                     "geom",      NA,    "pois",     "dweibull",
                     "dweibull", "pois",  NA,        "geom",
                     "pois",      NA,    "geom",      NA),
                   nrow = s, ncol = s, byrow = TRUE)


# Second matrix.
f_dist_2 <- matrix(c(NA,     "pois", "geom", "nbinom",
                     "geom",  NA,    "pois", "dweibull",
                     "unif", "geom",  NA,    "geom",
                     "pois", "pois", "geom",  NA),
                   nrow = s, ncol = s, byrow = TRUE)


# get `f_dist` as an array of `f_dist_1` and `f_dist_2`
f_dist_model_1 <- array(c(f_dist_1, f_dist_2), dim = c(s, s, d + 1))


# `f_dist_pars` has dimensions of: (s, s, 2, d + 1).
# First array of coefficients, corresponding to `f_dist_1`.
# First matrix.
f_dist_1_pars_1 <- matrix(c(NA,  5,  0.4, 4,
                            0.7, NA, 5,   0.6,
                            0.2, 3,  NA,  0.6,
                            4,   NA, 0.4, NA),
                          nrow = s, ncol = s, byrow = TRUE)
# Second matrix.
f_dist_1_pars_2 <- matrix(c(NA,  NA, 0.2, 0.6,
                            NA,  NA, NA,  0.8,
                            0.6, NA, NA,  NA,
                            NA,  NA, NA,  NA),
                          nrow = s, ncol = s, byrow = TRUE)

# Second array of coefficients, corresponding to `f_dist_2`.
# First matrix.
f_dist_2_pars_1 <- matrix(c(NA,  6,   0.4, 3,
                            0.7, NA,  2,   0.5,
                            3,   0.6, NA,  0.7,
                            6,   0.2, 0.7, NA),
                          nrow = s, ncol = s, byrow = TRUE)
# Second matrix.
f_dist_2_pars_2 <- matrix(c(NA, NA, NA, 0.6,
                            NA, NA, NA, 0.8,
                            NA, NA, NA, NA,
                            NA, NA, NA, NA),
                          nrow = s, ncol = s, byrow = TRUE)

# Get `f_dist_pars`.
f_dist_pars_model_1 <- array(c(f_dist_1_pars_1, f_dist_1_pars_2,
                               f_dist_2_pars_1, f_dist_2_pars_2),
                             dim = c(s, s, 2, d + 1))

# ---------------------------------------------------------------------------
# Parametric object for Model 1.
# ---------------------------------------------------------------------------

obj_par_model_1 <- parametric_dsmm(
    model_size = 10000,
    states = states,
    initial_dist = c(0.8, 0.1, 0.1, 0),
    degree = d,
    p_dist = p_dist_model_1,
    f_dist = f_dist_model_1,
    f_dist_pars = f_dist_pars_model_1,
    p_is_drifting = TRUE,
    f_is_drifting = TRUE
)

# p drifting array.
p_drift <- obj_par_model_1$dist$p_drift
p_drift

# f distribution.
f_dist_drift <- obj_par_model_1$dist$f_drift_parametric
f_dist_drift

# parameters for the f distribution.
f_dist_pars_drift <- obj_par_model_1$dist$f_drift_parameters
f_dist_pars_drift

# ---------------------------------------------------------------------------
# Defining Model 2 - p is drifting, f is not drifting.
# ---------------------------------------------------------------------------

# `p_dist` has the same dimensions as in Model 1: (s, s, d + 1).
p_dist_model_2 <- array(c(p_dist_1, p_dist_2), dim = c(s, s, d + 1))

# `f_dist` has dimensions of: (s, s).
f_dist_model_2 <- matrix(c( NA,       "pois",  NA,       "nbinom",
                            "geom",    NA,    "geom",    "dweibull",
                            "unif",   "geom",  NA,       "geom",
                            "nbinom", "unif", "dweibull", NA),
                         nrow = s, ncol = s, byrow = TRUE)

# `f_dist_pars` has dimensions of: (s, s, 2),
#  corresponding to `f_dist_model_2`.

# First matrix.
f_dist_pars_1_model_2 <- matrix(c(NA,  0.2, NA,  3,
                                  0.2, NA,  0.2, 0.5,
                                  3,   0.4, NA,  0.7,
                                  2,   3,   0.7, NA),
                                nrow = s, ncol = s, byrow = TRUE)

# Second matrix.
f_dist_pars_2_model_2 <- matrix(c(NA,  NA, NA,  0.6,
                                  NA,  NA, NA,  0.8,
                                  NA,  NA, NA,  NA,
                                  0.2, NA, 0.3, NA),
                                nrow = s, ncol = s, byrow = TRUE)


# Get `f_dist_pars`.
f_dist_pars_model_2 <- array(c(f_dist_pars_1_model_2,
                               f_dist_pars_2_model_2),
                             dim = c(s, s, 2))


# ---------------------------------------------------------------------------
# Parametric object for Model 2.
# ---------------------------------------------------------------------------

obj_par_model_2 <- parametric_dsmm(
    model_size = 10000,
    states = states,
    initial_dist = c(0.8, 0.1, 0.1, 0),
    degree = d,
    p_dist = p_dist_model_2,
    f_dist = f_dist_model_2,
    f_dist_pars = f_dist_pars_model_2,
    p_is_drifting = TRUE,
    f_is_drifting = FALSE
)

# p drifting array.
p_drift <- obj_par_model_2$dist$p_drift
p_drift

# f distribution.
f_dist_notdrift <- obj_par_model_2$dist$f_notdrift_parametric
f_dist_notdrift

# parameters for the f distribution.
f_dist_pars_notdrift <- obj_par_model_2$dist$f_notdrift_parameters
f_dist_pars_notdrift

# ---------------------------------------------------------------------------
# Defining Model 3 - f is drifting, p is not drifting.
# ---------------------------------------------------------------------------

# `p_dist` has dimensions of: (s, s).
p_dist_model_3 <- matrix(c(0,   0.1,  0.3,  0.6,
                           0.4, 0,    0.1,  0.5,
                           0.4, 0.3,  0,    0.3,
                           0.9, 0.01, 0.09, 0),
                         ncol = s, byrow = TRUE)

# `f_dist` has the same dimensions as in Model 1: (s, s, d + 1).
f_dist_model_3 <- array(c(f_dist_1, f_dist_2), dim = c(s, s, d + 1))


# `f_dist_pars` has the same dimensions as in Model 1: (s, s, 2, d + 1).
f_dist_pars_model_3 <- array(c(f_dist_1_pars_1, f_dist_1_pars_2,
                               f_dist_2_pars_1, f_dist_2_pars_2),
                             dim = c(s, s, 2, d + 1))

# ---------------------------------------------------------------------------
# Parametric object for Model 3.
# ---------------------------------------------------------------------------

obj_par_model_3 <- parametric_dsmm(
    model_size = 10000,
    states = states,
    initial_dist = c(0.3, 0.2, 0.2, 0.3),
    degree = d,
    p_dist = p_dist_model_3,
    f_dist = f_dist_model_3,
    f_dist_pars = f_dist_pars_model_3,
    p_is_drifting = FALSE,
    f_is_drifting = TRUE
)

# p drifting array.
p_notdrift <- obj_par_model_3$dist$p_notdrift
p_notdrift

# f distribution.
f_dist_drift <- obj_par_model_3$dist$f_drift_parametric
f_dist_drift

# parameters for the f distribution.
f_dist_pars_drift <- obj_par_model_3$dist$f_drift_parameters
f_dist_pars_drift

# ===========================================================================
# Parametric estimation using methods corresponding to an object
#     which inherits from the class `dsmm_parametric`.
# ===========================================================================

### Comments
### 1.  Using a larger `klim` and a larger `model_size` will increase the
###     accuracy of the model, with the need of larger memory requirements
###     and computational cost.
### 2.  For the parametric estimation it is recommended to use a common set
###     of distributions while only the parameters are drifting. This results
###     in higher accuracy.


# ---------------------------------------------------------------------------
# Defining the distributions for Model 1 - both p and f are drifting.
# ---------------------------------------------------------------------------

# `p_dist` has dimensions of: (s, s, d + 1).
# First matrix.
p_dist_1 <- matrix(c(0,   0.2, 0.4, 0.4,
                     0.5, 0,   0.3, 0.2,
                     0.3, 0.4, 0,   0.3,
                     0.5, 0.3, 0.2, 0),
                   ncol = s, byrow = TRUE)

# Second matrix.
p_dist_2 <- matrix(c(0,   0.3, 0.5, 0.2,
                     0.3, 0,   0.4, 0.3,
                     0.5, 0.3, 0,   0.2,
                     0.2, 0.4, 0.4, 0),
                   ncol = s, byrow = TRUE)

# get `p_dist` as an array of p_dist_1 and p_dist_2.
p_dist_model_1 <- array(c(p_dist_1, p_dist_2), dim = c(s, s, d + 1))

# `f_dist` has dimensions of: (s, s, d + 1).
# We will use the same sojourn time distributions.
f_dist_1 <- matrix(c( NA,        "unif",   "dweibull", "nbinom",
                     "geom",      NA,      "pois",     "dweibull",
                     "dweibull", "pois",    NA,        "geom",
                     "pois",     'nbinom', "geom",      NA),
                   nrow = s, ncol = s, byrow = TRUE)

# get `f_dist`
f_dist_model_1 <- array(f_dist_1, dim = c(s, s, d + 1))

# `f_dist_pars` has dimensions of: (s, s, 2, d + 1).
# First array of coefficients, corresponding to `f_dist_1`.
# First matrix.
f_dist_1_pars_1 <- matrix(c(NA,  7,  0.4, 4,
                            0.7, NA, 5,   0.6,
                            0.2, 3,  NA,  0.6,
                            4,   4,  0.4, NA),
                          nrow = s, ncol = s, byrow = TRUE)
# Second matrix.
f_dist_1_pars_2 <- matrix(c(NA,  NA,  0.2, 0.6,
                            NA,  NA,  NA,  0.8,
                            0.6, NA,  NA,  NA,
                            NA,  0.3, NA,  NA),
                          nrow = s, ncol = s, byrow = TRUE)

# Second array of coefficients, corresponding to `f_dist_2`.
# First matrix.
f_dist_2_pars_1 <- matrix(c(NA,  6,  0.5, 3,
                            0.5, NA, 4,   0.5,
                            0.4, 5,  NA,  0.7,
                            6,   5,  0.7, NA),
                          nrow = s, ncol = s, byrow = TRUE)
# Second matrix.
f_dist_2_pars_2 <- matrix(c(NA,  NA,  0.4, 0.5,
                            NA,  NA,  NA,  0.6,
                            0.5, NA,  NA,  NA,
                            NA,  0.4, NA,  NA),
                          nrow = s, ncol = s, byrow = TRUE)
# Get `f_dist_pars`.
f_dist_pars_model_1 <- array(c(f_dist_1_pars_1, f_dist_1_pars_2,
                               f_dist_2_pars_1, f_dist_2_pars_2),
                             dim = c(s, s, 2, d + 1))

# ---------------------------------------------------------------------------
# Defining the parametric object for Model 1.
# ---------------------------------------------------------------------------

obj_par_model_1 <- parametric_dsmm(
    model_size = 4000,
    states = states,
    initial_dist = c(0.8, 0.1, 0.1, 0),
    degree = d,
    p_dist = p_dist_model_1,
    f_dist = f_dist_model_1,
    f_dist_pars = f_dist_pars_model_1,
    p_is_drifting = TRUE,
    f_is_drifting = TRUE
)

cat("The object has class of (",
    paste0(class(obj_par_model_1),
           collapse = ', '), ").")


# ---------------------------------------------------------------------------
# Generating a sequence from the parametric object.
# ---------------------------------------------------------------------------

# A larger klim will lead to an increase in accuracy.
klim <- 20
sim_seq <- simulate(obj_par_model_1, klim = klim, seed = 1)


# ---------------------------------------------------------------------------
# Fitting the generated sequence under the same distributions.
# ---------------------------------------------------------------------------

fit_par_model1 <- fit_dsmm(sequence = sim_seq,
                           states = states,
                           degree = d,
                           f_is_drifting = TRUE,
                           p_is_drifting = TRUE,
                           estimation = 'parametric',
                           f_dist = f_dist_model_1)

cat("The object has class of (",
    paste0(class(fit_par_model1),
           collapse = ', '), ").")

cat("\nThe estimated parameters are:\n")
fit_par_model1$dist$f_drift_parameters


Run the code above in your browser using DataLab