nonparametric_dsmm: Non-parametric Drifting semi-Markov model specification

Description

Creates a non-parametric model specification for a drifting semi-Markov model. Returns an object of class (dsmm_nonparametric, dsmm).

Usage

nonparametric_dsmm(
  model_size,
  states,
  initial_dist,
  degree,
  k_max,
  f_is_drifting,
  p_is_drifting,
  p_dist,
  f_dist
)

Value

Returns an object of the S3 class dsmm_nonparametric, dsmm. It has the following attributes:

dist : List. Contains 2 arrays, passed down from the arguments:
- p_drift or p_notdrift, corresponding to whether the defined $p$ transition matrix is drifting or not.
- f_drift or f_notdrift, corresponding to whether the defined $f$ sojourn time distribution is drifting or not.
initial_dist : Numerical vector. Passed down from the arguments. It contains the initial distribution of the drifting semi-Markov model.
states : Character vector. Passed down from the arguments. It contains the state space $E$.
s : Positive integer. It contains the number of states in the state space, $s = |E|$, which is given in the attribute states.
degree : Positive integer. Passed down from the arguments. It contains the polynomial degree $d$ considered for the drifting of the model.
k_max : Numerical value. Passed down from the arguments. It contains the maximum sojourn time for the drifting semi-Markov model.
model_size : Positive integer. Passed down from the arguments. It contains the size of the drifting semi-Markov model $n$, which represents the length of the embedded Markov chain $(J_{t})_{t\in \{0,\dots,n\}}$, without the last state.
f_is_drifting : Logical. Passed down from the arguments. Specifies if $f$ is drifting or not.
p_is_drifting : Logical. Passed down from the arguments. Specifies if $p$ is drifting or not.
Model : Character. Possible values:
- "Model_1" : Both $p$ and $f$ are drifting.
- "Model_2" : $p$ is drifting and $f$ is not drifting.
- "Model_3" : $f$ is drifting and $p$ is not drifting.
A_i : Numerical Matrix. Represents the polynomials $A_i(t)$ with degree $d$ that are used for solving the system $MJ = P$. Used for the methods defined for the object. Not printed when viewing the object.
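
The relation between the two logical flags and the Model label can be sketched in a few lines of R. The helper `model_label()` is hypothetical and not part of the package; the source lists no label for the case where neither $p$ nor $f$ drifts, so treating that case as an error is an assumption.

```r
# Hypothetical helper (not part of the package): map the two logical
# flags to the model label listed under `Model`.
model_label <- function(p_is_drifting, f_is_drifting) {
  stopifnot(is.logical(p_is_drifting), length(p_is_drifting) == 1,
            is.logical(f_is_drifting), length(f_is_drifting) == 1)
  if (p_is_drifting && f_is_drifting) {
    "Model_1"   # both p and f are drifting
  } else if (p_is_drifting) {
    "Model_2"   # only p is drifting
  } else if (f_is_drifting) {
    "Model_3"   # only f is drifting
  } else {
    # Assumption: no label is documented when nothing drifts.
    stop("At least one of p and f must be drifting.")
  }
}

model_label(p_is_drifting = TRUE, f_is_drifting = FALSE)
```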

Arguments

model_size

Positive integer that represents the size of the drifting semi-Markov model $n$. It is equal to the length of a theoretical embedded Markov chain $(J_{t})_{t\in \{0,\dots,n\}}$, without the last state.

states

Character vector that represents the state space $E$. It has length equal to $s = |E|$.

initial_dist

Numerical vector of $s$ probabilities, that represents the initial distribution for each state in the state space $E$.
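
As a minimal sketch, the requirements on initial_dist can be checked before building a model. The helper `is_valid_initial_dist()` and its tolerance are hypothetical; whether the package performs the same checks internally is not stated here.

```r
states <- c("AA", "AC", "CC")       # state space E, so s = 3
initial_dist <- c(0.3, 0.5, 0.2)    # one probability per state

# Hypothetical helper: one non-negative value per state, summing to 1.
is_valid_initial_dist <- function(initial_dist, states, tol = 1e-8) {
  is.numeric(initial_dist) &&
    length(initial_dist) == length(states) &&
    all(initial_dist >= 0) &&
    abs(sum(initial_dist) - 1) < tol
}

is_valid_initial_dist(initial_dist, states)
```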

degree

Positive integer that represents the polynomial degree $d$ for the drifting semi-Markov model.

k_max

Positive integer that represents the maximum sojourn time of choice, for the drifting semi-Markov model.

f_is_drifting

Logical. Specifies if $f$ is drifting or not.

p_is_drifting

Logical. Specifies if $p$ is drifting or not.

p_dist

Numerical array that represents the probabilities of the transition matrix $p$ of the embedded Markov chain $(J_{t})_{t\in \{0,\dots,n\}}$ (it is defined the same way in the parametric_dsmm function). It can be defined in two ways:

If $p$ is not drifting, it has dimensions of $s \times s$.
If $p$ is drifting, it has dimensions of $s \times s \times (d+1)$ (see Details, Defined Arguments).
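
The two shapes of p_dist can be illustrated with a short sketch. The values and the choice $d = 1$ are illustrative only; the row-sum check uses an arbitrary tolerance.

```r
s <- 3
d <- 1   # polynomial degree, so a drifting p needs d + 1 = 2 matrices

p_0 <- matrix(c(0,   0.1, 0.9,
                0.5, 0,   0.5,
                0.3, 0.7, 0),
              nrow = s, byrow = TRUE)
p_1 <- matrix(c(0,   0.6, 0.4,
                0.7, 0,   0.3,
                0.6, 0.4, 0),
              nrow = s, byrow = TRUE)

# Non-drifting case: a single s x s matrix.
p_notdrift <- p_0

# Drifting case: the d + 1 matrices stacked along a third dimension.
p_drift <- array(c(p_0, p_1), dim = c(s, s, d + 1))

# Each row of each matrix is a probability distribution over the next state,
# so summing over the second dimension should give 1 everywhere.
row_sums <- apply(p_drift, c(1, 3), sum)
all(abs(row_sums - 1) < 1e-8)
```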

f_dist

Numerical array that represents the probabilities of the conditional sojourn time distributions $f$. A value of $0$ is allowed for state transitions that should not have a sojourn time distribution (e.g. every transition from a state to itself should have $0$ as its value). It can be defined in two ways:

If $f$ is not drifting, it has dimensions of $s \times s \times k_{max}$.
If $f$ is drifting, it has dimensions of $s \times s \times k_{max} \times (d+1)$ (see Details, Defined Arguments).
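
A small sketch of the non-drifting shape of f_dist, using illustrative values with $s = 2$ and $k_{max} = 3$. It checks that the sojourn time probabilities sum to 1 over $l$ for every transition between different states, and that transitions to the same state are all $0$.

```r
s <- 2
k_max <- 3

# One s x s matrix per sojourn time l = 1, ..., k_max.
f_l_1 <- matrix(c(0,   0.2,
                  0.3, 0),
                nrow = s, byrow = TRUE)
f_l_2 <- matrix(c(0,   0.3,
                  0.2, 0),
                nrow = s, byrow = TRUE)
f_l_3 <- matrix(c(0,   0.5,
                  0.5, 0),
                nrow = s, byrow = TRUE)

# Non-drifting case: dimensions (s, s, k_max).
f_notdrift <- array(c(f_l_1, f_l_2, f_l_3), dim = c(s, s, k_max))

# Sum over the sojourn time l for every pair (u, v).
sums_over_l <- apply(f_notdrift, c(1, 2), sum)
off_diag <- row(sums_over_l) != col(sums_over_l)

all(abs(sums_over_l[off_diag] - 1) < 1e-8)  # proper distributions for u != v
all(diag(sums_over_l) == 0)                 # no sojourn distribution for u == v
```

For the drifting case, $d + 1$ such arrays would be stacked along a fourth dimension with `array(..., dim = c(s, s, k_max, d + 1))`.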

Details

Defined Arguments

For the non-parametric case, we explicitly define:

The transition matrix of the embedded Markov chain $(J_{t})_{t\in \{0,\dots,n\}}$, given in the argument p_dist:
- If $p$ is not drifting, it contains the values: $$p(u, v), \forall u, v \in E,$$ given in an array with dimensions of $s \times s$, where the first dimension corresponds to the previous state $u$ and the second dimension corresponds to the current state $v$.
- If $p$ is drifting then, for $i \in\{0,\dots,d\}$, it contains the values: $$p_{\frac{i}{d}}(u,v), \forall u, v \in E,$$ given in an array with dimensions of $s \times s \times (d + 1)$, where the first and second dimensions are defined as in the non-drifting case, and the third dimension corresponds to the $d+1$ different matrices $p_{\frac{i}{d}}.$
The conditional sojourn time distribution, given in the argument f_dist:
- If $f$ is not drifting, it contains the values: $$f(u,v,l), \forall u,v\in E,\forall l\in \{1,\dots,k_{max}\},$$ given in an array with dimensions of $s \times s \times k_{max}$, where the first dimension corresponds to the previous state $u$, the second dimension corresponds to the current state $v$, and the third dimension corresponds to the sojourn time $l$.
- If $f$ is drifting then, for $i\in \{0,\dots,d\}$, it contains the values: $$f_{\frac{i}{d}}(u,v,l),\forall u,v\in E, \forall l\in \{1,\dots,k_{max}\},$$ given in an array with dimensions of $s \times s \times k_{max} \times (d + 1)$, where the first, second and third dimensions are defined as in the non-drifting case, and the fourth dimension corresponds to the $d+1$ different arrays $f_{\frac{i}{d}}.$
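
To give some intuition for how the $d+1$ matrices $p_{\frac{i}{d}}$ define a drifting transition matrix, here is a rough sketch. It assumes that the polynomials $A_i$ are the Lagrange basis polynomials on the nodes $\frac{i}{d}$, evaluated at a rescaled position $\tau \in [0, 1]$; this page does not spell out their form, so this is an assumption in the spirit of the drifting Markov models of Vergne (2008), not the package's implementation. The functions `lagrange_basis()` and `p_at()` are hypothetical.

```r
# Assumption: A_i is the i-th Lagrange basis polynomial on nodes 0, 1/d, ..., 1.
lagrange_basis <- function(tau, i, d) {
  nodes <- (0:d) / d
  others <- nodes[-(i + 1)]
  prod((tau - others) / (nodes[i + 1] - others))
}

s <- 3
d <- 2
p_0 <- matrix(c(0, 0.1, 0.9,  0.5, 0, 0.5,  0.3, 0.7, 0), nrow = s, byrow = TRUE)
p_1 <- matrix(c(0, 0.6, 0.4,  0.7, 0, 0.3,  0.6, 0.4, 0), nrow = s, byrow = TRUE)
p_2 <- matrix(c(0, 0.2, 0.8,  0.6, 0, 0.4,  0.7, 0.3, 0), nrow = s, byrow = TRUE)
p_dist <- array(c(p_0, p_1, p_2), dim = c(s, s, d + 1))

# Hypothetical helper: weighted sum of the d + 1 slices at position tau.
p_at <- function(tau, p_dist) {
  d <- dim(p_dist)[3] - 1
  weights <- vapply(0:d, function(i) lagrange_basis(tau, i, d), numeric(1))
  Reduce(`+`, lapply(0:d, function(i) weights[i + 1] * p_dist[, , i + 1]))
}

# At a node i / d the interpolation returns the matrix p_{i/d} itself.
all.equal(p_at(0.5, p_dist), p_dist[, , 2])

# Between nodes the weights still sum to 1, so the row sums are preserved.
rowSums(p_at(0.25, p_dist))
```

Under the same assumption, the sojourn time arrays $f_{\frac{i}{d}}$ could be combined in the same way, slice by slice along the fourth dimension.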

References

Barbu, V. S., Limnios, N. (2008). Semi-Markov Chains and Hidden Semi-Markov Models Toward Applications: Their Use in Reliability and DNA Analysis. Lecture Notes in Statistics, vol. 191. New York: Springer.

Vergne, N. (2008). Drifting Markov Models with Polynomial Drift and Applications to DNA Sequences. Statistical Applications in Genetics and Molecular Biology, 7(1).

Barbu, V. S., Vergne, N. (2019). Reliability and Survival Analysis for Drifting Markov Models: Modeling and Estimation. Methodology and Computing in Applied Probability, 21(4), 1407-1429.

Examples

# Setup.
states <- c("AA", "AC", "CC")
s <- length(states)
d <- 2
k_max <- 3

# ===========================================================================
# Defining non-parametric drifting semi-Markov models.
# ===========================================================================

# ---------------------------------------------------------------------------
# Defining distributions for Model 1 - both p and f are drifting.
# ---------------------------------------------------------------------------

# `p_dist` has dimensions of: (s, s, d + 1).
# Sums over v must be 1 for all u and i = 0, ..., d.
p_dist_1 <- matrix(c(0,   0.1, 0.9,
                     0.5, 0,   0.5,
                     0.3, 0.7, 0),
                   ncol = s, byrow = TRUE)

p_dist_2 <- matrix(c(0,   0.6, 0.4,
                     0.7, 0,   0.3,
                     0.6, 0.4, 0),
                   ncol = s, byrow = TRUE)

p_dist_3 <- matrix(c(0,   0.2, 0.8,
                     0.6, 0,   0.4,
                     0.7, 0.3, 0),
                   ncol = s, byrow = TRUE)

# Get `p_dist` as an array of p_dist_1, p_dist_2 and p_dist_3.
p_dist <- array(c(p_dist_1, p_dist_2, p_dist_3),
                dim = c(s, s, d + 1))

# `f_dist` has dimensions of: (s, s, k_max, d + 1).
# First f distribution. Dimensions: (s, s, k_max).
# Sums over l must be 1, for every u, v and i = 0, ..., d.
f_dist_1_l_1 <- matrix(c(0,   0.2, 0.7,
                         0.3, 0,   0.4,
                         0.2, 0.8, 0),
                       ncol = s, byrow = TRUE)

f_dist_1_l_2 <- matrix(c(0,   0.3,  0.2,
                         0.2, 0,    0.5,
                         0.1, 0.15, 0),
                       ncol = s, byrow = TRUE)

f_dist_1_l_3 <- matrix(c(0,   0.5,  0.1,
                         0.5, 0,    0.1,
                         0.7, 0.05, 0),
                       ncol = s, byrow = TRUE)
# Get f_dist_1
f_dist_1 <- array(c(f_dist_1_l_1, f_dist_1_l_2, f_dist_1_l_3),
                  dim = c(s, s, k_max))

# Second f distribution. Dimensions: (s, s, k_max)
f_dist_2_l_1 <- matrix(c(0,   1/3, 0.4,
                         0.3, 0,   0.4,
                         0.2, 0.1, 0),
                       ncol = s, byrow = TRUE)

f_dist_2_l_2 <- matrix(c(0,   1/3, 0.4,
                         0.4, 0,   0.2,
                         0.3, 0.4, 0),
                       ncol = s, byrow = TRUE)

f_dist_2_l_3 <- matrix(c(0,   1/3, 0.2,
                         0.3, 0,   0.4,
                         0.5, 0.5, 0),
                       ncol = s, byrow = TRUE)

# Get f_dist_2
f_dist_2 <- array(c(f_dist_2_l_1, f_dist_2_l_2, f_dist_2_l_3),
                  dim = c(s, s, k_max))

# Third f distribution. Dimensions: (s, s, k_max)
f_dist_3_l_1 <- matrix(c(0,    0.3, 0.3,
                         0.3,  0,   0.5,
                         0.05, 0.1, 0),
                       ncol = s, byrow = TRUE)

f_dist_3_l_2 <- matrix(c(0,   0.2, 0.6,
                         0.3, 0,   0.35,
                         0.9, 0.2, 0),
                       ncol = s, byrow = TRUE)

f_dist_3_l_3 <- matrix(c(0,    0.5, 0.1,
                         0.4,  0,   0.15,
                         0.05, 0.7, 0),
                       ncol = s, byrow = TRUE)

# Get f_dist_3
f_dist_3 <- array(c(f_dist_3_l_1, f_dist_3_l_2, f_dist_3_l_3),
                  dim = c(s, s, k_max))

# Get f_dist as an array of f_dist_1, f_dist_2 and f_dist_3.
f_dist <- array(c(f_dist_1, f_dist_2, f_dist_3),
                dim = c(s, s, k_max, d + 1))

# ---------------------------------------------------------------------------
# Non-Parametric object for Model 1.
# ---------------------------------------------------------------------------

obj_nonpar_model_1 <- nonparametric_dsmm(
    model_size = 8000,
    states = states,
    initial_dist = c(0.3, 0.5, 0.2),
    degree = d,
    k_max = k_max,
    p_dist = p_dist,
    f_dist = f_dist,
    p_is_drifting = TRUE,
    f_is_drifting = TRUE
)

# p drifting array.
p_drift <- obj_nonpar_model_1$dist$p_drift
p_drift

# f drifting array.
f_drift <- obj_nonpar_model_1$dist$f_drift
f_drift

# ---------------------------------------------------------------------------
# Defining Model 2 - p is drifting, f is not drifting.
# ---------------------------------------------------------------------------

# p_dist has the same dimensions as in Model 1: (s, s, d + 1).
p_dist_model_2 <- array(c(p_dist_1, p_dist_2, p_dist_3),
                        dim = c(s, s, d + 1))

# `f_dist` has dimensions of: (s, s, k_max).
f_dist_model_2 <- f_dist_2


# ---------------------------------------------------------------------------
# Non-Parametric object for Model 2.
# ---------------------------------------------------------------------------

obj_nonpar_model_2 <- nonparametric_dsmm(
    model_size = 10000,
    states = states,
    initial_dist = c(0.7, 0.1, 0.2),
    degree = d,
    k_max = k_max,
    p_dist = p_dist_model_2,
    f_dist = f_dist_model_2,
    p_is_drifting = TRUE,
    f_is_drifting = FALSE
)

# p drifting array.
p_drift <- obj_nonpar_model_2$dist$p_drift
p_drift

# f distribution array.
f_notdrift <- obj_nonpar_model_2$dist$f_notdrift
f_notdrift


# ---------------------------------------------------------------------------
# Defining Model 3 - f is drifting, p is not drifting.
# ---------------------------------------------------------------------------


# `p_dist` has dimensions of: (s, s), since p is not drifting.
p_dist_model_3 <- p_dist_3


# `f_dist` has the same dimensions as in Model 1: (s, s, k_max, d + 1).
f_dist_model_3 <- array(c(f_dist_1, f_dist_2, f_dist_3),
                        dim = c(s, s, k_max, d + 1))


# ---------------------------------------------------------------------------
# Non-Parametric object for Model 3.
# ---------------------------------------------------------------------------

obj_nonpar_model_3 <- nonparametric_dsmm(
    model_size = 10000,
    states = states,
    initial_dist = c(0.3, 0.4, 0.3),
    degree = d,
    k_max = k_max,
    p_dist = p_dist_model_3,
    f_dist = f_dist_model_3,
    p_is_drifting = FALSE,
    f_is_drifting = TRUE
)

# p distribution matrix.
p_notdrift <- obj_nonpar_model_3$dist$p_notdrift
p_notdrift

# f distribution array.
f_drift <- obj_nonpar_model_3$dist$f_drift
f_drift

# ===========================================================================
# Using methods for non-parametric objects.
# ===========================================================================

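# Semi-Markov kernel of the non-parametric object for Model 3.
kernel_nonparametric <- get_kernel(obj_nonpar_model_3)
str(kernel_nonparametric)

# Simulate a sequence from the non-parametric object for Model 3.
sim_seq_nonpar <- simulate(obj_nonpar_model_3, nsim = 50)
str(sim_seq_nonpar)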
