Estimation of the d+1 points of support transition matrices and of the \(|E|^{k}\) initial law of a k-th order drifting Markov model, starting from one or several sequences.
fitdmm(
sequences,
order,
degree,
states,
init.estim = c("mle", "freq", "prod", "stationary", "unif"),
fit.method = c("sum"),
ncpu = 2
)
An object of class dmm
A list of character vector(s) representing one (several) sequence(s)
Order of the Markov chain
Degree of the polynomials (e.g., linear drifting if degree=1, etc.)
Vector of the state space, of length s > 1
Default="mle". Method used to estimate the initial law.
If init.estim = "mle", the classical Maximum Likelihood Estimator
is used. If init.estim = "freq", the initial distribution
is estimated by taking the frequencies of the words of length k over all
sequences. If init.estim = "prod", the initial distribution is estimated by using
the product of the frequencies of each letter (over all the sequences) in
the word of length k. If init.estim = "stationary", the initial distribution is
estimated by using the stationary law of the point of support transition
matrices of each letter. If init.estim = "unif",
the initial probability of each letter is set to \(\frac{1}{s}\). Alternatively,
`init.estim` can be a user-supplied vector of length \(|E|^k\). See Details for the formulas.
If sequences is a list of several character vectors of the same length,
the usual LSE over the sample paths is used when fit.method="sum" (a list with a single character vector
is a special case).
Default=2. Number of cores used to parallelize the computation. If ncpu=-1, all available cores are used.
Geoffray Brelurut, Alexandre Seiller
The fitdmm function creates a drifting Markov model object dmm.
Consider a random system with finite state space \(E=\{1,\ldots, s\}\), \(s < \infty\), whose time evolution is governed by a discrete-time stochastic process with values in \(E\). A sequence \(X_0, X_1, \ldots, X_n\) with state space \(E= \{1, 2, \ldots, s\}\) is said to be a linear drifting Markov chain (of order 1) of length \(n\) between the Markov transition matrices \(\Pi_0\) and \(\Pi_1\) if the distribution of \(X_t\), \(t = 1, \ldots, n\), is defined by \(P(X_t=v \mid X_{t-1} = u, X_{t-2}, \ldots ) = \Pi_{\frac{t}{n}}(u, v), \; u, v \in E\), where \(\Pi_{\frac{t}{n}}(u, v) = ( 1 - \frac{t}{n}) \Pi_0(u, v) + \frac{t}{n} \Pi_1(u, v), \; u, v \in E\). The linear drifting Markov model of order \(1\) can be generalized to a polynomial drifting Markov model of order \(k\) and degree \(d\). Let \(\Pi_{\frac{i}{d}} = (\Pi_{\frac{i}{d}}(u_1, \dots, u_k, v))_{u_1, \dots, u_k,v \in E}\), \(i = 0, \ldots, d\), be \(d+1\) Markov transition matrices (of order \(k\)) over the state space \(E\), the points of support of the model.
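The linear interpolation between \(\Pi_0\) and \(\Pi_1\) can be sketched in a few lines of base R. This is only an illustration of the formula above, not the code used inside fitdmm; the function name `drift_matrix` and the numeric values of `Pi0` and `Pi1` are assumptions made up for the example.

```r
# Hypothetical support matrices on E = {a, c, g, t}; values are illustrative only.
states <- c("a", "c", "g", "t")
Pi0 <- matrix(c(0.7, 0.1, 0.1, 0.1,
                0.1, 0.7, 0.1, 0.1,
                0.1, 0.1, 0.7, 0.1,
                0.1, 0.1, 0.1, 0.7),
              nrow = 4, byrow = TRUE, dimnames = list(states, states))
Pi1 <- matrix(0.25, nrow = 4, ncol = 4, dimnames = list(states, states))

# Transition matrix at position t of a chain of length n:
# Pi_{t/n} = (1 - t/n) * Pi0 + (t/n) * Pi1
drift_matrix <- function(t, n, Pi0, Pi1) {
  w <- t / n
  (1 - w) * Pi0 + w * Pi1
}

n <- 100
Pi_mid <- drift_matrix(50, n, Pi0, Pi1)  # halfway between Pi0 and Pi1
rowSums(Pi_mid)                          # each row still sums to 1
```

Because the weights \(1 - \frac{t}{n}\) and \(\frac{t}{n}\) are non-negative and sum to one, every interpolated matrix remains a valid transition matrix.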
The estimation of DMMs is carried out for 4 different types of data:
One sample path. It is denoted by \(H(m,n):= (X_0,X_1, \ldots,X_{m})\), where \(m\) denotes the length of the sample path and \(n\) the length of the drifting Markov chain. Two cases can be considered:
\(m=n\) (a complete sample path),
\(m < n\) (an incomplete sample path).
Several sample paths. They are denoted by \(H_i(m_i,n_i), i=1, \ldots, H\). Two cases are considered:
\(m_i=n_i=n \ \forall i=1, \ldots, H\) (complete sample paths of drifting Markov chains of the same length),
\(n_i=n \ \forall i=1, \ldots, H\) (incomplete sample paths of drifting Markov chains of the same length). In this case, the usual LSE over the sample paths is used.
The initial distribution of a k-th order drifting Markov Model is defined as \(\mu_i = P(X_1 = i)\). The initial distribution of the first k letters can be freely specified by the user, but five methods are proposed for its estimation:
"mle": The Maximum Likelihood Estimator for the initial distribution. The formula is: \(\widehat{\mu_i} = \frac{Nstart_i}{L}\), where \(Nstart_i\) is the number of occurrences of the word \(i\) (of length \(k\)) at the beginning of each sequence and \(L\) is the number of sequences. This estimator is reliable when the number of sequences \(L\) is high.
"freq": The initial distribution is estimated by taking the frequencies of the words of length \(k\) over all sequences. The formula is \(\widehat{\mu_i} = \frac{N_i}{N}\), where \(N_i\) is the number of occurrences of the word \(i\) (of length \(k\)) in the sequences and \(N\) is the sum of the lengths of the sequences.
"prod": The initial distribution is estimated by using the product of the frequencies of each state (over all the sequences) in the word of length \(k\).
"stationary": The initial distribution is estimated using the stationary law \(\mu(\Pi_{\frac{k-1}{n}})\).
"unif": The initial probability of each state is equal to \(\frac{1}{s}\).
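To make these estimators concrete, here is a minimal base R sketch for order \(k = 1\), where words are single letters and the "prod" estimate coincides with "freq". It is not taken from the drimmR sources: the toy sequences `seqs`, the matrix `P`, and the helpers `count_states` and `stationary_law` are hypothetical names and values introduced only for this illustration.

```r
# Toy data (made up for illustration): L = 3 sequences over E = {a, c, g, t}
seqs <- list(c("a", "c", "g", "t", "a", "c"),
             c("a", "a", "c", "g", "t", "t"),
             c("c", "g", "a", "c", "g", "a"))
states <- c("a", "c", "g", "t")
s <- length(states)

# Hypothetical helper: named counts of each state in a character vector
count_states <- function(x) {
  setNames(tabulate(match(x, states), nbins = length(states)), states)
}

# "mle": Nstart_i / L, share of sequences that start with state i
first_letters <- vapply(seqs, function(x) x[1], character(1))
mu_mle <- count_states(first_letters) / length(seqs)

# "freq": N_i / N, occurrences of state i over the total length of all sequences
mu_freq <- count_states(unlist(seqs)) / sum(lengths(seqs))

# "unif": every state gets probability 1/s
mu_unif <- setNames(rep(1 / s, s), states)

# "stationary": stationary law mu(P) of a transition matrix P, i.e. the
# left eigenvector of P for eigenvalue 1, normalised to sum to 1
stationary_law <- function(P) {
  ev <- eigen(t(P))
  v <- Re(ev$vectors[, which.min(abs(ev$values - 1))])
  v / sum(v)
}
P <- matrix(c(0.9, 0.1,
              0.5, 0.5), nrow = 2, byrow = TRUE)  # illustrative 2-state matrix
mu_stat <- stationary_law(P)
```

With only three sequences, `mu_mle` assigns probability zero to any state that never appears first, which illustrates why the "mle" estimator is only reliable when the number of sequences is high.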
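library(drimmR)
# Example sequence shipped with the package
data(lambda, package = "drimmR")
states <- c("a", "c", "g", "t")
order <- 1
degree <- 1
# Fit a first-order, linear (degree 1) drifting Markov model
fitdmm(lambda, order, degree, states, init.estim = "freq", fit.method = "sum")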