fitdmm {drimmR} | R Documentation |
Estimation of the d+1 points of support transition matrices and of the |E|^{k} initial law of a k-th order drifting Markov Model starting from one or several sequences.
fitdmm(
sequences,
order,
degree,
states,
init.estim = c("mle", "freq", "prod", "stationary", "unif"),
fit.method = c("sum"),
ncpu = 2
)
sequences |
A list of character vector(s) representing one (several) sequence(s) |
order |
Order of the Markov chain |
degree |
Degree of the polynomials (e.g., linear drifting if |
states |
Vector of states space of length s > 1 |
init.estim |
Default="mle". Method used to estimate the initial law.
If |
fit.method |
If |
ncpu |
Default=2. Number of cores used to parallelize the computation. If ncpu=-1, all available cores are used. |
The fitdmm function creates a drifting Markov model object dmm.
Let E=\{1,\ldots, s\}, s < \infty, be the finite state space of a random system whose time evolution is governed by a discrete-time stochastic process with values in E.
A sequence X_0, X_1, \ldots, X_n with state space E= \{1, 2, \ldots, s\} is said to be a linear drifting Markov chain (of order 1) of length n between the Markov transition matrices \Pi_0 and \Pi_1 if the distribution of X_t, t = 1, \ldots, n, is defined by
P(X_t=v \mid X_{t-1} = u, X_{t-2}, \ldots ) = \Pi_{\frac{t}{n}}(u, v), \; u, v \in E,
where
\Pi_{\frac{t}{n}}(u, v) = ( 1 - \frac{t}{n}) \Pi_0(u, v) + \frac{t}{n} \Pi_1(u, v), \; u, v \in E.
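The interpolation above can be sketched in base R. The matrices Pi0 and Pi1, their numeric values, and the helper names drift_matrix and simulate_linear_dmm are illustrative assumptions and are not part of drimmR; drawing X_0 uniformly is also an assumption made only for this sketch.

```r
# Hypothetical two-state example; Pi0 and Pi1 are made-up transition matrices.
states <- c("a", "b")
Pi0 <- matrix(c(0.9, 0.1,
                0.2, 0.8), nrow = 2, byrow = TRUE,
              dimnames = list(states, states))
Pi1 <- matrix(c(0.3, 0.7,
                0.6, 0.4), nrow = 2, byrow = TRUE,
              dimnames = list(states, states))

# Transition matrix at position t of a chain of length n:
# Pi_{t/n} = (1 - t/n) * Pi0 + (t/n) * Pi1
drift_matrix <- function(t, n, Pi0, Pi1) {
  (1 - t / n) * Pi0 + (t / n) * Pi1
}

# Simulate X_0, ..., X_n. X_0 is drawn uniformly (assumption for this sketch);
# x[t + 1] stores X_t, which is drawn from row X_{t-1} of Pi_{t/n}.
simulate_linear_dmm <- function(n, Pi0, Pi1, states) {
  x <- character(n + 1)
  x[1] <- sample(states, 1)
  for (t in seq_len(n)) {
    probs <- drift_matrix(t, n, Pi0, Pi1)[x[t], ]
    x[t + 1] <- sample(states, 1, prob = probs)
  }
  x
}

set.seed(1)
path <- simulate_linear_dmm(20, Pi0, Pi1, states)
mid <- drift_matrix(10, 20, Pi0, Pi1)  # halfway matrix: the average of Pi0 and Pi1
```

Because the weights (1 - t/n) and t/n are non-negative and sum to one, every intermediate matrix is again a valid transition matrix.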
The linear drifting Markov model of order 1 can be generalized to a polynomial drifting Markov model of order k and degree d. Let \Pi_{\frac{i}{d}} = (\Pi_{\frac{i}{d}}(u_1, \dots, u_k, v))_{u_1, \dots, u_k,v \in E}, i = 0, \ldots, d, be d+1 Markov transition matrices (of order k) over a state space E.
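The text does not spell out how the support matrices are combined. One common construction, assumed here and not confirmed by this page, weights them with Lagrange polynomials evaluated at t/n; for d = 1 this reduces to the linear interpolation between \Pi_0 and \Pi_1. The function names lagrange_weights and poly_drift_matrix and the numeric matrices are hypothetical.

```r
# Lagrange weights A_0(x), ..., A_d(x) at x = t/n for the nodes 0, 1/d, ..., 1.
# Requires d >= 1.
lagrange_weights <- function(x, d) {
  nodes <- (0:d) / d
  sapply(seq_along(nodes), function(i) {
    prod((x - nodes[-i]) / (nodes[i] - nodes[-i]))
  })
}

# Combine a list of d + 1 support matrices into the matrix at position t.
# Assumption: Pi_{t/n} = sum_i A_i(t/n) * Pi_{i/d}.
poly_drift_matrix <- function(t, n, support) {
  d <- length(support) - 1
  w <- lagrange_weights(t / n, d)
  Reduce(`+`, Map(function(a, P) a * P, w, support))
}

# Made-up support matrices for degree d = 2 on a two-state space.
support <- list(
  matrix(c(0.9, 0.1, 0.2, 0.8), 2, byrow = TRUE),
  matrix(c(0.5, 0.5, 0.5, 0.5), 2, byrow = TRUE),
  matrix(c(0.3, 0.7, 0.6, 0.4), 2, byrow = TRUE)
)
P_quarter <- poly_drift_matrix(5, 20, support)
```

The weights always sum to one, so rows still sum to one, but for d >= 2 individual weights can be negative, so entries are not guaranteed to stay in [0, 1] for arbitrary support matrices.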
The estimation of DMMs is carried out for 4 different types of data:
One sample path: it is denoted by H(m,n):= (X_0,X_1, \ldots,X_{m}), where m denotes the length of the sample path and n the length of the drifting Markov chain. Two cases can be considered:
m=n (a complete sample path),
m < n (an incomplete sample path).
H i.i.d. sample paths: they are denoted by H_i(m_i,n_i), i=1, \ldots, H. Two cases are considered:
m_i=n_i=n \forall i=1, \ldots, H (complete sample paths of drifting Markov chains of the same length),
n_i=n \forall i=1, \ldots, H (incomplete sample paths of drifting Markov chains of the same length).
In this case, a usual LSE over the sample paths is used.
The initial distribution of a k-th order drifting Markov Model is defined as \mu_i = P(X_1 = i). The initial distribution of the k first letters is freely customisable by the user, but five methods are proposed for its estimation:
mle: The Maximum Likelihood Estimator for the initial distribution. The formula is \widehat{\mu_i} = \frac{Nstart_i}{L}, where Nstart_i is the number of occurrences of the word i (of length k) at the beginning of each sequence and L is the number of sequences. This estimator is reliable when the number of sequences L is high.
freq: The initial distribution is estimated by taking the frequencies of the words of length k over all sequences. The formula is \widehat{\mu_i} = \frac{N_i}{N}, where N_i is the number of occurrences of the word i (of length k) in the sequences and N is the sum of the lengths of the sequences.
prod: The initial distribution is estimated by using the product of the frequencies of each state (over all the sequences) in the word of length k.
stationary: The initial distribution is estimated using the stationary law \mu(\Pi_{\frac{k-1}{n}}).
unif: The initial probability of each state is \frac{1}{s}, where s is the number of states.
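The first three estimators can be illustrated in base R. This is a minimal sketch on made-up toy sequences, following the formulas as stated above; the helper kmers and all variable names are hypothetical, and the code is not drimmR's implementation.

```r
# Toy sequences (made up) over the states a, c, g, t; order k = 2.
seqs <- list(c("a", "c", "g", "t", "a", "c"),
             c("a", "c", "a", "c", "g"),
             c("g", "t", "t", "a"))
states <- c("a", "c", "g", "t")
k <- 2

# All s^k words of length k, in a fixed order.
words <- apply(expand.grid(rep(list(states), k)), 1, paste, collapse = "")

# Every word of length k occurring in one sequence (sliding window).
kmers <- function(x, k) {
  starts <- seq_len(length(x) - k + 1)
  vapply(starts, function(i) paste(x[i:(i + k - 1)], collapse = ""), character(1))
}

# Word i at the start of each sequence: Nstart_i / L.
first_words <- vapply(seqs, function(x) paste(x[1:k], collapse = ""), character(1))
mu_mle <- table(factor(first_words, levels = words)) / length(seqs)

# Word frequencies: N_i / N, with N the sum of the sequence lengths.
all_words <- unlist(lapply(seqs, kmers, k = k))
mu_freq <- table(factor(all_words, levels = words)) / sum(lengths(seqs))

# Product of the single-state frequencies of the letters of each word.
letters_all <- factor(unlist(seqs), levels = states)
state_freq <- tabulate(letters_all, nbins = length(states)) / length(letters_all)
names(state_freq) <- states
mu_prod <- vapply(strsplit(words, ""), function(w) prod(state_freq[w]), numeric(1))
names(mu_prod) <- words
```

With N taken as the total sequence length, as in the formula above, mu_freq sums to less than 1 when k > 1 because each sequence of length m contains only m - k + 1 words; dividing by the number of words instead would be a normalised variant.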
An object of class dmm
Geoffray Brelurut, Alexandre Seiller
Barbu VS, Vergne N (2018). “Reliability and survival analysis for drifting Markov models: modelling and estimation.” Methodology and Computing in Applied Probability, 1–33. doi: 10.1007/s11009-018-9682-8, https://doi.org/10.1007/s11009-018-9682-8.
Vergne N (2008). “Drifting Markov models with polynomial drift and applications to DNA sequences.” Statistical Applications in Genetics and Molecular Biology, 7(1). doi: 10.2202/1544-6115.1326, https://doi.org/10.2202/1544-6115.1326.
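library(drimmR)
# Example data set shipped with the package
data(lambda, package = "drimmR")
states <- c("a", "c", "g", "t")
order <- 1   # first-order chain
degree <- 1  # polynomial degree 1, i.e. linear drift
# Fit the model; the initial law is estimated from word frequencies
fitdmm(lambda, order, degree, states, init.estim = "freq", fit.method = "sum")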