Performs a test for first-order Markovianness of a data series by inferring the sequence of i.i.d. U(0,1) random noise that might have generated it.
markov.test(x, type = c("lb", "ks"), method = "holm", lag = 20, ...)
x: the data series as a vector.
type: the procedures used to test whether the disturbance series is independently and identically distributed on the unit interval. The available tests are listed in the table below.
method: the correction method used to adjust the p-values for multiple testing. It is passed to the method argument of p.adjust, which performs the adjustment.
lag: the number of lags to use when applying the Ljung-Box (portmanteau) test (lb.test).
...: further arguments passed on to the functions called by markov.test.
A list with class "multiplehtest" containing the following components:
the character string “Composite test for a first-order (finite state) Markov chain”.
the values of the test statistic for all the tests.
parameters for all the tests. Exactly one parameter is recorded for each test, for example, df for lb.test. Additional parameters, such as the a and b parameters of chisq.unif.test, are not saved.
p-values of all the tests.
a vector of character strings indicating what type of tests were performed.
the adjusted p-values.
a character string giving the name of the data.
indicates which correction method was used to adjust the p-values for multiple testing.
the transition matrix estimated by fitting a first-order Markov chain to the data and used to generate the inferred random disturbance.
This function tests a symbolic sequence for first-order Markovianness (also known as the Markov property). It does this by reverse-engineering the sequence to obtain a sample of the kind of output from a pseudo-random number generator that would have produced the observed sequence if it had been generated by simulating a Markov chain. The sample output is then tested to see whether it is an independent and identically distributed sequence of uniform numbers on the interval (0,1). This involves applying at least two tests, one for independence and another for uniformity over the unit interval. One concludes that the sequence is Markovian if the sample output passes the tests (that is, all null hypotheses are accepted) and non-Markovian otherwise.
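The reverse-engineering step can be sketched under one assumption: that the chain was simulated by inverse-CDF sampling, so that \(X_{n+1}\) is the smallest state \(j\) with \(U_n \le F_{X_n}(j)\), where \(F_i(j) = \sum_{k \le j} P_{ik}\). Observing a move from \(i\) to \(j\) then only tells us that \(U_n \in (F_i(j-1), F_i(j)]\), and drawing \(U_n\) uniformly from that interval gives i.i.d. U(0,1) values when \(P\) is the true transition matrix. This is a minimal illustration, not the spgs implementation (markov.disturbance may differ in detail), and the helper names are hypothetical.

```r
# Minimal sketch, NOT the spgs implementation (markov.disturbance may differ).
# Assumes the chain was simulated by inverse-CDF sampling from the rows of P.
# Helper names are hypothetical.

# Maximum-likelihood estimate of the first-order transition matrix.
estimate_transition_matrix <- function(x) {
  states <- sort(unique(x))
  from <- factor(head(x, -1), levels = states)
  to <- factor(tail(x, -1), levels = states)
  # Row i, column j: proportion of moves out of state i that go to state j.
  unclass(prop.table(table(from, to), margin = 1))
}

# Infer one U(0,1) disturbance per observed transition.
infer_disturbance <- function(x, P = estimate_transition_matrix(x)) {
  states <- rownames(P)
  i <- match(head(x, -1), states)
  j <- match(tail(x, -1), states)
  # cdf[i, j] = F_i(j), the cumulative transition probability along row i.
  cdf <- t(apply(P, 1, cumsum))
  lower <- ifelse(j > 1, cdf[cbind(i, pmax(j - 1, 1))], 0)
  upper <- cdf[cbind(i, j)]
  # The uniform that produced the move i -> j lay in (F_i(j-1), F_i(j)];
  # draw it uniformly from that interval.
  runif(length(i), min = lower, max = upper)
}

# Usage: an i.i.d. uniform DNA sequence is trivially first-order Markov.
set.seed(1)
x <- sample(c("a", "c", "g", "t"), 5000, replace = TRUE)
u <- infer_disturbance(x)
ks.test(u, "punif")  # uniformity check on the inferred disturbance
```

Because the transition matrix is estimated from the same data, the inferred disturbance is only approximately i.i.d. U(0,1) under the null hypothesis.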
The test is set up as follows:
\(H_0\): the sequence is first-order Markov
\(H_1\): the sequence is not first-order Markov
To simplify the use of the test, the p-values are corrected for multiple testing, which yields a single overall p-value (the smallest of the adjusted p-values). If this p-value is less than the significance level established for the test procedure, the null hypothesis of Markovianness is rejected. Otherwise, the null hypothesis should be accepted.
To apply the test correctly, use the type argument to specify at least one test of independence and one test of uniformity from the options in the following table.
| Category | Function | Test |
| --- | --- | --- |
| Uniformity | ks.unif.test | Kolmogorov-Smirnov test for U(0,1) data |
| Uniformity | chisq.unif.test | Pearson's chi-squared test for discrete uniform data |
| Independence | lb.test | Ljung-Box Q test for uncorrelated data |
| Independence | diffsign.test | signed difference test of independence |
| Independence | turningpoint.test | turning point test of independence |
| Independence | rank.test | rank test of independence |
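To give a feel for what one of the independence tests in the table checks, here is the classical turning-point test with its usual normal approximation: for i.i.d. continuous data, the number of turning points among n observations has mean 2(n-2)/3 and variance (16n-29)/90. This is an assumed textbook version for illustration, not necessarily identical to turningpoint.test (which may, for example, handle ties differently), and the function name is hypothetical.

```r
# Hypothetical helper: classical turning-point test of independence
# (normal approximation), assumed for illustration only.
turning_point_sketch <- function(x) {
  x <- as.numeric(x)
  n <- length(x)
  stopifnot(n >= 3)
  left <- x[1:(n - 2)]
  mid <- x[2:(n - 1)]
  right <- x[3:n]
  # A turning point is an interior value that is a strict local peak or trough.
  tp <- sum((mid > left & mid > right) | (mid < left & mid < right))
  expected <- 2 * (n - 2) / 3
  variance <- (16 * n - 29) / 90
  z <- (tp - expected) / sqrt(variance)
  # Two-sided p-value: too few or too many turning points suggest dependence.
  list(turning.points = tp, statistic = z, p.value = 2 * pnorm(-abs(z)))
}

turning_point_sketch(c(1, 3, 2, 4, 3, 5))$turning.points  # 4
```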
If type is not specified, lb.test and ks.unif.test are used by default.
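As a rough base-R illustration of this default pair, the sketch below applies stats::Box.test (Ljung-Box form) and stats::ks.test against U(0,1) to a disturbance vector u. These are stand-ins assumed to behave similarly to lb.test and ks.unif.test, not the spgs functions themselves, and the helper name default_pair_pvalues is hypothetical.

```r
# Hypothetical helper: base-R stand-ins for the default "lb" and "ks" tests.
# Assumes u is the inferred disturbance series, which should be i.i.d. U(0,1)
# under the null hypothesis of first-order Markovianness.
default_pair_pvalues <- function(u, lag = 20) {
  # Independence: Ljung-Box portmanteau test on the first `lag` autocorrelations.
  lb <- stats::Box.test(u, lag = lag, type = "Ljung-Box")
  # Uniformity: one-sample Kolmogorov-Smirnov test against U(0,1).
  ks <- stats::ks.test(u, "punif", min = 0, max = 1)
  c(lb = unname(lb$p.value), ks = unname(ks$p.value))
}

# Usage with a simulated i.i.d. U(0,1) series in place of a real disturbance.
set.seed(1)
u <- runif(1000)
default_pair_pvalues(u)
```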
As this procedure performs multiple tests to assess whether the sequence has a Markovian dependence structure, the p-values must be adjusted for multiple testing. By default, the Holm-Bonferroni method ("holm") is used, but this can be overridden via the method argument. The adjusted p-values are displayed when the result of the test is printed.
The smallest adjusted p-value constitutes the overall p-value for the test. If this p-value is less than the significance level fixed for the test procedure, the null hypothesis of first-order Markovianness is rejected. Otherwise, the null hypothesis should be accepted.
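The decision rule can be sketched with stats::p.adjust, which the method argument is passed to. The helper name and the raw p-values in the usage line are made up for illustration; they are not output of markov.test.

```r
# Hypothetical helper: combine per-test p-values into the overall decision.
# Assumes p is a named vector of raw p-values, one per test performed.
overall_markov_decision <- function(p, method = "holm", alpha = 0.05) {
  adjusted <- stats::p.adjust(p, method = method)
  overall <- min(adjusted)  # smallest adjusted p-value = overall p-value
  list(adjusted = adjusted, overall = overall, reject = overall < alpha)
}

res <- overall_markov_decision(c(lb = 0.04, ks = 0.30))
res$adjusted  # lb 0.08, ks 0.30 (Holm: 2 * 0.04, then max(0.08, 1 * 0.30))
res$reject    # FALSE at alpha = 0.05
```

With Holm's method, the smallest adjusted p-value always equals min(1, m × the smallest raw p-value), where m is the number of tests, so the overall decision here coincides with a Bonferroni correction applied to the most significant test.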
Hart, A.G. and Martínez, S. (2011) Statistical testing of Chargaff's second parity rule in bacterial genome sequences. Stoch. Models 27(2), 1-46.
Hart, A.G. and Martínez, S. (2014) Markovianness and Conditional Independence in Annotated Bacterial DNA. Stat. Appl. Genet. Mol. Biol. 13(6), 693-716. arXiv:1311.4411 [q-bio.QM].
markov.disturbance, diid.test, ks.unif.test, chisq.unif.test, diffsign.test, turningpoint.test, rank.test, lb.test
library(spgs)
# Generate an i.i.d. uniform DNA sequence (a first-order Markov chain whose
# transition probabilities all equal 0.25)
dna <- simulateMarkovChain(5000, matrix(0.25, 4, 4), states=c("a","c","g","t"))
markov.test(dna)