
spgs (version 1.0-4)

chargaff2.test: Matrix Test of CSPR for Dinucleotides

Description

Performs the matrix test of Chargaff's second parity rule (CSPR) for dinucleotides proposed in Hart and Martínez (2011).
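Chargaff's second parity rule for dinucleotides states that, within a single strand, the relative frequency of each dinucleotide ab is approximately equal to that of its reverse complement (for example, ac and gt). The base R sketch below only illustrates the rule by tabulating both frequencies side by side. It is not the matrix statistic computed by chargaff2.test, and the helper name dinuc.revcomp.table is hypothetical. It assumes the sequence is a character vector containing only a, c, g and t.

```r
# Hypothetical helper (illustration only, NOT the chargaff2.test statistic):
# compare each dinucleotide's relative frequency with that of its reverse complement.
# Assumes s is a character vector of single nucleotides drawn from a, c, g, t.
dinuc.revcomp.table <- function(s) {
  s <- tolower(s)
  comp <- c(a = "t", c = "g", g = "c", t = "a")
  # Overlapping dinucleotides: s[1]s[2], s[2]s[3], ...
  pairs <- paste0(s[-length(s)], s[-1])
  nucs <- c("a", "c", "g", "t")
  all.pairs <- as.vector(outer(nucs, nucs, paste0))
  freq <- table(factor(pairs, levels = all.pairs)) / length(pairs)
  # The reverse complement of "ab" is comp(b) followed by comp(a)
  rc <- paste0(comp[substr(all.pairs, 2, 2)], comp[substr(all.pairs, 1, 1)])
  data.frame(dinuc = all.pairs, rc = rc,
             f = as.numeric(freq[all.pairs]),
             f.rc = as.numeric(freq[rc]),
             stringsAsFactors = FALSE)
}

# Usage on a short toy sequence
d <- dinuc.revcomp.table(strsplit("acgtacgttgca", "")[[1]])
d[d$f > 0, ]
```

Under the rule, the f and f.rc columns should be close for a long sequence; chargaff2.test formalises this comparison through the matrix \(\hat P\) described in Details.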

Usage

chargaff2.test(x, alg=c("table", "simulate", "upper"), n, no.p.value=FALSE)

Value

A list with class "htest.ext" containing the following components:

statistic

the value of the test statistic.

p.value

the p-value of the test. Only included if no.p.value is FALSE.

method

a character string indicating what type of test was performed.

data.name

a character string giving the name of the data.

f

the 5-element vector used in calculating the test statistic.

estimate

the stochastic matrix \(\hat P\) used to derive the test statistic.

stat.desc

a brief description of the test statistic.

null

the null hypothesis (\(H_0\)) of the test.

alternative

the alternative hypothesis (\(H_1\)) of the test.

Arguments

x

either a vector containing the relative frequencies of each of the 4 nucleotides A, C, G and T; a character vector representing a DNA sequence in which each element contains a single nucleotide; or a DNA sequence stored using the SeqFastadna class from the seqinr package.

alg

the algorithm for computing the p-value. If set to “table” (the default), the p-value is obtained by linear interpolation of a look-up table. If set to “simulate”, the p-value is obtained via Monte Carlo simulation. If set to “upper”, an analytic upper bound on the p-value is computed using formulae from Hart and Martínez (2011). See the note below for further details.
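To make the “table” option concrete, the sketch below shows linear interpolation of a p-value from a look-up table using base R's approx. The table values (lookup.stat, lookup.p) and the function name interp.p.value are hypothetical placeholders, not the table shipped with spgs, and the clamping behaviour outside the table range (rule = 2) is an assumption made for the illustration.

```r
# Hypothetical look-up table: statistic values and the p-values tabulated for them.
# These numbers are placeholders, NOT the table used internally by spgs.
lookup.stat <- c(0.0, 0.5, 1.0, 1.5, 2.0)
lookup.p    <- c(1.00, 0.60, 0.30, 0.10, 0.02)

# Linearly interpolate the p-value for an observed statistic.
# rule = 2 returns the nearest end-point p-value for statistics outside the table.
interp.p.value <- function(stat, tab.stat = lookup.stat, tab.p = lookup.p) {
  approx(tab.stat, tab.p, xout = stat, rule = 2)$y
}

interp.p.value(0.75)  # halfway between 0.60 and 0.30
interp.p.value(3)     # beyond the table, clamped to 0.02
```

Interpolation is cheap compared with simulation, which is presumably why a tabulated approach is the default.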

n

The number of replications to use for Monte Carlo simulation. If computationally feasible, a value >= 10000000 is recommended.
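One way to see why a large n is recommended: if a simulated p-value is the proportion of n independent replications that are at least as extreme as the observed statistic, its binomial standard error is \(\sqrt{\hat p(1-\hat p)/n}\). The sketch below assumes this proportion-based estimator; the exact estimator used by chargaff2.test is not stated here, and the function name mc.p.se is hypothetical.

```r
# Hypothetical helper: binomial standard error of a Monte Carlo p-value estimate,
# assuming p.hat is a proportion of n independent simulated replications.
mc.p.se <- function(p.hat, n) sqrt(p.hat * (1 - p.hat) / n)

# Standard error of an estimated p-value near 0.05 for several replication counts
sapply(c(1e4, 1e6, 1e7), function(n) mc.p.se(0.05, n))
```

For \(\hat p = 0.05\) this gives roughly 0.0022, 0.00022 and 0.000069, so about \(10^7\) replications are needed before the simulation error is negligible at the second significant digit of a small p-value.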

no.p.value

If TRUE, do not compute the p-value. The default is FALSE.

Author

Andrew Hart and Servet Martínez

Details

This function performs a test of Chargaff's second parity rule for dinucleotides based on a \(4 \times 4\) stochastic matrix \(\hat P\) estimated from the empirical dinucleotide distribution of a genomic sequence. The \((a,b)\) entry of \(\hat P\) gives the empirical probability (relative frequency) that a nucleotide \(a\) is followed by a nucleotide \(b\) in the sequence. The test is set up as follows:

\(H_0\): the sequence (or matrix \(\hat P\)) does not comply with CSPR for dinucleotides
\(H_1\): the sequence (or matrix \(\hat P\)) complies with CSPR for dinucleotides
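The estimate \(\hat P\) described above can be sketched in base R by counting overlapping dinucleotides and normalising each row. This is an illustration under stated assumptions, not the package's internal code: the function name estimate.P.hat is hypothetical, the sequence is assumed to contain only a, c, g and t, and how chargaff2.test itself treats sequence ends or ambiguous bases is not specified here.

```r
# Hypothetical helper: estimate the 4x4 stochastic matrix P.hat from a sequence
# given as a character vector of single nucleotides (assumed to be a, c, g, t).
# Entry (a, b) = number of times a is immediately followed by b,
#                divided by the number of dinucleotides starting with a.
estimate.P.hat <- function(s) {
  nucs <- c("a", "c", "g", "t")
  s <- tolower(s)
  from <- factor(s[-length(s)], levels = nucs)
  to   <- factor(s[-1], levels = nucs)
  counts <- table(from, to)
  # Row-normalise so each row sums to 1 (a nucleotide that never starts a pair gives NaN)
  unclass(prop.table(counts, margin = 1))
}

P <- estimate.P.hat(strsplit("acgtacgttgcaacgt", "")[[1]])
round(P, 3)
rowSums(P)
```

Each row of the result is a conditional distribution of the next nucleotide, which is what makes \(\hat P\) a stochastic matrix.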

References

Hart, A.G. and Martínez, S. (2011) Statistical testing of Chargaff's second parity rule in bacterial genome sequences. Stoch. Models 27(2), 1–46.

See Also

chargaff0.test, chargaff1.test, agct.test, ag.test, chargaff.gibbs.test

Examples

# Demonstration on a real prokaryotic genome sequence
data(nanoarchaeum)
chargaff2.test(nanoarchaeum)

# Simulate a synthetic DNA sequence that does not satisfy Chargaff's second parity rule
# (each row of trans.mat is a transition distribution and sums to 1)
trans.mat <- matrix(c(.4, .1, .4, .1,
                      .2, .1, .6, .1,
                      .4, .1, .3, .2,
                      .1, .2, .4, .3), ncol=4, byrow=TRUE)
seq <- simulateMarkovChain(500000, trans.mat, states=c("a", "c", "g", "t"))
chargaff2.test(seq)
