Learn R Programming

spgs (version 1.0-1)

chargaff0.test: Vector Test of Chargaff's Second Parity Rule (CSPR) for Mononucleotides

Description

Performs the vector test of Chargaff's second parity rule (CSPR) for mononucleotides proposed in Hart and Mart<ed>nez (2001).

Usage

chargaff0.test(x, alg=c("exact", "simulate", "lower", "upper", "Lower", "Upper"), n, 
no.p.value=FALSE)

Arguments

x

either a vector containing the relative frequencies of each of the 4 nucleotides A, C, G, T, a character vector representing a DNA sequence in which each element contains a single nucleotide, or a DNA sequence stored using the SeqFastadna class from the seqinr package.

alg

the algorithm for computing the p-value. If set to “simulate”, the p-value is obtained via Monte Carlo simulation. If set to “lower”, an analytic lower bound on the p-value is computed. If set to “upper”, an analytic upper bound on the p-value is computed. “lower” and “upper” are based on formulae in Hart and Mart<ed>nez (2011). a Tighter (though unpublished) lower /upper bound on the p-value may be obtained by specifying “Lower”/“Upper”. If type is specified as “exact” (the default value),the p-value for the test is computed exactly for small values of the test statistic and crudely approximated for large values. See the note below for further details.

n

The number of replications to use for Monte Carlo simulation. If computationally feasible, a value >= 10000000 is recommended.

no.p.value

If TRUE, do not compute the p-value. The default is FALSE.

Value

A list with class "htest.ext" containing the following components:

statistic

the value of the test statistic.

p.value

the p-value of the test. Only included if no.p.value is FALSE.

method

a character string indicating what type of test was performed.

data.name

a character string giving the name of the data.

estimate

the probability vector used to derive the test statistic.

stat.desc

a brief description of the test statistic.

null

the null hypothesis (\(H_0\)) of the test.

alternative

the alternative hypothesis (\(H_1\)) of the test.

Details

The first argument may be a character vector representing a DNA sequence, a DNA sequence represented using the SeqFastadna class from the seqinr package, or a vector containing the relative frequencies of the A, C, G and T nucleic acids.

Letting A, C, G and T denote the relative frequencies of their corresponding nucleic acids, this function performs the following hypothesis test:

\(H_0\): \(A\neq T\) or \(C\neq G\) \(H_1\): \(A=T\) and \(C=G\)

The vector \((A,C,G,T)\) is assumed to come from a Dirichlet(1,1,1,1) distribution on the 3-simplex under the null hypothesis.

The test statistic is \(\eta_0 = \sqrt{(A-T)^2/2+(C-G)^2/2}\).

References

Hart, A.G. and Mart<ed>nez, S. (2011) Statistical testing of Chargaff's second parity rule in bacterial genome sequences. Stoch. Models 27(2), 1--46.

See Also

chargaff1.test, chargaff2.test, agct.test, ag.test, chargaff.gibbs.test

Examples

Run this code
# NOT RUN {
#Demonstration on real bacterial sequence
data(nanoarchaeum)
chargaff0.test(nanoarchaeum)

#Simulate synthetic DNA sequence that does not satisfy Chargaff's second parity rule
trans.mat <- matrix(c(.4, .1, .4, .1, .2, .1, .6, .1, .4, .1, .3, .2, .1, .2, .4, .3), 
ncol=4, byrow=TRUE)
seq <- simulateMarkovChain(500000, trans.mat, states=c("a", "c", "g", "t"))
chargaff0.test(seq)
# }

Run the code above in your browser using DataLab