Biostrings (version 2.40.2)

dinucleotideFrequencyTest: Pearson's chi-squared Test and G-tests for String Position Dependence

Description

Performs Pearson's chi-squared test, G-test, or Williams' corrected G-test to determine dependence between two nucleotide positions.

Usage

dinucleotideFrequencyTest(x, i, j, test = c("chisq", "G", "adjG"), simulate.p.value = FALSE, B = 2000)

Arguments

x
A DNAStringSet or RNAStringSet object.
i, j
Single integer values for positions to test for dependence.
test
One of "chisq" (Pearson's chi-squared test), "G" (G-test), or "adjG" (Williams' corrected G-test). See Details section.
simulate.p.value
A logical value indicating whether to compute p-values by Monte Carlo simulation.
B
An integer specifying the number of replicates used in the Monte Carlo test.
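
The simulate.p.value and B arguments have the same names as the corresponding arguments of chisq.test in base R. As a rough illustration of what a Monte Carlo p-value looks like, the sketch below calls chisq.test directly on a made-up table of counts. Whether dinucleotideFrequencyTest simulates in exactly this way is an assumption, and the counts object is hypothetical.

```r
# Hypothetical 2 x 2 table of base counts at two positions (made-up numbers).
counts <- matrix(c(12, 3, 4, 11), nrow = 2,
                 dimnames = list(pos_i = c("A", "T"), pos_j = c("C", "G")))

set.seed(1)  # make the simulated p-value reproducible
mc <- chisq.test(counts, simulate.p.value = TRUE, B = 2000)

mc$p.value    # p-value estimated from B simulated tables
mc$parameter  # NA: chisq.test reports no degrees of freedom when simulating
```

A larger B gives a finer-grained p-value estimate at the cost of more computation.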

Value

An htest object. See help(chisq.test) for more details.
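
An htest object is a list, so its components can be extracted with the $ operator. The sketch below uses the result of chisq.test on a made-up table as a stand-in for the return value; the counts object is hypothetical, and the exact set of components returned by dinucleotideFrequencyTest may differ.

```r
# Hypothetical 2 x 2 table of base counts (made-up numbers).
counts <- matrix(c(12, 3, 4, 11), nrow = 2,
                 dimnames = list(pos_i = c("A", "T"), pos_j = c("C", "G")))

# chisq.test returns an object of class "htest".
res <- chisq.test(counts, correct = FALSE)

class(res)      # "htest"
res$statistic   # value of the test statistic
res$parameter   # degrees of freedom
res$p.value     # p-value of the test
res$method      # character string describing the test performed
```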

Details

The null and alternative hypotheses for this function are:
H0:
positions i and j are independent

H1:
otherwise

Let O and E be, respectively, the observed and expected probabilities of the base pair combinations at positions i and j. Then the test statistics are calculated as:

test="chisq":
stat = sum(abs(O - E)^2/E)

test="G":
stat = 2 * sum(O * log(O/E))

test="adjG":
stat = 2 * sum(O * log(O/E))/q, where q = 1 + ((df - 1)^2 - 1)/(6*length(x)*(df - 2))

Under the null hypothesis, these test statistics are approximately distributed chi-squared(df = ((distinct bases at i) - 1) * ((distinct bases at j) - 1)).
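
The "chisq" and "G" statistics above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: it works with counts rather than probabilities (as chisq.test does), it uses a made-up character vector seqs in place of a DNAStringSet, and it skips zero cells in the G statistic. The "adjG" statistic would divide G_stat by the correction factor q described above.

```r
# Hypothetical toy sequences standing in for a DNAStringSet.
seqs <- c("ACGT", "ACGA", "AGGT", "TCGA", "TGCA", "AGCT", "TCGT", "AGGA")
i <- 1
j <- 2

# Observed counts of base combinations at positions i and j.
O <- table(substr(seqs, i, i), substr(seqs, j, j))
n <- sum(O)

# Expected counts under independence: (row total * column total) / n.
E <- outer(rowSums(O), colSums(O)) / n

# Pearson's chi-squared statistic.
chisq_stat <- sum((O - E)^2 / E)

# G statistic; cells with zero observed count contribute nothing.
keep <- O > 0
G_stat <- 2 * sum(O[keep] * log(O[keep] / E[keep]))

# Degrees of freedom: (distinct bases at i - 1) * (distinct bases at j - 1).
df <- (nrow(O) - 1) * (ncol(O) - 1)

# Approximate p-values from the chi-squared distribution.
p_chisq <- pchisq(chisq_stat, df = df, lower.tail = FALSE)
p_G <- pchisq(G_stat, df = df, lower.tail = FALSE)
```

With counts this small the chi-squared approximation is poor, which is the situation where simulate.p.value = TRUE is useful.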

References

Ellrott, K., Yang, C., Sladek, F.M., Jiang, T. (2002) "Identifying transcription factor binding sites through Markov chain optimization", Bioinformatics, 18 (Suppl. 2), S100-S109.

Sokal, R.R., Rohlf, F.J. (2003) "Biometry: The Principles and Practice of Statistics in Biological Research", W.H. Freeman and Company, New York.

Tomovic, A., Oakeley, E. (2007) "Position dependencies in transcription factor binding sites", Bioinformatics, 23, 933-941.

Williams, D.A. (1976) "Improved Likelihood ratio tests for complete contingency tables", Biometrika, 63, 33-37.

See Also

nucleotideFrequencyAt, XStringSet-class, chisq.test

Examples

  data(HNF4alpha)
  dinucleotideFrequencyTest(HNF4alpha, 1, 2)
  dinucleotideFrequencyTest(HNF4alpha, 1, 2, test = "G")
  dinucleotideFrequencyTest(HNF4alpha, 1, 2, test = "adjG")
