regionGoodnessOfFit-methods: Calculate goodness-of-fit statistics

Description

A generic method for calculating chi-squared goodness-of-fit statistics (see Details). Dispatches on either a data.frame or an ExpData object.

Usage

# S4 method for signature 'data.frame'
regionGoodnessOfFit(obj, denominator = colSums(obj), groups = rep("A", ncol(obj)))
# S4 method for signature 'ExpData'
regionGoodnessOfFit(obj, annoData, groups = rep("A", length(what)), what = getColnames(obj, all = FALSE), denominator = c("regions", "lanes"), verbose = getOption("verbose"))

Arguments

obj

data.frame or ExpData

annoData

A data.frame of annotation.

groups

A factor or character vector indicating which columns are replicates of one another.

denominator

How to scale the columns to take into account sequencing depth.

what

Which columns to choose from the database. Default is all data columns.

verbose

Whether or not debugging / timing info should be printed.

Value

A list containing the statistics and degrees of freedom (see Details). Technically, an S3 object with class genominator.goodness.of.fit.

Methods

signature(obj = "data.frame"): Here obj is the result of a call to summarizeByAnnotation, or any data.frame with columns representing samples and rows representing regions (e.g. genes). denominator is how each column is scaled, so length(denominator) == ncol(obj) must hold. Finally, groups determines how columns are aggregated, i.e. which columns are replicates.
signature(obj = "ExpData"): Here annoData is an annotation data frame. groups is as above. what names the columns to select. denominator is either the total lane counts, the lane counts restricted to annoData, or a vector of length length(groups).

Details

This function implements the homogeneous Poisson model across lanes as described in the article cited below. This model corresponds to a common expression parameter across lanes, scaled by a lane-specific offset. A good fit to this model across replicates indicates that variation across lanes is consistent with Poisson sampling. Deviation from it indicates overdispersion between replicate lanes.
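
The model can be illustrated with a minimal base R sketch. It is not the package's implementation: it assumes a per-region Pearson chi-squared statistic with (number of lanes - 1) degrees of freedom, and the simulated values in depth and lambda are hypothetical.

```r
# Simulated counts: 200 regions (rows) by 4 replicate lanes (columns).
set.seed(1)
depth <- c(1.0, 1.5, 0.8, 1.2)        # hypothetical lane-specific scaling
lambda <- rexp(200, rate = 1 / 50)    # hypothetical per-region expression level
counts <- sapply(depth, function(d) rpois(length(lambda), lambda * d))

# Lane offsets; column sums mirror the data.frame method's default denominator.
denominator <- colSums(counts)

# Under the homogeneous Poisson model, the expected count for region i in
# lane j is (row total for i) * (share of the total offset held by lane j).
expected <- outer(rowSums(counts), denominator / sum(denominator))

# Assumed form of the test: Pearson chi-squared per region.
keep <- rowSums(counts) > 0           # regions with no reads carry no information
stat <- rowSums((counts[keep, ] - expected[keep, ])^2 / expected[keep, ])
df <- ncol(counts) - 1
pvals <- pchisq(stat, df = df, lower.tail = FALSE)
```

Because these counts are drawn from the model itself, the p-values should look roughly uniform; overdispersed replicates would instead produce an excess of small p-values.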

References

James H. Bullard, Elizabeth A. Purdom, Kasper D. Hansen, Steffen Durinck, and Sandrine Dudoit, "Statistical Inference in mRNA-Seq: Exploratory Data Analysis and Differential Expression" (April 2009). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 247. http://www.bepress.com/ucbbiostat/paper247

Examples

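library(Genominator)
# Open the sample database shipped with the package.
ed <- ExpData(system.file(package = "Genominator", "sample.db"),
              tablename = "raw")
# Load the yeast annotation used to define regions.
data("yeastAnno")
# List the components of the returned goodness-of-fit object.
names(regionGoodnessOfFit(ed, yeastAnno))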