
Genominator (version 1.26.0)

summarizeByAnnotation: Summarize data based on genome annotation.

Description

This function summarizes columns of the data using specified SQLite functions, applying these summarization functions to regions defined in an annotation data frame.

Usage

summarizeByAnnotation(expData, annoData,
                      what = getColnames(expData, all = FALSE),
                      fxs = c("TOTAL"), groupBy = NULL, splitBy = NULL,
                      ignoreStrand = FALSE, bindAnno = FALSE,
                      preserveColnames = TRUE,
                      verbose = getOption("verbose"))

Arguments

expData
An object of class ExpData.
annoData
A data frame that must contain the columns chr, start, end and strand, which specify the annotation regions of interest.
what
Vector of names of data columns to be summarized.
fxs
Vector of strings giving the names of SQLite functions to call on the data column(s).
groupBy
Character vector referring to a column in annoData. Regions will be aggregated over distinct values of this column. Setting this argument will set bindAnno to TRUE. If splitBy is set, meta.id will override.
splitBy
String indicating column of annoData object on which to split results.
ignoreStrand
Logical indicating whether strand should be ignored in aggregation. If TRUE, strand is ignored; if FALSE (the default), strand is taken into account.
bindAnno
Logical indicating whether annotation information should be included in the output.
preserveColnames
Logical indicating whether column names should be preserved. Only possible when a single function is being applied.
verbose
Logical indicating whether details should be printed.

Value

If splitBy is not specified, returns a data frame containing results of aggregation functions performed on each region defined in annoData. If splitBy is specified, returns a list of data frames with one entry for each unique value of the column which was split on.

Details

Most of the computation is done using SQLite. Depending on the use case, this approach may be significantly faster and use much less memory than the alternative: use splitByAnnotation to retrieve a list with all the data and then use R to summarize over each element of the list. It is (naturally) constrained to the use of operations expressible in (SQLite) SQL.
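To make the idea of summarizing over annotation regions concrete, here is a minimal base-R sketch on toy data. It does not use Genominator or SQLite and is not the package's implementation. The column names location and counts, the helper summarize_regions, and the inclusive treatment of start and end are all assumptions made for illustration.

```r
# Toy stand-in for experimental data: one row per genomic position.
# Column names are illustrative assumptions, not Genominator's schema.
exp_df <- data.frame(
  chr      = c(1, 1, 1, 1, 2, 2),
  location = c(5, 10, 15, 30, 5, 12),
  strand   = c(1, 1, -1, 1, 1, 1),
  counts   = c(2, 3, 4, 1, 7, 5)
)

# Annotation with the required chr, start, end and strand columns.
anno_df <- data.frame(
  chr    = c(1, 1, 2),
  start  = c(1, 20, 1),
  end    = c(20, 40, 10),
  strand = c(1, 1, 1),
  row.names = c("geneA", "geneB", "geneC")
)

# Hypothetical helper: apply `fun` to one data column within each region.
# Region boundaries are assumed to be inclusive.
summarize_regions <- function(exp_df, anno_df, what = "counts",
                              fun = sum, ignoreStrand = FALSE) {
  res <- vapply(seq_len(nrow(anno_df)), function(i) {
    r <- anno_df[i, ]
    keep <- exp_df$chr == r$chr &
      exp_df$location >= r$start &
      exp_df$location <= r$end
    if (!ignoreStrand) {
      # Only keep positions on the same strand as the region.
      keep <- keep & exp_df$strand == r$strand
    }
    fun(exp_df[[what]][keep])
  }, numeric(1))
  names(res) <- rownames(anno_df)
  res
}

stranded   <- summarize_regions(exp_df, anno_df)
unstranded <- summarize_regions(exp_df, anno_df, ignoreStrand = TRUE)
```

In this toy case the strand-aware total for geneA excludes the position on the opposite strand, whereas the strand-ignoring total includes it. Doing the same work in SQL avoids loading every region's rows into R first, which is the trade-off described above.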

If meta.id is set to a column in annoData, all regions with the same value of meta.id will be joined together; a standard use case is labelling the exons of a gene.
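The joining of regions that share a label can be pictured with a small base-R sketch. This is only an illustration on made-up per-exon totals, assuming a sum as the summary. The gene column and the object names are hypothetical, and the package itself performs the aggregation in SQLite.

```r
# Toy per-exon totals; `gene` is a hypothetical grouping column
# of the kind one might add to an annotation data frame.
exon_totals <- data.frame(
  gene  = c("g1", "g1", "g2"),
  total = c(5, 1, 7),
  row.names = c("g1.e1", "g1.e2", "g2.e1")
)

# Join exons of the same gene by summing their totals.
gene_totals <- tapply(exon_totals$total, exon_totals$gene, sum)
```

Here the two exons of g1 collapse into a single entry, so the result has one value per gene rather than one per exon.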

References

The SQLite website http://www.sqlite.org/lang_aggfunc.html has details on which aggregate functions are implemented.

See Also

See the Genominator vignette for more information, as well as ExpData-class.

Examples

ed <- ExpData(system.file(package = "Genominator", "sample.db"),
              tablename = "raw")
data("yeastAnno")
summarizeByAnnotation(ed, yeastAnno[1:50,])
