GenoGAMDataSet: GenoGAMDataSet constructor.

Description

This is the constructor function for GenoGAMDataSet. Currently, a GenoGAMDataSet can be constructed either from an experiment design (given as a file or a data.frame) or directly from a RangedSummarizedExperiment whose rowRanges is a GPos object.

Usage

GenoGAMDataSet(experimentDesign, chunkSize, overhangSize, design, directory = ".", ...)

Arguments

experimentDesign

Either a character object specifying the path to a delimited text file (the delimiter will be determined automatically), or a data.frame specifying the experiment design. See details for the structure of the experimentDesign.

chunkSize

An integer specifying the size of one chunk in bp.

overhangSize

An integer specifying the size of the overhang in bp. As the overhang is taken to be symmetrical, only the overhang of one side should be provided.

design

An mgcv-like formula object. See Details for its structure.
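
As a rough illustration of how chunkSize and overhangSize relate, the sketch below assumes that each tile consists of one chunk plus the overhang on both sides, so that the tile width is chunkSize + 2 * overhangSize. This relation is an assumption made for illustration and is not stated on this page; tileWidth is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper (not part of GenoGAM): width of one tile in bp,
# ASSUMING tile = chunk + one overhang on each side.
tileWidth <- function(chunkSize, overhangSize) {
    chunkSize + 2L * overhangSize
}

# Values taken from the Examples section: 2000 bp chunks, 250 bp overhang
tileWidth(chunkSize = 2000L, overhangSize = 250L)  # 2500
```

Under this assumption, passing overhangSize = 250 adds 500 bp to each chunk in total, which is why only the one-sided value is supplied.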

Value

An object of class GenoGAMDataSet.

Details

The experimentDesign file/data.frame must contain at least three columns with fixed names: 'ID', 'file' and 'paired'. The field 'ID' stores a unique identifier for each alignment file. It is recommended to use short, easy-to-understand identifiers because they are subsequently used for labelling data and plots. The field 'file' stores the BAM file name. The field 'paired' takes the value TRUE for paired-end sequencing data and FALSE for single-end sequencing data. All other columns are stored in the colData slot of the GenoGAMDataSet object. Note that all columns used for analysis must have at most two conditions, which are for now restricted to 0 and 1. For example, if the IP data should be corrected for input, then the input will be 0 and the IP will be 1, since we are interested in the corrected IP. See Examples.
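
The column rules above can be checked before calling the constructor. The following is a minimal sketch in base R; checkExperimentDesign and its analysisColumns argument are hypothetical names introduced here for illustration and are not part of the package, and the checks only mirror the rules stated in this section.

```r
# Hypothetical helper (not part of GenoGAM): basic sanity checks on an
# experiment design, following the rules described in Details.
checkExperimentDesign <- function(config, analysisColumns = character(0)) {
    required <- c("ID", "file", "paired")
    missingCols <- setdiff(c(required, analysisColumns), names(config))
    if (length(missingCols) > 0) {
        stop("Missing column(s): ", paste(missingCols, collapse = ", "))
    }
    if (anyDuplicated(config$ID) > 0) {
        stop("'ID' must be unique for each alignment file")
    }
    if (!is.logical(config$paired)) {
        stop("'paired' must be TRUE or FALSE")
    }
    # Columns used in the design formula are restricted to 0 and 1
    for (col in analysisColumns) {
        values <- as.character(config[[col]])
        if (!all(values %in% c("0", "1"))) {
            stop("Column '", col, "' may only contain 0 and 1")
        }
    }
    invisible(TRUE)
}

myConfig <- data.frame(ID = c("input", "ip"),
                       file = c("myInput.bam", "myIP.bam"),
                       paired = c(FALSE, FALSE),
                       experiment = factor(c(0, 1)),
                       stringsAsFactors = FALSE)
checkExperimentDesign(myConfig, analysisColumns = "experiment")

# The same design written to a tab-delimited text file, whose path could
# then be passed as experimentDesign instead of the data.frame
designFile <- tempfile(fileext = ".txt")
write.table(myConfig, file = designFile, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

The BAM file names here are the placeholders from the Examples section, so the sketch validates only the table layout, not the existence of the files.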

Design must be an mgcv-like formula. At the moment only the following forms are possible:

~ 1 for a constant.

~ s(x) for a smooth fit over the entire data.

~ s(x, by = "myColumn"), where 'myColumn' is a column name in the experimentDesign. This type of formula fits only the samples annotated with 1 in this column.

~ s(x) + s(x, by = "myColumn") + s(x, by = ...) + ... This form lets you combine any number of columns, provided they are binary with 0 and 1.

For example, the formula for correcting IP for input would look like this: ~ s(x) + s(x, by = "experiment"), where 'experiment' is a column with 0s and 1s, with the IP samples annotated with 1 and the input samples with 0.

In the case of single-end data it might be useful to specify a different method for fragment size estimation. The argument 'shiftMethod' can be supplied with the values 'coverage' (default), 'correlation' or 'SISSR'. See ?chipseq::estimate.mean.fraglen for an explanation.
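
When several binary columns are involved, a design formula of the combined form described above can be assembled programmatically. The sketch below uses only base R; buildDesign is a hypothetical helper introduced for illustration, not a function of the package, and it follows the quoted by = "column" notation used on this page.

```r
# Hypothetical helper (not part of GenoGAM): assemble a design formula of the
# form ~ s(x) + s(x, by = "col1") + s(x, by = "col2") + ...
buildDesign <- function(byColumns = character(0)) {
    smooths <- c("s(x)", sprintf("s(x, by = \"%s\")", byColumns))
    stats::as.formula(paste("~", paste(smooths, collapse = " + ")))
}

# Formula for correcting IP for input, as described in Details
design <- buildDesign("experiment")
print(design)

# The formula is only parsed here, never evaluated, so s() does not need
# to be defined in order to list its terms
attr(terms(design), "term.labels")
```

Calling buildDesign() with no columns yields the plain ~ s(x) form. The helper does not check that the named columns exist in the experiment design or that they are binary.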

Examples

## Not run: 
# library(GenoGAM)
# 
# ## Experiment design with one input and one IP sample (single-end)
# myConfig <- data.frame(ID = c("input", "ip"),
#                   file = c("myInput.bam", "myIP.bam"),
#                   paired = c(FALSE, FALSE),
#                   experiment = factor(c(0, 1)),
#                   stringsAsFactors = FALSE)
# ## Experiment design with two wildtype and two mutant replicates
# myConfig2 <- data.frame(ID = c("wildtype1", "wildtype2",
#                               "mutant1", "mutant2"),
#                   file = c("myWT1.bam", "myWT2.bam",
#                            "myMutant1.bam", "myMutant2.bam"),
#                   paired = c(FALSE, FALSE, FALSE, FALSE),
#                   experiment = factor(c(0, 0, 1, 1)),
#                   stringsAsFactors = FALSE)
# 
# ## 2000 bp chunks with a 250 bp overhang on each side
# gtiles <- GenoGAMDataSet(myConfig, chunkSize = 2000,
#     overhangSize = 250, design = ~ s(x) + s(x, by = "experiment"))
# gtiles <- GenoGAMDataSet(myConfig2, chunkSize = 2000,
#     overhangSize = 250, design = ~ s(x) + s(x, by = "experiment"))
# ## End(Not run)
