dba: Construct a DBA object

Description

Constructs a new DBA object from a sample sheet, or based on an existing DBA object.

Usage

dba(DBA, mask, minOverlap=2,
    sampleSheet="dba_samples.csv",
    config=data.frame(RunParallel=TRUE, reportInit="DBA", DataType=DBA_DATA_GRANGES,
                      AnalysisMethod=DBA_DESEQ2, minQCth=15, fragmentSize=125,
                      bCorPlot=FALSE, th=0.05, bUsePval=FALSE),
    peakCaller="raw", peakFormat, scoreCol, bLowerScoreBetter,
    filter, skipLines=0,
    bAddCallerConsensus=FALSE,
    bRemoveM=TRUE, bRemoveRandom=TRUE,
    bSummarizedExperiment=FALSE,
    bCorPlot, attributes)

Arguments

DBA

existing DBA object -- if present, will return a fully-constructed DBA object based on the passed one, using criteria specified in the mask and/or minOverlap parameters. If missing, will create a new DBA object based on the sampleSheet.

mask

logical or numerical vector indicating which peaksets to include in the resulting model if basing DBA object on an existing one. See dba.mask.

minOverlap

only include peaks that appear in at least this many peaksets in the main binding matrix if basing DBA object on an existing one. If minOverlap is between zero and one, a peak will be included if it appears in at least this proportion of peaksets.
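
To illustrate the two ways minOverlap can be read, the sketch below applies a count threshold and a proportion threshold to a toy occupancy matrix. The matrix `occupancy` and the helper `keep_peaks` are hypothetical and are not part of DiffBind. The package's own handling (for example, how proportions are rounded) may differ.

```r
# Toy occupancy matrix: rows are candidate peaks, columns are peaksets.
# TRUE means the peak was called in that peakset. (Hypothetical data.)
occupancy <- matrix(
  c(TRUE, TRUE,  TRUE,  TRUE,
    TRUE, TRUE,  FALSE, FALSE,
    TRUE, FALSE, FALSE, FALSE),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("peak1", "peak2", "peak3"), paste0("set", 1:4))
)

# Hypothetical helper, not a DiffBind function.
keep_peaks <- function(occ, minOverlap = 2) {
  n_sets <- rowSums(occ)  # number of peaksets containing each peak
  if (minOverlap > 0 && minOverlap < 1) {
    # Proportion: fraction of peaksets containing the peak
    n_sets / ncol(occ) >= minOverlap
  } else {
    # Count: absolute number of peaksets containing the peak
    n_sets >= minOverlap
  }
}

by_count <- keep_peaks(occupancy, minOverlap = 2)     # keeps peak1 and peak2
by_prop  <- keep_peaks(occupancy, minOverlap = 0.75)  # keeps peak1 only
```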

sampleSheet

data frame containing sample sheet, or file name of sample sheet to load (ignored if DBA is specified). Column names in sample sheet may include:

SampleID: Identifier string for sample
Tissue: Identifier string for tissue type
Factor: Identifier string for factor
Condition: Identifier string for condition
Treatment: Identifier string for treatment
Replicate: Replicate number of sample
bamReads: file path for bam file containing aligned reads for ChIP sample
bamControl: file path for bam file containing aligned reads for control sample
ControlID: Identifier string for control sample
Peaks: path for file containing peaks for sample. Format determined by PeakCaller field or peakCaller parameter
PeakCaller: Identifier string for peak caller used. If Peaks is not a bed file, this will determine how the Peaks file is parsed. If missing, will use default peak caller specified in peakCaller parameter. Possible values:
- “raw”: text file; peak score is in fourth column
- “bed”: .bed file; peak score is in fifth column
- “narrow”: default peak.format: narrowPeaks file
- “macs”: MACS .xls file
- “swembl”: SWEMBL .peaks file
- “bayes”: bayesPeak file
- “peakset”: peakset written out using pv.writepeakset
- “fp4”: FindPeaks v4
PeakFormat: string indicating format for peak files; see PeakCaller and dba.peakset
ScoreCol: column in peak files that contains peak scores
LowerBetter: logical indicating that lower scores signify better peaks
Counts: file path for externally computed read counts; see dba.peakset (counts parameter)

For sample sheets loaded from a file, the accepted formats are comma-separated values (column headers, followed by one line per sample), or Excel-formatted spreadsheets (.xls or .xlsx extension). Leading and trailing white space will be removed from all values, with a warning.
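
As a minimal sketch of the layout described above, the following builds a sample sheet as a data frame and writes it out as comma-separated values. The sample names and file paths are placeholders invented for illustration, and only a subset of the possible columns is shown.

```r
# Hypothetical sample sheet with two replicates; all paths are placeholders.
samples <- data.frame(
  SampleID   = c("A1", "A2"),
  Tissue     = c("cellA", "cellA"),
  Factor     = c("ER", "ER"),
  Condition  = c("Responsive", "Responsive"),
  Replicate  = c(1, 2),
  bamReads   = c("reads/A1.bam", "reads/A2.bam"),
  Peaks      = c("peaks/A1.bed", "peaks/A2.bed"),
  PeakCaller = c("bed", "bed"),
  stringsAsFactors = FALSE
)

# Write it as CSV: a header row followed by one line per sample.
sheet_file <- file.path(tempdir(), "dba_samples.csv")
write.csv(samples, sheet_file, row.names = FALSE)

# Read it back to confirm the layout survives the round trip.
roundtrip <- read.csv(sheet_file, stringsAsFactors = FALSE)
```

Either form could then be supplied as the sampleSheet argument, that is `dba(sampleSheet=samples)` or `dba(sampleSheet=sheet_file)`, provided the files named in the sheet actually exist.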

config

data frame containing configuration options, or file name of config file to load when constructing a new DBA object from a sample sheet. NULL indicates no config file. Relevant fields include:

RunParallel: logical indicating if counting and analysis operations should be run in parallel using multicore by default.
DataType: default class for peaks and reports (DBA_DATA_GRANGES, DBA_DATA_RANGEDDATA, or DBA_DATA_FRAME).
ReportInit: string to prepend to saved report file names.
AnalysisMethod: either DBA_DESEQ2 or DBA_EDGER.
bCorPlot: logical indicating that a correlation heatmap should be plotted automatically.
th: default threshold for reporting and plotting analysis results.
bUsePval: logical indicating whether to use FDR (FALSE) or p-values (TRUE) by default.
minQCth: numeric, for filtering reads based on mapping quality score; only reads with a mapping quality score greater than or equal to this will be counted.
fragmentSize: numeric with mean fragment size. Reads will be extended to this length before counting overlaps. May be a vector of lengths, one for each sample.
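
A configuration can be assembled as a one-row data frame using the field names listed above. The sketch below sets only fields that take plain R values. Whether omitted fields fall back to their defaults is an assumption here and should be checked against the installed version of the package.

```r
# One-row configuration data frame; field names follow the list above.
config <- data.frame(
  RunParallel  = FALSE,  # run counting and analysis serially
  minQCth      = 15,     # minimum mapping quality for counted reads
  fragmentSize = 125,    # reads extended to this length before counting
  th           = 0.05,   # default reporting threshold
  bUsePval     = FALSE   # FALSE: threshold on FDR rather than p-values
)

# It would then be passed alongside a sample sheet, for example:
# dba(sampleSheet = "dba_samples.csv", config = config)
```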

peakCaller

if a sampleSheet is specified, the default peak caller that will be used if the PeakCaller column is absent.

peakFormat

if a sampleSheet is specified, the default peak file format that will be used if the PeakFormat column is absent.

scoreCol

if a sampleSheet is specified, the default column in the peak files that will be used for scoring if the ScoreCol column is absent.

bLowerScoreBetter

if a sampleSheet is specified, the sort order for peak scores if the LowerBetter column is absent.

filter

if a sampleSheet is specified, a filter value if the Filter column is absent. Peaks with scores lower than this value (or higher if bLowerScoreBetter or LowerBetter is TRUE) will be removed.
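
The interaction between filter and the score direction can be sketched on a toy peak table. The data frame `peaks` and the helper `filter_peaks` are hypothetical and only mirror the rule stated above. They are not DiffBind's implementation.

```r
# Toy peak table with one score per peak (hypothetical data).
peaks <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(100, 500, 900),
  end   = c(200, 600, 1000),
  score = c(5, 20, 50)
)

# Hypothetical helper, not a DiffBind function.
filter_peaks <- function(peaks, filter, bLowerScoreBetter = FALSE) {
  if (bLowerScoreBetter) {
    # Lower is better: remove peaks scoring higher than the filter value
    peaks[peaks$score <= filter, , drop = FALSE]
  } else {
    # Higher is better: remove peaks scoring lower than the filter value
    peaks[peaks$score >= filter, , drop = FALSE]
  }
}

kept_high <- filter_peaks(peaks, filter = 20)                           # scores 20 and 50
kept_low  <- filter_peaks(peaks, filter = 20, bLowerScoreBetter = TRUE) # scores 5 and 20
```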

skipLines

if a sampleSheet is specified, the number of lines (i.e., header lines) at the beginning of each peak file to skip.

bAddCallerConsensus

add a consensus peakset for each sample with more than one peakset (i.e. different peak callers) when constructing a new DBA object from a sample sheet.

bRemoveM

logical indicating whether to remove peaks on chrM (mitochondria) when constructing a new DBA object from a sample sheet.

bRemoveRandom

logical indicating whether to remove peaks on chrN_random when constructing a new DBA object from a sample sheet.

bSummarizedExperiment

logical indicating whether to return resulting object as a SummarizedExperiment.

bCorPlot

logical indicating that a correlation heatmap should be plotted before returning. When a new DBA object is being created (DBA is NULL) and bCorPlot is missing, it takes the default value (FALSE). When a new DBA object is being created and bCorPlot is specified, the specified value becomes the default value of bCorPlot for the resultant DBA object.

attributes

vector of attributes to use subsequently as defaults when generating labels in plotting functions:

DBA_ID
DBA_TISSUE
DBA_FACTOR
DBA_CONDITION
DBA_REPLICATE
DBA_CONSENSUS
DBA_CALLER
DBA_CONTROL

Value

DBA object

Details

MODE: Construct a new DBA object from a sample sheet:

dba(sampleSheet, config, bAddCallerConsensus, bRemoveM, bRemoveRandom, attributes)

MODE: Construct a DBA object based on an existing one:

dba(DBA, mask, attributes)

MODE: Convert a DBA object to a SummarizedExperiment object:

dba(DBA, bSummarizedExperiment=TRUE)

Examples

# Create DBA object from a samplesheet
setwd(system.file("extra", package="DiffBind"))
tamoxifen <- dba(sampleSheet="tamoxifen.csv")
tamoxifen

tamoxifen <- dba(sampleSheet="tamoxifen_allfields.csv")
tamoxifen

tamoxifen <- dba(sampleSheet="tamoxifen_allfields.csv",config="config.csv")
tamoxifen

# Create a DBA object with a subset of samples
data(tamoxifen_peaks)
Responsive <- dba(tamoxifen,tamoxifen$masks$Responsive)
Responsive

# change peak caller but leave peak format the same
setwd(system.file("extra", package="DiffBind"))
tamoxifen <- dba(sampleSheet="tamoxifen.csv", peakCaller="macs", 
                 peakFormat="raw", scoreCol=5 )
dba.show(tamoxifen, attributes=c(DBA_TISSUE,DBA_CONDITION,DBA_REPLICATE,DBA_CALLER))

# Convert DBA object to SummarizedExperiment
data(tamoxifen_counts)
sset <- dba(tamoxifen,bSummarizedExperiment=TRUE)
sset