Usage
dba(DBA, mask, minOverlap=2,
    sampleSheet="dba_samples.csv",
    config=data.frame(RunParallel=TRUE, reportInit="DBA",
                      DataType=DBA_DATA_GRANGES, AnalysisMethod=DBA_DESEQ2,
                      minQCth=15, fragmentSize=125, bCorPlot=FALSE,
                      th=0.05, bUsePval=FALSE),
    peakCaller="raw", peakFormat, scoreCol, bLowerScoreBetter, filter,
    skipLines=0, bAddCallerConsensus=FALSE, bRemoveM=TRUE, bRemoveRandom=TRUE,
    bSummarizedExperiment=FALSE, bCorPlot, attributes)
Arguments
DBA
existing DBA object -- if present, will return a fully-constructed DBA object based on the passed one, using criteria specified in the mask and/or minOverlap parameters. If missing, will create a new DBA object based on the sampleSheet.
mask
logical or numerical vector indicating which peaksets to include in the resulting model if basing DBA object on an existing one. See dba.mask.
minOverlap
only include peaks found in at least this many peaksets in the main binding matrix if basing DBA object on an existing one. If minOverlap is between zero and one, a peak will be included if it is found in at least this proportion of peaksets.
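The two readings of minOverlap (a count of peaksets versus a proportion of peaksets) can be illustrated with a small base R sketch. The helper passes_overlap is hypothetical and is not part of DiffBind; in particular, treating values strictly between zero and one as proportions and everything else as counts is an assumption made for this illustration.

```r
# Hypothetical helper (not a DiffBind function): decide whether a peak that
# appears in n_overlapping out of n_peaksets peaksets satisfies minOverlap.
passes_overlap <- function(n_overlapping, n_peaksets, minOverlap = 2) {
  if (minOverlap > 0 && minOverlap < 1) {
    # Proportion reading: fraction of peaksets containing the peak
    n_overlapping / n_peaksets >= minOverlap
  } else {
    # Count reading: absolute number of peaksets containing the peak
    n_overlapping >= minOverlap
  }
}

# Four peaks found in 1, 2, 3 and 4 of 4 peaksets respectively
counts <- c(1, 2, 3, 4)
by_count      <- passes_overlap(counts, n_peaksets = 4, minOverlap = 2)
by_proportion <- passes_overlap(counts, n_peaksets = 4, minOverlap = 0.66)
```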
sampleSheet
data frame containing sample sheet, or file name of sample sheet to load (ignored if DBA is specified). Column names in the sample sheet may include:
- SampleID: Identifier string for sample
- Tissue: Identifier string for tissue type
- Factor: Identifier string for factor
- Condition: Identifier string for condition
- Treatment: Identifier string for treatment
- Replicate: Replicate number of sample
- bamReads: file path for bam file containing aligned reads for ChIP sample
- bamControl: file path for bam file containing aligned reads for control sample
- ControlID: Identifier string for control sample
- Peaks: path for file containing peaks for sample. Format determined by PeakCaller field or peakCaller parameter
- PeakCaller: Identifier string for peak caller used. If Peaks is not a bed file, this will determine how the Peaks file is parsed. If missing, will use default peak caller specified in peakCaller parameter. Possible values:
  - raw: text file; peak score is in fourth column
  - bed: .bed file; peak score is in fifth column
  - narrow: narrowPeaks file
  - macs: MACS .xls file
  - swembl: SWEMBL .peaks file
  - bayes: bayesPeak file
  - peakset: peakset written out using pv.writepeakset
  - fp4: FindPeaks v4
- PeakFormat: string indicating format for peak files; see PeakCaller and dba.peakset
- ScoreCol: column in peak files that contains peak scores
- LowerBetter: logical indicating that lower scores signify better peaks
- Counts: file path for externally computed read counts; see dba.peakset (counts parameter)
For sample sheets loaded from a file, the accepted formats are comma-separated values (column headers, followed by one line per sample), or Excel-formatted spreadsheets (.xls or .xlsx extension). Leading and trailing white space will be removed from all values, with a warning.
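As an illustration of the sample sheet layout described above, the following base R sketch builds a sheet as a data frame and writes it out as comma-separated values. All sample IDs, tissue and factor names, and file paths are hypothetical placeholders; only the column names are taken from the list above. The dba calls are left commented out because they need the DiffBind package and real BAM and peak files.

```r
# Hypothetical sample sheet: two conditions, two replicates each.
# Every ID and path below is a placeholder, not a real file.
ids <- c("ctrl_1", "ctrl_2", "treat_1", "treat_2")
samples <- data.frame(
  SampleID   = ids,
  Tissue     = "cellLineA",
  Factor     = "TF1",
  Condition  = c("control", "control", "treated", "treated"),
  Replicate  = c(1, 2, 1, 2),
  bamReads   = file.path("reads", paste0(ids, ".bam")),
  ControlID  = paste0(ids, "_input"),
  bamControl = file.path("reads", paste0(ids, "_input.bam")),
  Peaks      = file.path("peaks", paste0(ids, ".bed")),
  PeakCaller = "bed",
  stringsAsFactors = FALSE
)

# Write the sheet as CSV: one header line, then one line per sample
sheet_file <- file.path(tempdir(), "dba_samples.csv")
write.csv(samples, sheet_file, row.names = FALSE)

# Either form could then be passed as sampleSheet (needs DiffBind and real files):
# library(DiffBind)
# dbaObj <- dba(sampleSheet = samples)
# dbaObj <- dba(sampleSheet = sheet_file)
```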
config
data frame containing configuration options, or file name of config file to load when constructing a new DBA object from a sample sheet. NULL indicates no config file. Relevant fields include:
- RunParallel: logical indicating if counting and analysis operations should be run in parallel using multicore by default.
- DataType: default class for peaks and reports (DBA_DATA_GRANGES, DBA_DATA_RANGEDDATA, or DBA_DATA_FRAME).
- ReportInit: string to prepend to saved report file names.
- AnalysisMethod: either DBA_DESEQ2 or DBA_EDGER.
- bCorPlot: logical indicating that a correlation heatmap should be plotted automatically.
- th: default threshold for reporting and plotting analysis results.
- bUsePval: logical indicating whether to use FDR (FALSE) or p-values (TRUE) by default.
- minQCth: numeric, for filtering reads based on mapping quality score; only reads with a mapping quality score greater than or equal to this will be counted.
- fragmentSize: numeric with mean fragment size. Reads will be extended to this length before counting overlaps. May be a vector of lengths, one for each sample.
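A config data frame along the lines described above might be put together as follows. This is a sketch only: the values are arbitrary examples, and whether fields left out of a user-supplied config fall back to the defaults shown in Usage is an assumption that should be checked against the installed DiffBind version. DataType and AnalysisMethod are left commented out because their constants (DBA_DATA_GRANGES, DBA_DESEQ2) are only defined once DiffBind is loaded.

```r
# One-row data frame; column names are the config fields listed above.
# The values are illustrative, not recommendations.
config <- data.frame(
  RunParallel  = FALSE,  # run counting and analysis serially
  minQCth      = 30,     # count only reads with mapping quality >= 30
  fragmentSize = 200,    # extend reads to 200 bp before counting overlaps
  th           = 0.01,   # threshold for reporting and plotting
  bUsePval     = FALSE,  # apply th to FDR rather than p-values
  bCorPlot     = FALSE   # no automatic correlation heatmap
)

# With DiffBind loaded, the remaining fields and the dba call could look like:
# library(DiffBind)
# config$DataType       <- DBA_DATA_GRANGES
# config$AnalysisMethod <- DBA_DESEQ2
# dbaObj <- dba(sampleSheet = "dba_samples.csv", config = config)
```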
peakCaller
if a sampleSheet is specified, the default peak caller that will be used if the PeakCaller column is absent.
peakFormat
if a sampleSheet is specified, the default peak file format that will be used if the PeakFormat column is absent.
scoreCol
if a sampleSheet is specified, the default column in the peak files that will be used for scoring if the ScoreCol column is absent.
bLowerScoreBetter
if a sampleSheet is specified, the sort order for peak scores if the LowerBetter column is absent.
filter
if a sampleSheet is specified, a filter value if the Filter column is absent. Peaks with scores lower than this value (or higher if bLowerScoreBetter or LowerBetter is TRUE) will be removed.
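The filtering rule described above can be sketched in base R. The function filter_peaks and the toy peak table are hypothetical and are not DiffBind code; the sketch assumes that peaks scoring exactly the filter value are kept, since the text only says that lower (or higher) scores are removed.

```r
# Hypothetical helper (not a DiffBind function): drop peaks whose score is
# worse than the filter value, honouring the lower-is-better convention.
filter_peaks <- function(peaks, filter, lowerBetter = FALSE) {
  keep <- if (lowerBetter) {
    peaks$score <= filter  # remove scores higher than the filter
  } else {
    peaks$score >= filter  # remove scores lower than the filter
  }
  peaks[keep, , drop = FALSE]
}

# Toy peak table with made-up coordinates and scores
peaks <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(100, 500, 900),
  end   = c(200, 650, 1000),
  score = c(5, 20, 50)
)

kept_high <- filter_peaks(peaks, filter = 20)                      # higher is better
kept_low  <- filter_peaks(peaks, filter = 20, lowerBetter = TRUE)  # lower is better
```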
skipLines
if a sampleSheet is specified, the number of lines (i.e. header lines) at the beginning of each peak file to skip.
bAddCallerConsensus
add a consensus peakset for each sample with more than one peakset (i.e. different peak callers) when constructing a new DBA object from a sample sheet.
bRemoveM
logical indicating whether to remove peaks on chrM (mitochondria) when constructing a new DBA object from a sample sheet.
bRemoveRandom
logical indicating whether to remove peaks on chrN_random when constructing a new DBA object from a sample sheet.
bCorPlot
logical indicating that a correlation heatmap should be plotted before returning. If DBA is NULL (a new DBA object is being created) and bCorPlot is missing, then this will take the default value (FALSE). However, if DBA is NULL (a new DBA object is being created) and bCorPlot is specified, then the specified value will become the default value of bCorPlot for the resultant DBA object.
attributes
vector of attributes to use subsequently as defaults when generating labels in plotting functions:
- DBA_ID
- DBA_TISSUE
- DBA_FACTOR
- DBA_CONDITION
- DBA_REPLICATE
- DBA_CONSENSUS
- DBA_CALLER
- DBA_CONTROL