Proportion-based filtering supposes that a certain percentage of the genome is genuinely bound.
If type="proportion", the filter statistic is defined as the ratio of the rank to the total number of windows.
Ranks are assigned in ascending order of abundance, i.e., higher-abundance windows have higher ratios.
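The rank ratio can be illustrated with a minimal sketch. The helper name `rank_ratios`, the toy abundances and the 0.6 cutoff are all hypothetical and chosen for illustration; this is not the function's actual implementation, and ties are simply broken by input order.

```python
def rank_ratios(abundances):
    """Return rank / number of windows for each window.

    Ranks are ascending (1 = lowest abundance), so the most abundant
    window gets a ratio of 1. Ties are broken by input order.
    """
    n = len(abundances)
    order = sorted(range(n), key=lambda i: abundances[i])
    ratios = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        ratios[idx] = rank / n
    return ratios


# Toy window abundances (hypothetical values).
abundances = [1.2, 5.8, 0.3, 3.1, 2.4]
ratios = rank_ratios(abundances)

# Keep windows whose rank ratio exceeds an illustrative cutoff.
keep = [r > 0.6 for r in ratios]
print(ratios)
print(keep)
```

With a real data set the cutoff would be set from the assumed bound proportion rather than picked by hand as it is here.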
Windows with rank ratios above a threshold are retained, e.g., 0.99 if 1% of the genome is assumed to be bound.
All other values of type will perform background-based filtering, where abundances of the windows are compared to those of putative background regions.
The filter statistic is generally defined as the difference between window and background abundances, i.e., the log-fold increase in the counts.
Windows can be filtered to retain those with large filter statistics, to select those that are more likely to contain genuine binding sites.
The differences between the methods center around how the background abundances are obtained for each window.
If type="global", the median average abundance across the genome is used as a global estimate of the background abundance.
This should be used when background contains unfiltered counts for large (2 - 10 kbp) genomic bins, from which the background abundance can be computed.
The filter statistic for each window is defined as the difference between the window abundance and the global background.
If background is not supplied, the background abundance is directly computed from entries in data.
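A minimal sketch of the global approach follows. The `abundance` helper (log2 of the mean count plus a prior count of 2) is a simplified, hypothetical stand-in for the average abundance, and `global_filter_stat` is an illustrative name; neither is the function's actual implementation. The sketch also ignores the width difference between bins and windows.

```python
import math
from statistics import median


def abundance(counts, prior=2.0):
    """Hypothetical log2 abundance: log2 of the mean count plus a prior count."""
    return math.log2(sum(counts) / len(counts) + prior)


def global_filter_stat(window_counts, bin_counts=None):
    """Window abundance minus the median abundance of the background rows.

    If no bin counts are given, the background is taken from the windows
    themselves. Width differences between bins and windows are ignored
    in this sketch.
    """
    source = bin_counts if bin_counts is not None else window_counts
    background = median(abundance(row) for row in source)
    return [abundance(row) - background for row in window_counts]


# Toy counts: one row per window (or bin), one column per library.
windows = [[6, 6], [30, 30], [2, 2]]
bins = [[2, 2], [2, 2], [14, 14]]

print(global_filter_stat(windows, bins))
print(global_filter_stat(windows))
```

Windows would then be retained if their statistic exceeds a chosen log-fold threshold.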
If type="local", the counts of each row in data are subtracted from those of the corresponding row in background.
The average abundance of the remaining counts is computed and used as the background abundance.
The filter statistic is defined by subtracting the background abundance from the corresponding window abundance for each row.
This is designed to be used when background contains counts for expanded windows, to determine the local background estimate.
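The local approach can be sketched in the same spirit. The `abundance` helper and `local_filter_stat` are hypothetical names, the log2-mean-plus-prior definition is a simplification, and the sketch makes no adjustment for the different widths of the window and its flanking region; it is not the function's actual implementation.

```python
import math


def abundance(counts, prior=2.0):
    """Hypothetical log2 abundance: log2 of the mean count plus a prior count."""
    return math.log2(sum(counts) / len(counts) + prior)


def local_filter_stat(data_counts, background_counts):
    """Window abundance minus the abundance of its flanking region.

    Each background row holds counts for an expanded window that contains
    the corresponding window in data, so subtracting the window counts
    leaves the counts from the flanks only.
    """
    stats = []
    for window, expanded in zip(data_counts, background_counts):
        remaining = [e - w for w, e in zip(window, expanded)]
        stats.append(abundance(window) - abundance(remaining))
    return stats


# Toy counts: one row per window, one column per library.
data = [[30, 30], [6, 6]]
expanded = [[36, 36], [20, 20]]

print(local_filter_stat(data, expanded))
```

In the toy data the first window stands out from its neighborhood, while the second is less abundant than its flanks and would be filtered out at any positive threshold.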
If type="control", the background abundance is defined as the average abundance of each row in background.
The filter statistic is defined as the difference between the average abundance of each row in data and that of the corresponding row in background.
This is designed to be used when background contains read counts for each window in the control sample(s).
Unlike type="local", there is no subtraction of the counts in background prior to computing the average abundance.
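A minimal sketch of the control-based statistic is shown below. The `abundance` helper and `control_filter_stat` are again hypothetical, simplified stand-ins rather than the function's actual implementation, and the sketch does not correct for differences in library size or composition between the ChIP and control samples.

```python
import math


def abundance(counts, prior=2.0):
    """Hypothetical log2 abundance: log2 of the mean count plus a prior count."""
    return math.log2(sum(counts) / len(counts) + prior)


def control_filter_stat(data_counts, control_counts):
    """Window abundance in data minus abundance of the same window in the control.

    The control counts are used as they are; nothing is subtracted from
    them before the abundance is computed.
    """
    return [
        abundance(chip) - abundance(ctrl)
        for chip, ctrl in zip(data_counts, control_counts)
    ]


# Toy counts: one row per window, one column per library.
data = [[30, 30], [6, 6]]
control = [[6, 6], [6, 6]]

print(control_filter_stat(data, control))
```

The second window has the same abundance in both samples, so its statistic is zero and it shows no enrichment over the control.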