Usage
MEDIPS.saturation(file=NULL, BSgenome=NULL, nit=10, nrit=1, empty_bins=TRUE, rank=FALSE, extend=0, shift=0, window_size=500, uniq=1e-3, chr.select=NULL, paired = F, bwa=FALSE)
Arguments
file
Path and file name of the IP data
BSgenome
The reference genome name as defined by BSgenome
nit
defines the number of subsets created from the full sets of available regions (default=10)
nrit
Methods that randomly select data entries can be run several times to obtain more stable results.
The nrit parameter (default=1) specifies how many times the saturation analysis is repeated.
The final results returned in the saturation results object are the averages over all random iterations.
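To make nit and nrit concrete, here is a minimal sketch on one simulated chromosome. It assumes that the reads are randomly split into two halves A and B, that nit subsets of increasing size are drawn from each half, and that the window counts of matching subsets are correlated. The helper names (bin_counts, saturation_once, saturation_sketch) are hypothetical, and this is not the MEDIPS implementation.

```r
# Hypothetical sketch of a saturation analysis on one simulated chromosome;
# it only illustrates nit and nrit and is not the MEDIPS implementation.
bin_counts <- function(pos, chr_length, window_size) {
  # reads per window; window k covers [(k-1)*window_size, k*window_size)
  n_windows <- ceiling(chr_length / window_size)
  tabulate(pos %/% window_size + 1L, nbins = n_windows)
}

saturation_once <- function(pos, chr_length, nit, window_size) {
  # randomly split the reads into two halves, A and B
  idx <- sample(length(pos))
  half <- length(pos) %/% 2L
  A <- pos[idx[seq_len(half)]]
  B <- pos[idx[half + seq_len(half)]]
  # subset i takes the first i/nit of each (already shuffled) half
  vapply(seq_len(nit), function(i) {
    n <- floor(half * i / nit)
    cor(bin_counts(A[seq_len(n)], chr_length, window_size),
        bin_counts(B[seq_len(n)], chr_length, window_size))
  }, numeric(1))
}

saturation_sketch <- function(pos, chr_length, nit = 10, nrit = 1, window_size = 500) {
  # repeat the random split nrit times and average the correlation per subset
  runs <- matrix(replicate(nrit, saturation_once(pos, chr_length, nit, window_size)),
                 nrow = nit)
  rowMeans(runs)
}

set.seed(1)
chr_length <- 100000
centers <- sample(chr_length, 20)  # hypothetical enriched loci
pos <- round(rnorm(20000, mean = sample(centers, 20000, replace = TRUE), sd = 300))
pos <- pmin(pmax(pos, 0), chr_length - 1)
res <- saturation_sketch(pos, chr_length, nit = 5, nrit = 3)
```

Each element of res is one subset size, averaged over the nrit = 3 random splits; with nrit = 1 the result is simply the single run.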
empty_bins
can be either TRUE or FALSE (default TRUE). This parameter affects how correlations between the resulting genome vectors are calculated.
A genome vector consists of the concatenated vectors of all included chromosomes. The size of the vectors is defined by the window_size parameter.
Genomic bins that contain no overlapping regions, neither from the subsets of A nor from the subsets of B,
are neglected when the parameter is set to FALSE.
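The effect of empty_bins can be seen on two small vectors of window counts. The sketch below uses made-up counts and a hypothetical helper corr_bins; it shows that bins empty in both vectors add agreeing zeros, which can inflate the correlation.

```r
# Hypothetical window counts for a subset of A and a subset of B
a <- c(0, 0, 0, 0, 0, 0, 5, 3, 8, 2)
b <- c(0, 0, 0, 0, 0, 0, 3, 6, 4, 5)

corr_bins <- function(a, b, empty_bins = TRUE) {
  if (!empty_bins) {
    # drop bins that are empty in both vectors
    keep <- a > 0 | b > 0
    a <- a[keep]
    b <- b[keep]
  }
  cor(a, b)
}

with_empty <- corr_bins(a, b, empty_bins = TRUE)      # about 0.70
without_empty <- corr_bins(a, b, empty_bins = FALSE)  # about -0.59
```

Here the shared empty bins alone turn a negative correlation of the occupied bins into a clearly positive one.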
rank
can be either TRUE or FALSE (default FALSE). This parameter also affects how correlations between the resulting genome vectors are calculated.
If rank is set to TRUE, the correlation is calculated on the ranks of the windows instead of on the counts (Spearman correlation).
Setting this parameter to TRUE is a more robust approach that reduces the influence of outliers (windows with a very high number of overlapping regions) on the correlation.
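A short comparison of Pearson and Spearman correlation on made-up window counts shows why rank = TRUE is more robust: a single window with a very high count dominates the Pearson coefficient but affects the Spearman coefficient only through its rank. The counts are hypothetical.

```r
# Hypothetical window counts; the last window of a is an outlier
a <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)
b <- c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9)

pearson <- cor(a, b, method = "pearson")    # about 0.48
spearman <- cor(a, b, method = "spearman")  # about 0.94
```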
extend
defines the number of bases by which each region is extended before the genome vector is calculated.
Regions are extended along the plus or the minus strand, as defined by their strand information.
Note that the extend and shift parameters are mutually exclusive.
shift
defines the number of bases by which each region is shifted before the genome vector is calculated.
Regions are shifted along the plus or the minus strand, as defined by their strand information.
Note that the extend and shift parameters are mutually exclusive.
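The following sketch illustrates strand-aware extension and shifting with a hypothetical helper adjust_regions. It assumes 1-based inclusive coordinates and that both operations act in the 3' direction of the read (to the right on "+", to the left on "-"); the exact rules used by MEDIPS may differ.

```r
# Hypothetical strand-aware extend/shift of regions; illustration only,
# not the MEDIPS code. Coordinates are assumed to be 1-based, inclusive.
adjust_regions <- function(start, end, strand, extend = 0, shift = 0) {
  if (extend != 0 && shift != 0) stop("extend and shift are mutually exclusive")
  plus <- strand == "+"
  if (extend != 0) {
    # lengthen toward the 3' end: right on "+", left on "-"
    end[plus] <- end[plus] + extend
    start[!plus] <- start[!plus] - extend
  }
  if (shift != 0) {
    # move the whole region in the direction of its strand
    start <- ifelse(plus, start + shift, start - shift)
    end <- ifelse(plus, end + shift, end - shift)
  }
  data.frame(start = start, end = end, strand = strand)
}

reads <- data.frame(start = c(100, 500), end = c(135, 535), strand = c("+", "-"))
ext <- adjust_regions(reads$start, reads$end, reads$strand, extend = 100)
sh <- adjust_regions(reads$start, reads$end, reads$strand, shift = 50)
```

Extending keeps the 5' end of each read fixed, while shifting moves the whole region and keeps its length.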
window_size
defines the size (in bp) of the genome-wide windows and therefore determines the length of the genome vector (default=500).
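How window_size determines the genome vector can be sketched by counting reads per window for each chromosome and concatenating the per-chromosome vectors. The helper genome_vector, the chromosome lengths, and the read positions (0-based, one position per read) are hypothetical.

```r
# Hypothetical sketch: build a genome vector from per-chromosome window counts
genome_vector <- function(reads, chr_lengths, window_size = 500) {
  # one count vector per chromosome, concatenated in the order of chr_lengths
  unlist(lapply(names(chr_lengths), function(chr) {
    n_windows <- ceiling(chr_lengths[[chr]] / window_size)
    pos <- reads$pos[reads$chr == chr]
    tabulate(pos %/% window_size + 1L, nbins = n_windows)
  }))
}

chr_lengths <- c(chr1 = 2000, chr2 = 1200)
reads <- data.frame(chr = c("chr1", "chr1", "chr1", "chr2"),
                    pos = c(10, 499, 1500, 1100))
gv <- genome_vector(reads, chr_lengths, window_size = 500)
# gv is 2 0 0 1 | 0 0 1 (4 windows on chr1, 3 on chr2)
```

Halving window_size to 250 roughly doubles the number of windows (8 on chr1 and 5 on chr2 here) and thus the length of the genome vector.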
uniq
The uniq parameter determines whether all reads mapping to exactly the same genomic position are kept (uniq = 0), replaced by a single representative (uniq = 1), or capped at a maximal number of stacked reads per genomic position (1 > uniq > 0). In the last case, the maximum is derived from a Poisson distribution of stacked reads genome-wide and the given p-value (default: 1e-3). The smaller the p-value, the more reads at the same genomic position are potentially allowed.
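A minimal sketch of the three uniq modes follows. It assumes that the Poisson rate is the number of reads divided by the genome size and that the cap is the (1 - uniq) quantile of that distribution, with a minimum of 1; positions are reduced to a single chromosome and strand is ignored. The helper cap_stacked_reads is hypothetical, and MEDIPS may compute the threshold differently.

```r
# Hypothetical sketch of the uniq parameter; not the MEDIPS implementation.
cap_stacked_reads <- function(pos, genome_size, uniq = 1e-3) {
  if (uniq == 0) return(pos)          # keep all stacked reads
  if (uniq == 1) return(unique(pos))  # one representative per position
  # assumed background rate of reads per base pair
  lambda <- length(pos) / genome_size
  max_stack <- max(1, qpois(1 - uniq, lambda))
  # rank reads within each position and keep at most max_stack of them
  keep <- ave(seq_along(pos), pos, FUN = seq_along) <= max_stack
  pos[keep]
}

set.seed(42)
genome_size <- 1000
# background reads plus an artificial stack of 50 reads at position 7
pos <- c(sample(genome_size, 4950, replace = TRUE), rep(7L, 50))
capped <- cap_stacked_reads(pos, genome_size, uniq = 1e-3)
```

With 5000 reads on 1000 bp the assumed rate is 5, so qpois(1 - 1e-3, 5) caps the stack at position 7 at 13 reads; a smaller uniq such as 1e-6 gives a higher quantile and therefore allows more stacked reads.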
chr.select
specifies a subset of chromosomes for which the saturation analysis is performed.
paired
option for paired-end reads (default=FALSE).
bwa
Indicates whether the alignment file has been generated by bwa (default=FALSE). Enabling bwa allows the first mate of a pair to be either the 'left' or the 'right' mate.