getCTSS(object, sequencingQualityThreshold = 10, mappingQualityThreshold = 20, removeFirstG = TRUE, correctSystematicG = TRUE)
A CAGEset object.
Only CAGE tags with sequencing quality >= sequencingQualityThreshold and mapping quality >= mappingQualityThreshold are kept. Used only if inputFilesType(object) == "bam" or inputFilesType(object) == "bamPairedEnd", i.e. when the input files are BAM files of aligned sequenced CAGE tags; otherwise ignored. If the BAM file contains no sequencing quality values (e.g. the HeliScope single molecule sequencer does not return sequencing qualities), all reads will by default have this value set to -1. Since the default value of sequencingQualityThreshold is 10, all reads will consequently be discarded. To avoid this behaviour and keep all sequenced reads, set sequencingQualityThreshold to -1 when processing data without sequencing qualities. If the BAM file contains no information on mapping quality (e.g. the software used to align CAGE tags to the reference genome does not provide mapping quality), the mappingQualityThreshold parameter is ignored. For a paired-end sequencing BAM file (i.e. inputFilesType(object) == "bamPairedEnd"), only the first mate of the properly paired reads (i.e. the five prime end read) will be read and subjected to the specified thresholds.
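The threshold behaviour described above can be illustrated with a small base R sketch. This is not CAGEr's internal code: the function filterReads, the toy read table and its column names (seqQual, mapQual) are hypothetical and exist only to show how the two thresholds, the -1 placeholder for missing sequencing qualities, and the skipped mapping-quality check interact.

```r
# Toy read table; column names are hypothetical, not CAGEr internals.
reads <- data.frame(
  id      = c("r1", "r2", "r3", "r4"),
  seqQual = c(30, 8, 25, 12),
  mapQual = c(40, 35, 10, 20),
  stringsAsFactors = FALSE
)

# Keep reads that pass both thresholds. If no mapping quality is
# available at all (column is entirely NA), that threshold is skipped.
filterReads <- function(reads, sequencingQualityThreshold = 10,
                        mappingQualityThreshold = 20) {
  keep <- reads$seqQual >= sequencingQualityThreshold
  if (!all(is.na(reads$mapQual))) {
    keep <- keep & reads$mapQual >= mappingQualityThreshold
  }
  reads[keep, , drop = FALSE]
}

kept <- filterReads(reads)
kept$id  # "r1" "r4"

# Data without sequencing or mapping qualities: every read carries -1
# as its sequencing quality, so the default threshold of 10 drops all.
noQual <- reads
noQual$seqQual <- -1
noQual$mapQual <- NA
nDefault <- nrow(filterReads(noQual))                                   # 0
nKeepAll <- nrow(filterReads(noQual, sequencingQualityThreshold = -1))  # 4
```

With real data the equivalent step is simply passing sequencingQualityThreshold = -1 to getCTSS for BAM input that lacks sequencing qualities.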
Used only if inputFilesType(object) == "bam" or inputFilesType(object) == "bamPairedEnd", i.e. when the input files are BAM files of aligned sequenced CAGE tags; otherwise ignored. See Details.
Can only be used when removeFirstG = TRUE; otherwise it is ignored. The frequency of adding a G to CAGE tags is estimated from mismatch cases and used to systematically correct the G addition for positions with G in the genome. Used only if inputFilesType(object) == "bam" or inputFilesType(object) == "bamPairedEnd", i.e. when the input files are BAM files of aligned sequenced CAGE tags; otherwise ignored. See Details.
The slots librarySizes, CTSScoordinates and tagCountMatrix of the provided CAGEset object will be populated with the information on CTSSs created from the input CAGE files.
Setting removeFirstG = TRUE is highly recommended. However, when there is a G both at the beginning of the CAGE tag and in the genome, it is not clear whether the original CAGE tag really starts at this position or whether the G nucleotide was added later in the experimental protocol. To systematically correct CAGE tags mapping at such positions, a general frequency of adding a G to CAGE tags can be calculated from mismatch cases and applied to estimate the number of CAGE tags that have a G added and should actually start at the next nucleotide/position. The option correctSystematicG is an implementation of the correction algorithm described in Carninci et al., Nature Genetics 2006, Supplementary Information section 3-e.
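The reasoning above can be made concrete with a deliberately simplified base R sketch. It is neither the algorithm from Carninci et al. nor CAGEr's implementation: the function names, the inputs and the redistribution rule are assumptions made for illustration only. The sketch assumes that the position following the genomic G is not itself a G, so the tags observed there are those that received no added G.

```r
# Estimate the G-addition frequency from unambiguous cases.
# nMismatchG: tags whose leading G does not match the genome (G was added)
# nNoAddedG:  comparable tags that carry no added G
# (both inputs are hypothetical summary counts)
estimateGFrequency <- function(nMismatchG, nNoAddedG) {
  nMismatchG / (nMismatchG + nNoAddedG)
}

# Simplified correction for one genomic G at position i.
# countAtG:    tags starting with a matching G at position i
# countAtNext: tags starting at position i + 1 (no G added)
# If a fraction p of tags gets a G added, countAtNext is roughly
# (1 - p) of the true count at i + 1, so about
# countAtNext * p / (1 - p) of the tags seen at i belong to i + 1.
correctGPosition <- function(countAtG, countAtNext, p) {
  stopifnot(p >= 0, p < 1)
  moved <- min(countAtG, countAtNext * p / (1 - p))
  c(atG = countAtG - moved, atNext = countAtNext + moved)
}

p <- estimateGFrequency(nMismatchG = 750, nNoAddedG = 250)  # 0.75
corrected <- correctGPosition(countAtG = 100, countAtNext = 20, p = p)
corrected  # atG = 40, atNext = 80
```

The cap at countAtG keeps the corrected counts non-negative; the published algorithm handles more cases than this single-position rule.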
Carninci et al. (2006) Genome-wide analysis of mammalian promoter architecture and evolution, Nature Genetics 38(7):626-635.
CTSScoordinates
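# Load CAGEr and the zebrafish (danRer7) genome package
library(CAGEr)
library(BSgenome.Drerio.UCSC.danRer7)

# Paths to the example CTSS files shipped with CAGEr
pathsToInputFiles <- system.file("extdata", c("Zf.unfertilized.egg.chr17.ctss",
  "Zf.30p.dome.chr17.ctss", "Zf.prim6.rep1.chr17.ctss"), package = "CAGEr")
# One label per input file: "sample1", "sample2", "sample3"
labels <- paste("sample", seq(1, 3, 1), sep = "")

# Create the CAGEset object, then read the CTSSs into it
myCAGEset <- new("CAGEset", genomeName = "BSgenome.Drerio.UCSC.danRer7",
  inputFiles = pathsToInputFiles, inputFilesType = "ctss", sampleLabels = labels)
getCTSS(myCAGEset)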