Usage
GUIDEseqAnalysis(alignment.inputfile, umi.inputfile,
    alignment.format = c("auto", "bam", "bed"), umi.header = FALSE,
    read.ID.col = 1, umi.col = 2, umi.sep = "\t", BSgenomeName,
    gRNA.file, outputDir, n.cores.max = 6, keep.R1only = TRUE,
    keep.R2only = TRUE, concordant.strand = TRUE,
    max.paired.distance = 1000, min.mapping.quality = 30,
    max.R1.len = 130, max.R2.len = 130, apply.both.max.len = FALSE,
    same.chromosome = TRUE, distance.inter.chrom = -1,
    min.R1.mapped = 20, min.R2.mapped = 20, apply.both.min.mapped = FALSE,
    max.duplicate.distance = 0, umi.plus.R1start.unique = TRUE,
    umi.plus.R2start.unique = TRUE, window.size = 20L, step = 20L,
    bg.window.size = 5000L, min.reads = 5L, min.reads.per.lib = 1L,
    min.SNratio = 2, maxP = 0.05, stats = c("poisson", "nbinom"),
    p.adjust.methods = c("none", "BH", "holm", "hochberg", "hommel",
        "bonferroni", "BY", "fdr"),
    distance.threshold = 40L, max.overlap.plusSig.minusSig = 10L,
    plus.strand.start.gt.minus.strand.end = TRUE, gRNA.format = "fasta",
    overlap.gRNA.positions = c(17, 18), upstream = 50, downstream = 50,
    PAM.size = 3, gRNA.size = 20, PAM = "NGG",
    PAM.pattern = "(NAG|NGG|NGA)$", max.mismatch = 6,
    allowed.mismatch.PAM = 2, overwrite = TRUE,
    weights = c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445,
        0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583),
    orderOfftargetsBy = c("predicted_cleavage_score", "n.mismatch"),
    descending = c(TRUE, FALSE), keepTopOfftargetsOnly = TRUE)
Arguments
alignment.inputfile
The alignment file. Currently supports bam and bed output files with CIGAR information.
We suggest running the workflow binReads.sh, which sequentially performs barcode
binning, adaptor removal, alignment to the genome, alignment quality filtering,
and bed file conversion. Please download the workflow function and its
helper scripts at http://mccb.umassmed.edu/GUIDE-seq/binReads/
umi.inputfile
A text file containing at least two columns: the read identifier and the
UMI (or the UMI plus the first few bases of the R1 read). We suggest
using getUMI.sh to generate this file. Please download the script and its
helper scripts at http://mccb.umassmed.edu/GUIDE-seq/getUMI/
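A minimal sketch in base R of reading such a file, assuming the default column layout (no header line, read identifier in column 1, UMI in column 2, tab-separated). The helper read_umi_file() and the example reads are hypothetical and are not part of GUIDEseq.

```r
# Hypothetical helper, not part of GUIDEseq: reads a UMI file using the
# same conventions as umi.header, read.ID.col, umi.col and umi.sep
read_umi_file <- function(path, umi.header = FALSE, read.ID.col = 1,
                          umi.col = 2, umi.sep = "\t") {
  tab <- read.table(path, header = umi.header, sep = umi.sep,
                    colClasses = "character", stringsAsFactors = FALSE)
  data.frame(readName = tab[[read.ID.col]], umi = tab[[umi.col]],
             stringsAsFactors = FALSE)
}

# Write a small example file with made-up read identifiers and UMIs
umi.file <- tempfile(fileext = ".txt")
writeLines(c("read1\tACGTACGTACGTACGTAG",
             "read2\tTTGCATGCAAGTCCAGTA"), umi.file)
umis <- read_umi_file(umi.file)
print(umis)
```

If the UMI file has a header line or a different column order, only the corresponding arguments of the helper need to change.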
alignment.format
The format of the alignment input file: "auto", "bam" or "bed". Default "auto"
(the first element). A bed file can be generated with binReads.sh
umi.header
Indicates whether the umi input file contains a header line. Default
FALSE
read.ID.col
The index of the column containing the read identifier in the umi input file,
default 1
umi.col
The index of the column containing the UMI, or the UMI plus the first few bases
of the R1 read sequence, default 2
umi.sep
Column separator in the umi input file, default tab
BSgenomeName
BSgenome object. Please refer to available.genomes() in the BSgenome package. For
example, BSgenome.Hsapiens.UCSC.hg19 for hg19,
BSgenome.Mmusculus.UCSC.mm10 for mm10,
BSgenome.Celegans.UCSC.ce6 for ce6,
BSgenome.Rnorvegicus.UCSC.rn5 for rn5,
BSgenome.Drerio.UCSC.danRer7 for Zv9, and
BSgenome.Dmelanogaster.UCSC.dm3 for dm3
gRNA.file
gRNA input file path or a DNAStringSet object that contains gRNA plus PAM
sequences used for genome editing
outputDir
The directory to which the off-target analysis results and reports will be written
n.cores.max
The maximum number of cores to use in multicore mode,
i.e., parallel processing, default 6. Please set it to 1 to disable
multicore processing for small datasets.
keep.R1only
Specify whether to include alignments of R1 reads without a paired R2 read.
Default TRUE
keep.R2only
Specify whether to include alignments of R2 reads without a paired R1 read.
Default TRUE
concordant.strand
Specify whether R1 and R2 should align to opposite strands (TRUE) or to the
same strand (FALSE). Default TRUE (opposite strands)
max.paired.distance
Specify the maximum distance allowed between paired R1 and R2 reads.
Default 1000 bp
min.mapping.quality
The minimum mapping quality of acceptable alignments, default 30
max.R1.len
The maximum retained R1 length to be considered for downstream analysis,
default 130 bp. Please note that the default of 130 works well when the read
length is 150 bp. Please also note that the retained R1 length is not necessarily
equal to the mapped R1 length
max.R2.len
The maximum retained R2 length to be considered for downstream analysis,
default 130 bp. Please note that the default of 130 works well when the read
length is 150 bp. Please also note that the retained R2 length is not necessarily
equal to the mapped R2 length
apply.both.max.len
Specify whether to apply the maximum length requirement to both R1 and R2 reads,
default FALSE
same.chromosome
Specify whether the paired reads are required to align to the same chromosome,
default TRUE
distance.inter.chrom
Specify the distance value to assign to paired reads that are aligned to
different chromosomes, default -1
min.R1.mapped
The minimum mapped R1 length to be considered for downstream analysis,
default 20 bp.
min.R2.mapped
The minimum mapped R2 length to be considered for downstream analysis,
default 20 bp.
apply.both.min.mapped
Specify whether to apply the minimum mapped length requirement to both R1 and R2
reads, default FALSE
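The read-pair filters above can be illustrated with the base R sketch below. It assumes a hypothetical table with one row per read pair and a single mapping quality per pair, and it measures the paired distance as the difference between the R1 and R2 alignment starts; GUIDEseq's own definitions and column names may differ, and the length filters (max.R1.len, min.R1.mapped, etc.) are omitted for brevity.

```r
# Made-up read pairs; the column names are hypothetical
pairs <- data.frame(
  chr.R1    = c("chr1", "chr1", "chr2", "chr1"),
  chr.R2    = c("chr1", "chr1", "chr3", "chr1"),
  strand.R1 = c("+", "+", "+", "-"),
  strand.R2 = c("-", "+", "-", "+"),
  start.R1  = c(1000L, 5000L, 200L, 8000L),
  start.R2  = c(1150L, 5100L, 300L, 9500L),
  mapq      = c(60L, 60L, 60L, 10L),
  stringsAsFactors = FALSE)

min.mapping.quality <- 30; max.paired.distance <- 1000
same.chromosome <- TRUE; distance.inter.chrom <- -1
concordant.strand <- TRUE  # TRUE: R1 and R2 on opposite strands

# Assumed distance: gap between alignment starts, or
# distance.inter.chrom when the mates map to different chromosomes
distance <- ifelse(pairs$chr.R1 == pairs$chr.R2,
                   abs(pairs$start.R2 - pairs$start.R1),
                   distance.inter.chrom)
keep <- pairs$mapq >= min.mapping.quality & distance <= max.paired.distance
if (same.chromosome) keep <- keep & pairs$chr.R1 == pairs$chr.R2
if (concordant.strand) keep <- keep & pairs$strand.R1 != pairs$strand.R2
pairs[keep, ]
```

Only the first pair passes: the second has both mates on the same strand, the third spans two chromosomes, and the fourth fails both the mapping quality and the distance filters.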
max.duplicate.distance
Specify the maximum distance between two reads for them to be considered
duplicates, default 0. Currently only 0 is supported
umi.plus.R1start.unique
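Specify whether uniqueness of mapped reads is determined by the combination
of UMI and R1 alignment start, i.e., whether two mapped reads with the same
UMI and the same R1 alignment start are treated as one read, default TRUE.
umi.plus.R2start.unique
Specify whether uniqueness of mapped reads is determined by the combination
of UMI and R2 alignment start, i.e., whether two mapped reads with the same
UMI and the same R2 alignment start are treated as one read, default TRUE.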
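One plausible reading of umi.plus.R1start.unique = TRUE with max.duplicate.distance = 0 is sketched below: reads that share the same UMI and the same R1 alignment start, on the same chromosome and strand, are collapsed into one. The column names are hypothetical and this is not GUIDEseq's implementation.

```r
# Made-up alignments; the column names are hypothetical
reads <- data.frame(
  umi      = c("AAGTCC", "AAGTCC", "AAGTCC", "CCTAGG"),
  chrom    = c("chr1", "chr1", "chr1", "chr1"),
  strand   = c("+", "+", "+", "+"),
  R1.start = c(1000L, 1000L, 1005L, 1000L),
  stringsAsFactors = FALSE)

# With max.duplicate.distance = 0, reads sharing the UMI and the exact
# R1 alignment start (same chromosome and strand) count as duplicates
key <- paste(reads$umi, reads$chrom, reads$strand, reads$R1.start, sep = ":")
unique.reads <- reads[!duplicated(key), ]
nrow(unique.reads)  # 3: rows 1 and 2 collapse into one read
```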
window.size
Window size to calculate coverage, default 20
step
Step size to calculate coverage, default 20
bg.window.size
Window size to calculate local background, default 5000
min.reads
Minimum number of reads required for a peak, default 5
min.reads.per.lib
Minimum number of reads required in each library (usually two libraries)
for a peak, default 1
min.SNratio
Minimum signal-to-noise ratio, i.e., the coverage normalized by the local
background, default 2
maxP
Maximum p-value to be considered significant, default 0.05
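The window-based peak calling described by these arguments can be sketched as follows. The sketch assumes a simplified local background, the mean read count per window across a single bg.window.size region, and a one-sided Poisson test (stats = "poisson"); min.reads.per.lib is ignored because only one library is simulated. GUIDEseq's actual background estimate may differ.

```r
# Made-up unique read positions on a 5,000 bp region: sparse background
# reads plus a pile-up at position 2503
starts <- c(seq(10L, 4990L, by = 125L), rep(2503L, 12L))

window.size <- 20L; step <- 20L; bg.window.size <- 5000L
min.reads <- 5L; min.SNratio <- 2; maxP <- 0.05

# Read counts in windows of width window.size, moved along by step
win.start <- seq(1L, bg.window.size - window.size + 1L, by = step)
counts <- vapply(win.start, function(s)
  sum(starts >= s & starts < s + window.size), integer(1))

# Simplified local background: mean reads per window across the region
lambda <- sum(counts) / length(counts)
sn.ratio <- counts / lambda
# One-sided Poisson test: P(X >= observed count | lambda)
p.value <- ppois(counts - 1L, lambda = lambda, lower.tail = FALSE)

is.peak <- counts >= min.reads & sn.ratio >= min.SNratio & p.value <= maxP
peaks <- data.frame(start = win.start[is.peak],
                    end = win.start[is.peak] + window.size - 1L,
                    reads = counts[is.peak], SNratio = sn.ratio[is.peak],
                    pvalue = p.value[is.peak])
print(peaks)
```

Windows with a single background read pass the signal-to-noise cutoff but fail min.reads and maxP, so only the window containing the pile-up is reported.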
stats
Statistical test, either "poisson" (default) or "nbinom"
p.adjust.methods
Adjustment method for multiple comparisons: one of "none" (default), "BH",
"holm", "hochberg", "hommel", "bonferroni", "BY" or "fdr"
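These method names match those accepted by p.adjust() in R's stats package. The sketch below uses made-up p-values to show how the choice of method changes how many candidate peaks fall under maxP; whether GUIDEseq calls p.adjust() internally is an assumption here.

```r
# Made-up raw p-values for five candidate peaks
p.raw <- c(0.001, 0.008, 0.02, 0.045, 0.30)
maxP <- 0.05

# p.adjust() in the stats package accepts the same method names;
# "fdr" is an alias for "BH"
p.none <- p.adjust(p.raw, method = "none")
p.bh   <- p.adjust(p.raw, method = "BH")
p.bonf <- p.adjust(p.raw, method = "bonferroni")

c(none = sum(p.none <= maxP), BH = sum(p.bh <= maxP),
  bonferroni = sum(p.bonf <= maxP))
```

Here 4, 3 and 2 peaks remain significant with "none", "BH" and "bonferroni", respectively.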
distance.threshold
Specify the maximum gap allowed between the plus-strand peak and
the minus-strand peak, default 40. We suggest setting it to twice the
window.size used for peak calling.
max.overlap.plusSig.minusSig
Specify the maximum overlap (cushion distance) between the plus-strand peak and the minus-strand peak.
Default 10L, to allow for sequencing errors and imprecise integration. Only applicable
if plus.strand.start.gt.minus.strand.end is set to TRUE.
plus.strand.start.gt.minus.strand.end
Specify whether the plus-strand peak start is required to be greater than
the end of the paired minus-strand peak. Default TRUE
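The following sketch illustrates how distance.threshold, max.overlap.plusSig.minusSig and plus.strand.start.gt.minus.strand.end could interact when deciding whether a plus-strand peak and a minus-strand peak form a pair. The helper pair.ok() and its gap definition (plus-strand peak start minus minus-strand peak end) are assumptions for illustration, not GUIDEseq's implementation.

```r
# Hypothetical helper, not part of GUIDEseq
pair.ok <- function(plus.start, minus.end, distance.threshold = 40L,
                    max.overlap.plusSig.minusSig = 10L,
                    plus.strand.start.gt.minus.strand.end = TRUE) {
  # positive when the plus peak starts to the right of the minus peak end,
  # negative when the two peaks overlap
  gap <- plus.start - minus.end
  if (abs(gap) > distance.threshold) return(FALSE)
  if (plus.strand.start.gt.minus.strand.end) {
    # tolerate up to max.overlap.plusSig.minusSig bp of overlap
    return(gap >= -max.overlap.plusSig.minusSig)
  }
  TRUE
}

pair.ok(plus.start = 1030L, minus.end = 1010L)  # gap of 20 bp
pair.ok(plus.start = 1005L, minus.end = 1010L)  # overlap of 5 bp
pair.ok(plus.start = 990L,  minus.end = 1010L)  # overlap of 20 bp
pair.ok(plus.start = 1100L, minus.end = 1010L)  # gap of 90 bp
```

The first two calls return TRUE and the last two FALSE; with plus.strand.start.gt.minus.strand.end = FALSE the 20 bp overlap would be accepted because it is within distance.threshold.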
gRNA.format
Format of the gRNA input file. Currently, only fasta is supported
PAM.size
PAM length, default 3
gRNA.size
The size of the gRNA, default 20
PAM
PAM sequence after the gRNA, default NGG
overlap.gRNA.positions
The required overlap positions of gRNA and restriction enzyme cut site,
default 17 and 18 for SpCas9.
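For SpCas9 with a 20-nt gRNA, positions 17 and 18 flank the expected cut site, 3 bp upstream of the PAM. The sketch below uses a hypothetical helper, cut.positions(), to convert gRNA positions to genomic coordinates for a target (gRNA plus PAM) on either strand, assuming target.start is the leftmost genomic coordinate of the target; it then checks whether those positions fall within a peak.

```r
# Hypothetical helper: genomic coordinates of the gRNA positions given in
# overlap.gRNA.positions, for a target (gRNA + PAM) whose leftmost
# genomic coordinate is target.start
cut.positions <- function(target.start, target.strand, gRNA.size = 20L,
                          PAM.size = 3L, overlap.gRNA.positions = c(17, 18)) {
  if (target.strand == "+") {
    # gRNA position 1 is the leftmost base of the target
    target.start + overlap.gRNA.positions - 1
  } else {
    # on the minus strand the PAM is leftmost and gRNA position 1 is the
    # rightmost base of the target
    target.end <- target.start + gRNA.size + PAM.size - 1
    target.end - overlap.gRNA.positions + 1
  }
}

# Does the expected cut site fall inside a peak spanning 990-1020?
peak.start <- 990; peak.end <- 1020
pos <- cut.positions(1000, "+")
all(pos >= peak.start & pos <= peak.end)
```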
max.mismatch
Maximum number of mismatches allowed in the off-target search, default 6
PAM.pattern
Regular expression of protospacer-adjacent motif (PAM), default
(NAG|NGG|NGA)$ for off target search
allowed.mismatch.PAM
Number of degenerate bases in the PAM sequence, default 2 for the N[A|G]G PAM
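In PAM.pattern, N stands for any nucleotide, whereas in an R regular expression N is a literal character. A minimal sketch, assuming N is simply expanded to [ACGT] before matching, shows how the default pattern selects candidate target sequences; how GUIDEseq expands N internally is not shown here, and the sequences are made up.

```r
PAM.pattern <- "(NAG|NGG|NGA)$"
# N stands for any base; translate it to a regular expression class
regex <- gsub("N", "[ACGT]", PAM.pattern, fixed = TRUE)

# Made-up 23-mers: a 20-nt protospacer followed by a 3-nt PAM
candidates <- c("GACGCATAAAGATGAGACGCTGG",  # ends in TGG (NGG)
                "GACGCATAAAGATGAGACGCTAG",  # ends in TAG (NAG)
                "GACGCATAAAGATGAGACGCTGA",  # ends in TGA (NGA)
                "GACGCATAAAGATGAGACGCTTT")  # ends in TTT (no match)
grepl(regex, candidates)
```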
upstream
Upstream offset from the peak start within which to search for off-targets, default 50
downstream
Downstream offset from the peak end within which to search for off-targets, default 50
overwrite
Whether to overwrite existing files in the output directory, default TRUE
weights
A numeric vector of length gRNA.size, default c(0, 0, 0.014, 0, 0, 0.395,
0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615,
0.804, 0.685, 0.583) for the SpCas9 system, as used in Hsu et al., 2013,
cited in the reference section. Please make sure that the number of
elements in this vector equals gRNA.size, e.g., by padding 0s at the
beginning of the vector.
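The weights enter a mismatch-based cleavage score of the kind described by Hsu et al. (2013). The sketch below follows the form commonly used in public implementations: a product of (1 - weight) over mismatched positions, a penalty for closely spaced mismatches, and a 1/n^2 penalty on the number of mismatches. Using the mean distance between adjacent mismatches, and treating a single mismatch as having no clustering penalty, are assumptions; GUIDEseq's exact computation of predicted_cleavage_score may differ.

```r
weights <- c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445,
             0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583)

# Sketch of a Hsu et al. (2013) style score; mm.pos are 1-based mismatch
# positions in the gRNA (position 1 is farthest from the PAM)
hsu.score <- function(mm.pos, weights) {
  n.mm <- length(mm.pos)
  if (n.mm == 0) return(1)
  max.d <- length(weights) - 1
  # term 1: product of (1 - weight) over mismatched positions
  t1 <- prod(1 - weights[mm.pos])
  # term 2: penalise clustered mismatches (assumed: mean adjacent distance;
  # a single mismatch gets the maximum distance, i.e., no penalty)
  d <- if (n.mm > 1) mean(diff(sort(mm.pos))) else max.d
  t2 <- 1 / (((max.d - d) / max.d) * 4 + 1)
  # term 3: penalise the number of mismatches
  t3 <- 1 / n.mm^2
  t1 * t2 * t3
}

hsu.score(integer(0), weights)  # perfect match
hsu.score(14, weights)          # one mismatch at a high-weight position
hsu.score(c(1, 2), weights)     # two adjacent PAM-distal mismatches
```

A mismatch at a position with weight 0 leaves the first term unchanged, which is why padding 0s at the PAM-distal end is a safe way to match gRNA.size.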
orderOfftargetsBy
Criteria by which to order the off-targets. By default, off-targets are
ordered by predicted_cleavage_score (descending order)
followed by n.mismatch (ascending order).
Users can change the order of these two criteria and
change descending accordingly
descending
Whether to sort each criterion in orderOfftargetsBy in descending (TRUE) or
ascending (FALSE) order. By default, the predicted cleavage score is sorted in
descending order and the number of mismatches in ascending order.
When changing the order of orderOfftargetsBy, please also modify descending accordingly
keepTopOfftargetsOnly
Specify whether to output only the top off-target ranked by the orderOfftargetsBy
criteria (TRUE, default) or all off-targets (FALSE)
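The combined effect of orderOfftargetsBy, descending and keepTopOfftargetsOnly can be mimicked in base R for the off-targets of a single peak. The sketch relies on order() accepting one decreasing flag per sort key, which base R supports only with method = "radix"; the off-target table is made up, and this is not GUIDEseq's own code.

```r
# Made-up off-targets for one peak
offtargets <- data.frame(
  name = c("OT1", "OT2", "OT3", "OT4"),
  predicted_cleavage_score = c(0.5, 0.9, 0.5, 0.2),
  n.mismatch = c(3L, 2L, 1L, 4L),
  stringsAsFactors = FALSE)

orderOfftargetsBy <- c("predicted_cleavage_score", "n.mismatch")
descending <- c(TRUE, FALSE)
keepTopOfftargetsOnly <- TRUE

# One decreasing flag per key requires method = "radix"
keys <- unname(as.list(offtargets[orderOfftargetsBy]))
ord <- do.call(order, c(keys, list(decreasing = descending,
                                   method = "radix")))
ranked <- offtargets[ord, ]
result <- if (keepTopOfftargetsOnly) ranked[1, ] else ranked
print(ranked$name)  # OT2, then the 0.5 tie broken by fewer mismatches
```

OT1 and OT3 tie on predicted_cleavage_score, so OT3, with fewer mismatches, is ranked ahead of OT1.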