Usage
compare2Sequences(inputFile1Path, inputFile2Path,
    inputNames = c("Seq1", "Seq2"), format = c("fasta", "fasta"),
    header = FALSE, findgRNAsWithREcutOnly = FALSE,
    searchDirection = c("both", "1to2", "2to1"), BSgenomeName,
    REpatternFile = system.file("extdata", "NEBenzymes.fa",
        package = "CRISPRseek"),
    minREpatternSize = 6, findgRNAs = c(TRUE, TRUE),
    removegRNADetails = c(FALSE, FALSE),
    exportAllgRNAs = c("no", "all", "fasta", "genbank"),
    annotatePaired = FALSE, overlap.gRNA.positions = c(17, 18),
    findPairedgRNAOnly = FALSE, min.gap = 0, max.gap = 20,
    gRNA.name.prefix = "gRNA", PAM.size = 3, gRNA.size = 20,
    PAM = "NGG", PAM.pattern = "N[A|G]G$", allowed.mismatch.PAM = 2,
    max.mismatch = 3, outputDir, upstream = 0, downstream = 0,
    weights = c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079,
        0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804,
        0.685, 0.583),
    overwrite = FALSE, baseBeforegRNA = 4, baseAfterPAM = 3,
    featureWeightMatrixFile = system.file("extdata",
        "DoenchNBT2014.csv", package = "CRISPRseek"),
    foldgRNAs = FALSE,
    gRNA.backbone = "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU",
    temperature = 37, scoring.method = c("Hsu-Zhang", "CFDscore"),
    subPAM.activity = hash(AA = 0, AC = 0, AG = 0.259259259, AT = 0,
        CA = 0, CC = 0, CG = 0.107142857, CT = 0,
        GA = 0.069444444, GC = 0.022222222, GG = 1, GT = 0.016129032,
        TA = 0, TC = 0, TG = 0.038961039, TT = 0),
    subPAM.position = c(22, 23),
    mismatch.activity.file = system.file("extdata",
        "NatureBiot2016SuppTable19DoenchRoot.csv",
        package = "CRISPRseek"))
Arguments
inputFile1Path
Sequence input file 1 path that contains one of the
two sequences to be searched for potential gRNAs
inputFile2Path
Sequence input file 2 path that contains one of the
two sequences to be searched for potential gRNAs
inputNames
Names of the input sequences, used when inputFile1Path and inputFile2Path are
DNAStringSet objects instead of file paths
format
Format of the input files. The fasta, fastq and bed formats are supported; default fasta
header
Indicate whether the input file contains a header, default FALSE. Only applies to bed format
findgRNAsWithREcutOnly
Indicate whether to find only gRNAs that overlap a restriction enzyme recognition
pattern, default FALSE
searchDirection
Indicate whether to perform the gRNA search in both sequences and the off-target search of each against the other (both),
or to search for gRNAs in input1 and run the off-target analysis in input2 (1to2), or vice versa (2to1)
BSgenomeName
BSgenome object. Please refer to available.genomes in BSgenome package. For
example, BSgenome.Hsapiens.UCSC.hg19 for hg19, BSgenome.Mmusculus.UCSC.mm10
for mm10, BSgenome.Celegans.UCSC.ce6 for ce6, BSgenome.Rnorvegicus.UCSC.rn5
for rn5, BSgenome.Drerio.UCSC.danRer7 for Zv9, and
BSgenome.Dmelanogaster.UCSC.dm3 for dm3
REpatternFile
Path to the file containing restriction enzyme cut patterns
minREpatternSize
Minimum restriction enzyme recognition pattern length required for the enzyme
pattern to be searched for, default 6
findgRNAs
Indicate whether to find gRNAs from the sequences in the input file or skip
the step of finding gRNAs, default TRUE for both input sequences.
Set it to FALSE if the input file contains user selected gRNAs plus PAM already.
removegRNADetails
Indicate whether to remove the detailed gRNA information such as efficacy file and
restriction enzyme cut sites, default FALSE for both input sequences.
Set it to TRUE if the input file contains the user selected gRNAs plus PAM already.
exportAllgRNAs
Indicate whether to output all potential gRNAs to a file in fasta format (fasta),
genbank format (genbank) or both (all). Defaults to no.
annotatePaired
Indicate whether to output paired information, default to FALSE
overlap.gRNA.positions
The required overlap positions of gRNA and restriction enzyme cut site,
default 17 and 18
findPairedgRNAOnly
Choose whether to only search for paired gRNAs in such an orientation that
the first one is on minus strand called reverse gRNA and the second one is
on plus strand called forward gRNA. TRUE or FALSE, default FALSE
min.gap
Minimum distance between two oppositely oriented gRNAs to be valid paired
gRNAs. Default 0
max.gap
Maximum distance between two oppositely oriented gRNAs to be valid paired
gRNAs. Default 20
gRNA.name.prefix
The prefix used when assigning names to the gRNAs found, default gRNA, short for
guide RNA.
PAM.size
PAM length, default 3
gRNA.size
The size of the gRNA, default 20
PAM
PAM sequence after the gRNA, default NGG
PAM.pattern
Regular expression of PAM, default N[A|G]G$
allowed.mismatch.PAM
Number of degenerate bases in the PAM sequence, default 2 for the N[A|G]G PAM
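
As a rough illustration of how the default PAM.pattern reads as a regular expression, the base R sketch below tests a few 3-nt candidates. It is not CRISPRseek's own matching code. Expanding the IUPAC code N to [ACGT] by hand is an assumption made for this example, and the candidate strings are made up. Note that inside a character class the | is a literal character, so [A|G] accepts A, G or |, which is harmless for DNA input.

```r
# Sketch only: expand N by hand (assumption), then match PAM candidates.
pam_regex <- gsub("N", "[ACGT]", "N[A|G]G$", fixed = TRUE)

# Hypothetical 3-nt sequences found immediately after a 20-nt protospacer.
candidates <- c("AGG", "TAG", "CCG", "GGT")

# TRUE where the candidate ends in (any base)(A or G)(G).
is_pam <- grepl(pam_regex, candidates)
names(is_pam) <- candidates
print(is_pam)
```

Two of the three positions (N and A/G) are degenerate here, which is consistent with the default allowed.mismatch.PAM of 2.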
max.mismatch
Maximum number of mismatches allowed when searching for off-targets in the other sequence,
default 3
outputDir
The directory to which the sequence comparison results will be written
upstream
Upstream offset from the start of each bed input region at which to begin searching for gRNAs and/or off-targets, default 0
downstream
Downstream offset from the end of each bed input region at which to stop searching for gRNAs and/or off-targets, default 0
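
The effect of upstream and downstream on a bed region can be pictured as widening the search window. The sketch below is an assumption-laden illustration, not the package's code: extend_window is a hypothetical helper, coordinates are treated as 1-based on the plus strand, and the start is clamped at 1.

```r
# Hypothetical helper: widen a region by the upstream/downstream offsets.
# Assumes plus-strand, 1-based coordinates; strand handling is not modelled.
extend_window <- function(start, end, upstream = 0, downstream = 0) {
  c(start = max(1, start - upstream), end = end + downstream)
}

# A region 100..200 searched with 20 bases upstream and 10 downstream.
w <- extend_window(100, 200, upstream = 20, downstream = 10)
print(w)
```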
weights
Numeric vector with one weight per gRNA position (its length equals the gRNA
length), default c(0, 0, 0.014, 0, 0, 0.395,
0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615,
0.804, 0.685, 0.583), which is used in Hsu et al., 2013, cited in the reference
section
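
To give a feel for what the per-position weights mean, the sketch below checks that the vector matches gRNA.size and computes a simple position-weighted penalty, the product of (1 - weight) over mismatched positions. This is only one ingredient of scoring in the style of Hsu et al., 2013 and is not presented as CRISPRseek's implementation. position_term is a hypothetical helper, and numbering positions from the PAM-distal end is an assumption.

```r
# Default weights from the usage section, one per gRNA position.
weights <- c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079,
             0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804,
             0.685, 0.583)
gRNA.size <- 20

# The vector must supply exactly one weight per position.
stopifnot(length(weights) == gRNA.size)

# Hypothetical helper: product of (1 - weight) over mismatched positions.
# Values near 1 mean the mismatches are tolerated; values near 0 mean
# the mismatches strongly reduce predicted cleavage.
position_term <- function(mismatch_positions, w = weights) {
  prod(1 - w[mismatch_positions])
}

distal   <- position_term(c(1, 2))  # zero-weight positions
proximal <- position_term(20)       # position with weight 0.583
print(c(distal = distal, proximal = proximal))
```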
overwrite
Whether to overwrite the existing files in the output directory, default FALSE
baseBeforegRNA
Number of bases before gRNA used for calculating gRNA efficiency, default 4
baseAfterPAM
Number of bases after PAM used for calculating gRNA efficiency, default 3
featureWeightMatrixFile
Feature weight matrix file used for calculating gRNA efficiency. By default the DoenchNBT2014 weight matrix is used. To use an alternative weight matrix file, please input a csv file with the first column containing the significant features and the second column containing the corresponding weights for the features. Please see Doench et al., 2014 for details.
foldgRNAs
Default FALSE. If set to TRUE, the summary file will contain the minimum free energy of the secondary
structure of the gRNA with gRNA backbone, computed with the GeneRfold package, provided that the GeneRfold package has been installed.
gRNA.backbone
gRNA backbone constant region sequence. Default to the
sequence in Sp gRNA backbone.
temperature
Temperature in degrees Celsius. Defaults to 37.
scoring.method
Indicates which method to use for off-target cleavage rate estimation. Two methods
are currently supported: Hsu-Zhang and CFDscore
subPAM.activity
Applicable only when scoring.method is set to CFDscore.
A hash representing the cleavage rate for each alternative sub PAM sequence relative
to the preferred PAM sequence
subPAM.position
Applicable only when scoring.method is set to CFDscore.
The start and end positions of the sub PAM. Defaults to 22 and 23 for SP with a 20 bp gRNA and NGG as the preferred PAM
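
The relationship between subPAM.position and subPAM.activity can be sketched in base R as a substring lookup. This is an illustration under stated assumptions, not the package's code: a named numeric vector stands in for the hash object, and the 23-nt target string is made up.

```r
# Named vector standing in for the hash() default (assumption for this sketch).
subPAM.activity <- c(AA = 0, AC = 0, AG = 0.259259259, AT = 0,
                     CA = 0, CC = 0, CG = 0.107142857, CT = 0,
                     GA = 0.069444444, GC = 0.022222222, GG = 1,
                     GT = 0.016129032, TA = 0, TC = 0,
                     TG = 0.038961039, TT = 0)
subPAM.position <- c(22, 23)

# Hypothetical off-target site: 20-nt protospacer followed by the PAM "TAG".
target <- "ACGTACGTACGTACGTACGTTAG"

# With a 20-nt gRNA and a 3-nt PAM, positions 22..23 are the last two PAM bases.
sub_pam  <- substr(target, subPAM.position[1], subPAM.position[2])
activity <- subPAM.activity[[sub_pam]]
print(c(sub_pam = sub_pam, activity = activity))
```

Here the sub PAM is AG, so the relative cleavage rate looked up is 0.259259259, compared with 1 for the preferred GG.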
mismatch.activity.file
Applicable only when scoring.method is set to CFDscore.
A comma separated (csv) file containing the cleavage rates for all possible types of
single nucleotide mismatch at each position of the gRNA. By default, supplemental Table 19 from Doench et al., Nature Biotechnology 2016 is used
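
A minimal sketch of assembling a call from the arguments described above follows. The file names seq1.fa and seq2.fa are hypothetical placeholders, and the chosen argument values are examples only. Whether further arguments such as BSgenomeName are needed depends on the input format and is not covered here. The call is guarded so that it runs only when CRISPRseek is installed and both input files exist.

```r
# Arguments for a two-way comparison of two fasta files (hypothetical paths).
args <- list(
  inputFile1Path  = "seq1.fa",
  inputFile2Path  = "seq2.fa",
  outputDir       = file.path(tempdir(), "compare2Sequences_out"),
  searchDirection = "both",       # search gRNAs and off-targets in both inputs
  scoring.method  = "Hsu-Zhang",
  overwrite       = TRUE
)

# Only attempt the call when the package and both inputs are available.
can_run <- requireNamespace("CRISPRseek", quietly = TRUE) &&
  file.exists(args$inputFile1Path) &&
  file.exists(args$inputFile2Path)

if (can_run) {
  result <- do.call(CRISPRseek::compare2Sequences, args)
}
```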