CRISPRseek (version 1.12.0)

offTargetAnalysis: Design of target-specific guide RNAs for CRISPR-Cas9 system in one function

Description

Design of target-specific guide RNAs (gRNAs) for the CRISPR-Cas9 system by automatically calling findgRNAs, filtergRNAs, searchHits, buildFeatureVectorForScoring, getOfftargetScore and filterOfftarget, calculating gRNA cleavage efficiency, and generating reports.

Usage

offTargetAnalysis(inputFilePath, format = "fasta", header = FALSE,
    gRNAoutputName, findgRNAs = TRUE,
    exportAllgRNAs = c("all", "fasta", "genbank", "no"),
    findgRNAsWithREcutOnly = FALSE,
    REpatternFile = system.file("extdata", "NEBenzymes.fa", package = "CRISPRseek"),
    minREpatternSize = 4, overlap.gRNA.positions = c(17, 18),
    findPairedgRNAOnly = FALSE, annotatePaired = TRUE,
    enable.multicore = FALSE, n.cores.max, min.gap = 0, max.gap = 20,
    gRNA.name.prefix = "", PAM.size = 3, gRNA.size = 20, PAM = "NGG",
    BSgenomeName, chromToSearch = "all",
    chromToExclude = c("chr17_ctg5_hap1", "chr4_ctg9_hap1", "chr6_apd_hap1",
        "chr6_cox_hap2", "chr6_dbb_hap3", "chr6_mann_hap4", "chr6_mcf_hap5",
        "chr6_qbl_hap6", "chr6_ssto_hap7"),
    max.mismatch = 3, PAM.pattern = "N[A|G]G$", allowed.mismatch.PAM = 2,
    gRNA.pattern = "", min.score = 0, topN = 1000,
    topN.OfftargetTotalScore = 10, annotateExon = TRUE, txdb, orgAnn,
    outputDir, fetchSequence = TRUE, upstream = 200, downstream = 200,
    upstream.search = 0, downstream.search = 0,
    weights = c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445,
        0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583),
    baseBeforegRNA = 4, baseAfterPAM = 3,
    featureWeightMatrixFile = system.file("extdata", "DoenchNBT2014.csv",
        package = "CRISPRseek"),
    useScore = TRUE, useEfficacyFromInputSeq = FALSE, outputUniqueREs = TRUE,
    foldgRNAs = FALSE,
    gRNA.backbone = "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU",
    temperature = 37, overwrite = FALSE,
    scoring.method = c("Hsu-Zhang", "CFDscore"),
    subPAM.activity = hash(AA = 0, AC = 0, AG = 0.259259259, AT = 0,
        CA = 0, CC = 0, CG = 0.107142857, CT = 0,
        GA = 0.069444444, GC = 0.022222222, GG = 1, GT = 0.016129032,
        TA = 0, TC = 0, TG = 0.038961039, TT = 0),
    subPAM.position = c(22, 23),
    mismatch.activity.file = system.file("extdata",
        "NatureBiot2016SuppTable19DoenchRoot.csv", package = "CRISPRseek"))

Arguments

inputFilePath
Sequence input file path or a DNAStringSet object that contains sequences to be searched for potential gRNAs
format
Format of the input file; fasta, fastq and bed are supported, default fasta
header
Indicate whether the input file contains a header, default FALSE; only applies to bed format
gRNAoutputName
Specify the name of the gRNA output file when inputFilePath is a DNAStringSet object instead of a file path
findgRNAs
Indicate whether to find gRNAs in the sequences of the input file or skip the gRNA-finding step, default TRUE. Set it to FALSE if the input file already contains user-selected gRNAs plus PAM.
exportAllgRNAs
Indicate whether to output all potential gRNAs to a file in fasta format ("fasta"), genbank format ("genbank"), both ("all"), or not at all ("no"). Default "all", i.e., both formats.
findgRNAsWithREcutOnly
Indicate whether to find only gRNAs that overlap with a restriction enzyme recognition pattern, default FALSE
REpatternFile
File path containing restriction enzyme cut patterns
minREpatternSize
Minimum restriction enzyme recognition pattern length required for the enzyme pattern to be searched for, default 4
overlap.gRNA.positions
The required overlap positions of gRNA and restriction enzyme cut site, default 17 and 18
findPairedgRNAOnly
Choose whether to search only for paired gRNAs oriented such that the first is on the minus strand (the reverse gRNA) and the second is on the plus strand (the forward gRNA). TRUE or FALSE, default FALSE
annotatePaired
Indicate whether to output paired information, default TRUE
enable.multicore
Indicate whether to enable parallel processing, default FALSE. Setting it to TRUE is suggested for very long sequences with many gRNAs
n.cores.max
Maximum number of cores to use in multicore mode (i.e., parallel processing), default 6. Set it to 1 to disable multicore processing for small datasets.
min.gap
Minimum distance between two oppositely oriented gRNAs to be valid paired gRNAs. Default 0
max.gap
Maximum distance between two oppositely oriented gRNAs to be valid paired gRNAs. Default 20
gRNA.name.prefix
The prefix used when assigning names to found gRNAs, default gRNA, short for guide RNA.
PAM.size
PAM length, default 3
gRNA.size
The size of the gRNA, default 20
PAM
PAM sequence after the gRNA, default NGG
BSgenomeName
BSgenome object. Please refer to available.genomes in BSgenome package. For example, BSgenome.Hsapiens.UCSC.hg19 for hg19, BSgenome.Mmusculus.UCSC.mm10 for mm10, BSgenome.Celegans.UCSC.ce6 for ce6, BSgenome.Rnorvegicus.UCSC.rn5 for rn5, BSgenome.Drerio.UCSC.danRer7 for Zv9, and BSgenome.Dmelanogaster.UCSC.dm3 for dm3
chromToSearch
Specify the chromosome to search, default all, meaning that all chromosomes are searched. For example, chrX restricts the search to chromosome X
chromToExclude
Specify the chromosomes to exclude from the search. If set to "", all chromosomes specified by chromToSearch are searched. By default, haplotype blocks in hg19 are excluded from the off-target search, i.e., chromToExclude = c("chr17_ctg5_hap1","chr4_ctg9_hap1", "chr6_apd_hap1", "chr6_cox_hap2", "chr6_dbb_hap3", "chr6_mann_hap4", "chr6_mcf_hap5","chr6_qbl_hap6", "chr6_ssto_hap7")
max.mismatch
Maximum number of mismatches allowed in the off-target search, default 3. Warning: the search will be considerably slower if set > 3
PAM.pattern
Regular expression of protospacer-adjacent motif (PAM), default N[A|G]G$
allowed.mismatch.PAM
Number of degenerate bases in the PAM sequence, default 2 for the N[A|G]G PAM
gRNA.pattern
Regular expression or IUPAC Extended Genetic Alphabet representing the gRNA pattern, default is no restriction. For example, to require that the gRNA start with GG, set it to ^GG. Please see help(translatePattern) for a list of IUPAC Extended Genetic Alphabet codes.
min.score
Minimum score of an off-target to be included in the final output, default 0
topN
Top N off-targets to be included in the final output, default 1000
topN.OfftargetTotalScore
Top N off-targets used to calculate the total off-target score, default 10
annotateExon
Choose whether to indicate if each off-target is inside an exon, default TRUE
txdb
TxDb object. For creating and using TxDb objects, please refer to the GenomicFeatures package. For a list of existing TxDb objects, please search for annotation packages starting with TxDb at http://www.bioconductor.org/packages/release/BiocViews.html#___AnnotationData, such as TxDb.Rnorvegicus.UCSC.rn5.refGene for rat, TxDb.Mmusculus.UCSC.mm10.knownGene for mouse, TxDb.Hsapiens.UCSC.hg19.knownGene for human, TxDb.Dmelanogaster.UCSC.dm3.ensGene for Drosophila and TxDb.Celegans.UCSC.ce6.ensGene for C. elegans
orgAnn
Organism annotation mapping, such as org.Hs.egSYMBOL in the org.Hs.eg.db package for human
outputDir
The directory to which the off-target analysis results and reports will be written
fetchSequence
Whether to fetch the flanking sequence of each off-target, default TRUE
upstream
Upstream offset from the off-target start used when fetching the flanking sequence, default 200
downstream
Downstream offset from the off-target end used when fetching the flanking sequence, default 200
upstream.search
Upstream offset from the start of each bed input region within which to search for gRNAs, default 0
downstream.search
Downstream offset from the end of each bed input region within which to search for gRNAs, default 0
weights
Applicable only when scoring.method is set to Hsu-Zhang. A numeric vector of length gRNA.size giving the mismatch weight at each gRNA position, default c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583), which is the weight vector used in Hsu et al., 2013, cited in the References section
baseBeforegRNA
Number of bases before gRNA used for calculating gRNA efficiency, default 4
baseAfterPAM
Number of bases after PAM used for calculating gRNA efficiency, default 3
featureWeightMatrixFile
Feature weight matrix file used for calculating gRNA efficiency. By default DoenchNBT2014 weight matrix is used. To use alternative weight matrix file, please input a csv file with first column containing significant features and the second column containing the corresponding weights for the features. Please see Doench et al., 2014 for details.
useScore
Default TRUE, display gRNAs in gray scale, with the darkness indicating gRNA efficacy. The taller bar shows the Cas9 cutting site. If set to FALSE, efficacy is not shown; instead, gRNAs on the plus strand are colored red and gRNAs on the minus strand are colored green.
useEfficacyFromInputSeq
Default FALSE. If set to TRUE, summary file will contain gRNA efficacy calculated from input sequences instead of from off-target analysis. Set it to TRUE if the input sequence is from a different species than the one used for off-target analysis.
outputUniqueREs
Default TRUE. If set to TRUE, summary file will contain REs unique to the cleavage site within 100 or 200 bases surrounding the gRNA sequence.
foldgRNAs
Default FALSE. If set to TRUE, the summary file will contain the minimum free energy of the secondary structure of the gRNA with the gRNA backbone, computed with the GeneRfold package, provided that GeneRfold has been installed.
gRNA.backbone
gRNA backbone constant region sequence. Defaults to the Streptococcus pyogenes (Sp) gRNA backbone sequence.
temperature
Temperature in degrees Celsius, default 37.
overwrite
Whether to overwrite existing files in the output directory, default FALSE
scoring.method
Indicates which method to use for off-target cleavage rate estimation; two methods are currently supported, Hsu-Zhang and CFDscore
subPAM.activity
Applicable only when scoring.method is set to CFDscore. A hash representing the cleavage rate for each alternative sub-PAM sequence relative to the preferred PAM sequence
subPAM.position
Applicable only when scoring.method is set to CFDscore. The start and end positions of the sub-PAM. Defaults to 22 and 23 for SpCas9 with a 20-bp gRNA and NGG as the preferred PAM
mismatch.activity.file
Applicable only when scoring.method is set to CFDscore. A comma-separated (csv) file containing the cleavage rates for all possible types of single-nucleotide mismatch at each position of the gRNA. By default, supplemental Table 19 from Doench et al., Nature Biotechnology 2016 is used
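To make findgRNAsWithREcutOnly and overlap.gRNA.positions more concrete, the base R sketch below checks whether a restriction enzyme recognition pattern found inside a 20-nt gRNA spans the required positions. The helper name reSiteOverlaps is hypothetical and not part of CRISPRseek, and it assumes that "overlap" means the recognition site covers every position listed in overlap.gRNA.positions; the package's actual rule, strand handling and PAM treatment may differ.

```r
# Hypothetical helper, not part of CRISPRseek.
# Assumes the recognition site must cover ALL positions in overlap.positions
# (1-based positions within the gRNA, counted 5' to 3').
reSiteOverlaps <- function(gRNA, rePattern, overlap.positions = c(17, 18)) {
  hits <- gregexpr(rePattern, gRNA)[[1]]      # non-overlapping matches only
  if (hits[1] == -1) return(FALSE)            # recognition site not found
  lens <- attr(hits, "match.length")
  covers <- mapply(function(start, len) {
    all(overlap.positions >= start & overlap.positions <= start + len - 1)
  }, as.integer(hits), lens)
  any(covers)
}

# EcoRI site GAATTC at gRNA positions 14-19 covers positions 17 and 18
reSiteOverlaps("ACGTACGTACGTAGAATTCA", "GAATTC")   # TRUE
reSiteOverlaps("GAATTCACGTACGTACGTAC", "GAATTC")   # FALSE
```

In offTargetAnalysis the patterns come from REpatternFile and are filtered by minREpatternSize; here a single pattern is passed directly for brevity.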
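The pairing rule described by findPairedgRNAOnly, min.gap and max.gap can be sketched in base R as follows. The data frame layout (name, strand, start and end columns on the input sequence), the helper name findPairsSketch, and the definition of the gap as the number of bases strictly between the end of the reverse gRNA and the start of the forward gRNA are all assumptions for illustration; CRISPRseek may measure the distance differently.

```r
# Hypothetical sketch, not the CRISPRseek implementation.
# gRNAs: data frame with columns name, strand ("+"/"-"), start, end.
# A pair is a minus-strand (reverse) gRNA followed by a plus-strand
# (forward) gRNA whose gap lies within [min.gap, max.gap].
findPairsSketch <- function(gRNAs, min.gap = 0, max.gap = 20) {
  revG <- gRNAs[gRNAs$strand == "-", ]
  fwdG <- gRNAs[gRNAs$strand == "+", ]
  idx <- expand.grid(r = seq_len(nrow(revG)), f = seq_len(nrow(fwdG)))
  gap <- fwdG$start[idx$f] - revG$end[idx$r] - 1  # bases between the two gRNAs
  keep <- gap >= min.gap & gap <= max.gap
  data.frame(reverse = revG$name[idx$r[keep]],
             forward = fwdG$name[idx$f[keep]],
             gap = gap[keep], stringsAsFactors = FALSE)
}

gRNAs <- data.frame(name = c("g1", "g2", "g3"), strand = c("-", "+", "+"),
                    start = c(1, 30, 100), end = c(23, 52, 122),
                    stringsAsFactors = FALSE)
findPairsSketch(gRNAs)   # one pair: g1 (reverse) with g2 (forward), gap 6
```

Because min.gap defaults to 0, a forward gRNA that overlaps or precedes the reverse gRNA yields a negative gap and is rejected, which matches the reverse-then-forward orientation described above.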
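PAM.pattern and gRNA.pattern are regular expressions, so their effect can be previewed with base R's grepl. The helpers below are hypothetical; they only expand the IUPAC code N to [ACGT] (CRISPRseek's own translatePattern, see help(translatePattern), covers the full alphabet). Under the default N[A|G]G$ pattern, N and [A|G] are the two degenerate positions counted by allowed.mismatch.PAM = 2.

```r
# Hypothetical helpers, not part of CRISPRseek. Only IUPAC "N" is expanded.
expandN <- function(pattern) gsub("N", "[ACGT]", pattern, fixed = TRUE)

pamMatches <- function(pam, PAM.pattern = "N[A|G]G$") {
  grepl(expandN(PAM.pattern), pam)
}

# An empty gRNA.pattern matches every gRNA (no restriction)
gRNAMatches <- function(gRNA, gRNA.pattern = "") grepl(gRNA.pattern, gRNA)

pamMatches(c("TGG", "CAG", "TCG"))              # TRUE TRUE FALSE
gRNAMatches(c("GGACTTACGATCGATCGATC", "AGACTTACGATCGATCGATC"), "^GG")  # TRUE FALSE
```

Note that inside a regex bracket expression the | in [A|G] is a literal character rather than alternation; this is harmless for DNA sequences, which never contain |.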
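The interplay of min.score, topN and topN.OfftargetTotalScore can be sketched for a single gRNA as below. This assumes that min.score is inclusive, that topN caps the reported off-targets after sorting by score, and that the total off-target score is the sum of the topN.OfftargetTotalScore highest retained scores; summarizeOfftargets is a hypothetical name, and CRISPRseek's exact filtering order may differ.

```r
# Hypothetical sketch, not the CRISPRseek implementation.
# scores: numeric vector of off-target scores for a single gRNA.
summarizeOfftargets <- function(scores, min.score = 0, topN = 1000,
                                topN.OfftargetTotalScore = 10) {
  kept <- sort(scores[scores >= min.score], decreasing = TRUE)  # apply min.score
  kept <- head(kept, topN)                                      # keep top N for output
  list(reported = kept,
       totalScore = sum(head(kept, topN.OfftargetTotalScore)))  # sum of top N scores
}

res <- summarizeOfftargets(c(100, 5, 20, 0.5, 3), min.score = 1,
                           topN.OfftargetTotalScore = 2)
res$reported     # 100 20 5 3
res$totalScore   # 120
```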
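When scoring.method is Hsu-Zhang, the weights vector feeds a mismatch score. The sketch below follows the aggregate formula commonly attributed to Hsu et al., 2013 (as used by the MIT CRISPR design tool): a product of position penalties, a term for the mean distance between mismatches, and a 1/n² term for the number of mismatches, scaled to 100. It is a hypothetical helper, not CRISPRseek's getOfftargetScore, and it assumes positions are numbered 1 to 20 from the 5' (PAM-distal) end.

```r
# Hypothetical helper, not CRISPRseek's getOfftargetScore.
# mismatch.pos: 1-based mismatch positions in a 20-nt gRNA, numbered from
# the 5' (PAM-distal) end; weights: the documented default vector.
hsuZhangScore <- function(mismatch.pos,
    weights = c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445,
                0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583)) {
  n <- length(mismatch.pos)
  if (n == 0) return(100)                       # perfect match
  s.pos <- prod(1 - weights[mismatch.pos])      # position-specific penalty
  s.dist <- if (n < 2) 1 else {
    d <- mean(diff(sort(mismatch.pos)))         # mean distance between consecutive mismatches
    1 / (((19 - d) / 19) * 4 + 1)               # clustered mismatches lower the score
  }
  s.count <- 1 / n^2                            # penalty for the number of mismatches
  100 * s.pos * s.dist * s.count
}

hsuZhangScore(integer(0))   # 100
hsuZhangScore(20)           # 41.7
hsuZhangScore(c(1, 20))     # 10.425
```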
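With the defaults baseBeforegRNA = 4, gRNA.size = 20, PAM.size = 3 and baseAfterPAM = 3, the sequence scored for efficiency is 4 + 20 + 3 + 3 = 30 nt, the 30-mer context used by Doench et al., 2014. The sketch below shows an additive, logistic feature score of that general kind. The helper name efficacySketch, the feature naming scheme (nucleotides followed by a 1-based position in the 30-mer) and the weights are hypothetical; they do not reproduce DoenchNBT2014.csv or CRISPRseek's own calculation.

```r
# Hypothetical sketch of an additive, logistic efficacy score.
# weights: named numeric vector; a name like "G5" or "GT12" means the
# nucleotides "G" / "GT" starting at that 1-based position of the 30-mer.
efficacySketch <- function(seq30, weights, intercept = 0) {
  stopifnot(nchar(seq30) == 4 + 20 + 3 + 3)   # baseBeforegRNA + gRNA + PAM + baseAfterPAM
  total <- intercept
  for (feature in names(weights)) {
    nt <- sub("[0-9]+$", "", feature)                # nucleotide part
    pos <- as.integer(sub("^[ACGT]+", "", feature))  # position part
    if (substr(seq30, pos, pos + nchar(nt) - 1) == nt) {
      total <- total + weights[[feature]]            # feature present: add its weight
    }
  }
  1 / (1 + exp(-total))                              # logistic transform to (0, 1)
}

seq30 <- paste0("AAAA", "G", strrep("A", 25))        # G at position 5
efficacySketch(seq30, c(G5 = log(3)))                # 0.75
```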
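With scoring.method = "CFDscore", the score of an off-target can be pictured, following the approach of Doench et al., 2016, as the product of the activities of its individual mismatches times the activity of its sub-PAM dinucleotide (positions 22 and 23 for a 20-nt gRNA followed by NGG). In the sketch, the subPAM.activity defaults from the Usage section are copied into a named base R vector instead of a hash object; the mismatch values and the "rX:dY,position" key format are made up for illustration and are not the contents of NatureBiot2016SuppTable19DoenchRoot.csv. The helper name cfdSketch is hypothetical.

```r
# Default subPAM.activity values from the Usage section (as a named vector)
subPAM.activity <- c(AA = 0, AC = 0, AG = 0.259259259, AT = 0,
                     CA = 0, CC = 0, CG = 0.107142857, CT = 0,
                     GA = 0.069444444, GC = 0.022222222, GG = 1, GT = 0.016129032,
                     TA = 0, TC = 0, TG = 0.038961039, TT = 0)

# Hypothetical mismatch activities keyed "rX:dY,position"; values are made up
mismatch.activity <- c("rA:dG,5" = 0.5, "rC:dT,18" = 0.25)

# Hypothetical helper: CFD-style score = product of the activities of all
# mismatches times the activity of the sub-PAM dinucleotide.
cfdSketch <- function(mismatch.keys, subPAM,
                      mm.activity = mismatch.activity,
                      pam.activity = subPAM.activity) {
  prod(mm.activity[mismatch.keys]) * pam.activity[[subPAM]]  # prod() of none is 1
}

cfdSketch(character(0), "GG")                 # perfect match, NGG PAM: 1
cfdSketch("rA:dG,5", "AG")                    # 0.5 * 0.259259259
cfdSketch(c("rA:dG,5", "rC:dT,18"), "GG")     # 0.125
```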

Value

Four tab-delimited files are generated in the output directory: OfftargetAnalysis.xls (detailed information on off-targets), Summary.xls (summary of the gRNAs), REcutDetails.xls (restriction enzyme cut sites of each gRNA), and pairedgRNAs.xls (potential paired gRNAs)
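Because these reports are tab-delimited text despite their .xls extension, they can be loaded with base R's read.delim. The helper below is a hypothetical convenience function, not part of CRISPRseek; it assumes each file has a header row and simply skips any of the four files that is not present in outputDir.

```r
# Hypothetical helper: read whichever of the four reports exist in outputDir.
# Assumes tab-delimited files with a header row.
readOffTargetReports <- function(outputDir) {
  files <- c("OfftargetAnalysis.xls", "Summary.xls",
             "REcutDetails.xls", "pairedgRNAs.xls")
  paths <- file.path(outputDir, files)
  found <- file.exists(paths)
  reports <- lapply(paths[found], read.delim,
                    stringsAsFactors = FALSE, check.names = FALSE)
  names(reports) <- files[found]
  reports
}

# e.g. after running offTargetAnalysis(..., outputDir = outputDir):
# reports <- readOffTargetReports(outputDir)
# head(reports[["Summary.xls"]])
```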

Details

References

Hsu PD, Scott DA, Weinstein JA, Ran FA, Konermann S, Agarwala V, Li Y, Fine EJ, Wu X, Shalem O, Cradick TJ, Marraffini LA, Bao G, Zhang F. DNA targeting specificity of RNA-guided Cas9 nucleases. Nature Biotechnology 2013; 31:827-834.
Doench JG, Hartenian E, Graham DB, Tothova Z, Hegde M, Smith I, Sullender M, Ebert BL, Xavier RJ, Root DE. Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nature Biotechnology 2014 Sep 3. doi:10.1038/nbt.3026
Zhu LJ, Holmes BR, Aronin N, Brodsky M. CRISPRseek: a Bioconductor package to identify target-specific guide RNAs for CRISPR-Cas9 genome-editing systems. PLoS ONE 2014 Sep 23.
Doench JG, et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nature Biotechnology 2016 Jan 18.

See Also

CRISPRseek

Examples

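	library(CRISPRseek)
	library("BSgenome.Hsapiens.UCSC.hg19")        # provides the Hsapiens BSgenome object
	library(TxDb.Hsapiens.UCSC.hg19.knownGene)    # exon annotation for off-targets
	library(org.Hs.eg.db)                         # gene symbol mapping (org.Hs.egSYMBOL)

	# Reports are written to the current working directory
	outputDir <- getwd()
	# Example input sequence and restriction enzyme patterns shipped with CRISPRseek
	inputFilePath <- system.file("extdata", "inputseq.fa",
	    package = "CRISPRseek")
	REpatternFile <- system.file("extdata", "NEBenzymes.fa",
	    package = "CRISPRseek")

	# Keep only gRNAs overlapping an RE site; search chrX only with at most
	# one mismatch to limit run time
	results <- offTargetAnalysis(inputFilePath, findgRNAsWithREcutOnly = TRUE,
	    REpatternFile = REpatternFile, findPairedgRNAOnly = FALSE,
	    annotatePaired = FALSE,
	    BSgenomeName = Hsapiens, chromToSearch = "chrX",
	    txdb = TxDb.Hsapiens.UCSC.hg19.knownGene,
	    orgAnn = org.Hs.egSYMBOL, max.mismatch = 1,
	    outputDir = outputDir, overwrite = TRUE)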
