CRISPRseek (version 1.8.1)

compare2Sequences: Compare 2 input sequences/sequence sets for possible guide RNAs (gRNAs)

Description

Generate all possible guide RNAs (gRNAs) for two input sequences or two sets of sequences, and score the potential off-targets of each gRNA in the other sequence.

Usage

compare2Sequences(inputFile1Path, inputFile2Path,
    inputNames = c("Seq1", "Seq2"), format = "fasta",
    findgRNAsWithREcutOnly = FALSE,
    searchDirection = c("both", "1to2", "2to1"),
    REpatternFile = system.file("extdata", "NEBenzymes.fa",
        package = "CRISPRseek"),
    minREpatternSize = 6, overlap.gRNA.positions = c(17, 18),
    findPairedgRNAOnly = FALSE, min.gap = 0, max.gap = 20,
    gRNA.name.prefix = "gRNA", PAM.size = 3, gRNA.size = 20,
    PAM = "NGG", PAM.pattern = "N[A|G]G$", max.mismatch = 3,
    outputDir,
    weights = c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079,
        0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804,
        0.685, 0.583),
    overwrite = FALSE, baseBeforegRNA = 4, baseAfterPAM = 3,
    featureWeightMatrixFile = system.file("extdata",
        "DoenchNBT2014.csv", package = "CRISPRseek"),
    foldgRNAs = TRUE,
    gRNA.backbone = "GUUUUAGAGCUAGAAAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUU",
    temperature = 37)

Arguments

inputFile1Path
Path to sequence input file 1, which contains one of the two sequences to be searched for potential gRNAs
inputFile2Path
Path to sequence input file 2, which contains the other of the two sequences to be searched for potential gRNAs
inputNames
Names of the input sequences, used when inputFile1Path and inputFile2Path are DNAStringSet objects instead of file paths
format
Format of the input files. fasta and fastq are supported, default fasta
findgRNAsWithREcutOnly
Indicate whether to find only gRNAs that overlap a restriction enzyme recognition pattern
searchDirection
Indicate whether to search for gRNAs in both sequences and run the off-target search of each against the other (both), to search for gRNAs in input1 and run the off-target analysis in input2 (1to2), or vice versa (2to1)
REpatternFile
Path to the file containing restriction enzyme cut patterns
minREpatternSize
Minimum restriction enzyme recognition pattern length required for the enzyme pattern to be searched for, default 6
overlap.gRNA.positions
The required overlap positions of gRNA and restriction enzyme cut site, default 17 and 18
findPairedgRNAOnly
Choose whether to search only for paired gRNAs, oriented so that the first one (the reverse gRNA) is on the minus strand and the second one (the forward gRNA) is on the plus strand. TRUE or FALSE, default FALSE
min.gap
Minimum distance between two oppositely oriented gRNAs to be valid paired gRNAs. Default 0
max.gap
Maximum distance between two oppositely oriented gRNAs to be valid paired gRNAs. Default 20
gRNA.name.prefix
The prefix used when assigning names to the gRNAs found, default gRNA, short for guide RNA.
PAM.size
PAM length, default 3
gRNA.size
The size of the gRNA, default 20
PAM
PAM sequence after the gRNA, default NGG
PAM.pattern
Regular expression of PAM, default N[A|G]G$
max.mismatch
Maximum number of mismatches allowed when searching for off-targets in the other sequence, default 3
outputDir
The directory where the sequence comparison results will be written
weights
Numeric vector with one weight per gRNA position (length equal to the gRNA length), default c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583), as used in Hsu et al., 2013, cited in the References section
overwrite
Whether to overwrite the existing files in the output directory, default FALSE
baseBeforegRNA
Number of bases before gRNA used for calculating gRNA efficiency, default 4
baseAfterPAM
Number of bases after PAM used for calculating gRNA efficiency, default 3
featureWeightMatrixFile
Feature weight matrix file used for calculating gRNA efficiency. By default the DoenchNBT2014 weight matrix is used. To use an alternative weight matrix, supply a csv file whose first column contains the significant features and whose second column contains the corresponding weights. Please see Doench et al., 2014 for details.
foldgRNAs
Default TRUE. If set to TRUE, the summary file will contain the minimum free energy of the secondary structure of the gRNA with the gRNA backbone, obtained from the GeneRfold package, provided that GeneRfold has been installed.
gRNA.backbone
gRNA backbone constant region sequence. Defaults to the sequence of the Sp gRNA backbone.
temperature
Temperature in degrees Celsius, default 37.
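
The weights argument is described above as holding one value per gRNA position, so a custom vector presumably needs to match gRNA.size. A minimal base R sketch of that sanity check follows; check_weights is a hypothetical helper written for illustration and is not part of CRISPRseek.

```r
# Default position weights, copied from the Usage section (Hsu et al., 2013).
default_weights <- c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079,
                     0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615,
                     0.804, 0.685, 0.583)

# Hypothetical helper (not a CRISPRseek function): confirm that a weights
# vector is numeric and has one entry per gRNA position.
check_weights <- function(weights, gRNA.size = 20) {
  is.numeric(weights) && length(weights) == gRNA.size
}

check_weights(default_weights)        # default vector for a 20-nt gRNA
check_weights(default_weights[1:18])  # an 18-entry vector does not match
```

Running a check like this before calling compare2Sequences with a non-default gRNA.size may catch a mismatched vector early.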

Value

Returns a data frame with all potential gRNAs from both sequences. A tab-delimited file, scoresFor2InputSequences.xls, is also saved in outputDir, sorted by scoreDiff in descending order.
name
name of the gRNA
gRNAPlusPAM
gRNA plus PAM sequence
targetInSeq1
target/off-target sequence including PAM in the 1st input sequence file
targetInSeq2
target/off-target sequence including PAM in the 2nd input sequence file
guideAlignment2Offtarget
alignment of gRNA to the other input sequence (off-target sequence)
offTargetStrand
strand of the other sequence (off-target sequence) that the gRNA aligns to
scoreForSeq1
score for the target sequence in the 1st input sequence file
scoreForSeq2
score for the target sequence in the 2nd input sequence file
mismatch.distance2PAM
distances of mismatch to PAM, e.g., 14 means the mismatch is 14 bp away from PAM
n.mismatch
number of mismatches between the off-target and the gRNA
targetSeqName
the name of the input sequence where the target sequence is located
scoreDiff
scoreForSeq1 - scoreForSeq2
bracket.notation
folded gRNA in bracket notation
mfe.sgRNA
minimum free energy of sgRNA
mfe.diff
mfe.sgRNA-mfe.backbone
mfe.backbone
minimum free energy of the gRNA backbone by itself
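
The two derived columns above are plain differences, and the saved file is ordered by scoreDiff. The base R sketch below illustrates those relationships on a toy data frame; every value in it is made up for illustration and is not real CRISPRseek output.

```r
# Toy values for illustration only; not produced by compare2Sequences().
toy <- data.frame(
  name         = c("gRNA1", "gRNA2", "gRNA3"),
  scoreForSeq1 = c(100, 100, 12.5),
  scoreForSeq2 = c(100, 8.4, 100),
  mfe.sgRNA    = c(-20.1, -18.7, -22.3),
  mfe.backbone = c(-17.4, -17.4, -17.4),
  stringsAsFactors = FALSE
)

# Derived columns as defined in the Value section.
toy$scoreDiff <- toy$scoreForSeq1 - toy$scoreForSeq2
toy$mfe.diff  <- toy$mfe.sgRNA - toy$mfe.backbone

# Same ordering as the saved file: scoreDiff, descending.
toy <- toy[order(toy$scoreDiff, decreasing = TRUE), ]
toy[, c("name", "scoreDiff", "mfe.diff")]
```

A large positive scoreDiff marks a gRNA that scores much higher against the first input sequence than against the second, which is why such rows come first.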

References

Patrick D Hsu, David A Scott, Joshua A Weinstein, F Ann Ran, Silvana Konermann, Vineeta Agarwala, Yinqing Li, Eli J Fine, Xuebing Wu, Ophir Shalem, Thomas J Cradick, Luciano A Marraffini, Gang Bao & Feng Zhang (2013) DNA targeting specificity of RNA-guided Cas9 nucleases. Nature Biotechnology 31:827-834

See Also

CRISPRseek

Examples

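    library(CRISPRseek)
    # Example input sequences shipped with the package
    inputFile1Path <- system.file("extdata", "rs362331T.fa",
            package = "CRISPRseek")
    inputFile2Path <- system.file("extdata", "rs362331C.fa",
            package = "CRISPRseek")
    # Restriction enzyme patterns shipped with the package (also the default)
    REpatternFile <- system.file("extdata", "NEBenzymes.fa", 
            package = "CRISPRseek")
    # Compare the two sequences and write the results to the working
    # directory, overwriting any existing output files
    seqs <- compare2Sequences(inputFile1Path, inputFile2Path,
        outputDir = getwd(), 
        REpatternFile = REpatternFile, overwrite = TRUE)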
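
The Value section states that scoresFor2InputSequences.xls is tab delimited despite its extension, so it can presumably be read back with read.delim. The sketch below writes a small stand-in file with made-up values to a temporary directory so that it runs without CRISPRseek installed; after a real run, point outputDir at the directory passed to compare2Sequences instead.

```r
# Stand-in for a real results file: a tiny tab-delimited table with
# made-up values, written to a temporary directory.
outputDir  <- tempdir()
resultFile <- file.path(outputDir, "scoresFor2InputSequences.xls")
standIn <- data.frame(
  name         = c("gRNA1", "gRNA2"),
  scoreForSeq1 = c(100, 100),
  scoreForSeq2 = c(8.4, 100),
  stringsAsFactors = FALSE
)
write.table(standIn, resultFile, sep = "\t", quote = FALSE,
            row.names = FALSE)

# Read the tab-delimited file back into a data frame.
scores <- read.delim(resultFile, stringsAsFactors = FALSE)
str(scores)
```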