processAmplicons(readfile, readfile2=NULL, barcodefile, hairpinfile,
                 barcodeStart=1, barcodeEnd=5, barcodeStartRev=NULL, barcodeEndRev=NULL,
                 hairpinStart=37, hairpinEnd=57, allowShifting=FALSE, shiftingBase=3,
                 allowMismatch=FALSE, barcodeMismatchBase=1, hairpinMismatchBase=2,
                 allowShiftedMismatch=FALSE, verbose=FALSE)
shiftingBase: how many bases away from hairpinStart and hairpinEnd the program should check for a hairpin/sgRNA match when allowShifting is TRUE.
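barcodeMismatchBase: maximum number of mismatched bases allowed in a barcode sequence; applies when allowShifting is TRUE.
hairpinMismatchBase: maximum number of mismatched bases allowed in a hairpin/sgRNA sequence; applies when allowShifting is TRUE.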
allowShiftedMismatch: takes effect only when allowShifting and allowMismatch are both TRUE. It indicates whether the program checks for sequence mismatches at a shifted position.
verbose: if TRUE, output program progress.
Returns a DGEList object.
If readfile2, barcodeStartRev and barcodeEndRev are specified, a third column 'SequencesReverse' is expected in the barcode file. The barcode file may also contain a 'group' column that indicates which experimental group a sample belongs to. Additional columns in each file will be included in the respective $samples or $genes data.frames of the final DGEList object. These files, along with the fastq file(s), are assumed to be in the current working directory.
To compute the count matrix, matching to the given barcodes and hairpins/sgRNAs is conducted in two rounds. The first round looks for an exact sequence match for the given barcode sequences and hairpin/sgRNA sequences at the locations specified. If allowShifting is set to TRUE, the program also checks whether a given hairpin/sgRNA sequence can be found at a neighbouring position in the read. For hairpins/sgRNAs without a match, the program performs a second round of matching which allows for sequence mismatches. In this round, allowShifting again determines whether matches are sought at shifted positions in the read, and allowShiftedMismatch accommodates mismatches at those shifted positions. The maximum number of mismatched bases in the barcode and hairpin/sgRNA is specified by the parameters barcodeMismatchBase and hairpinMismatchBase.
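The two-round matching described above can be illustrated for a single read and a single hairpin with a small base R sketch. This is not edgeR's implementation; the function names, the return value, the order in which shifted positions are tried, and the assumption that the second round runs only when allowMismatch is TRUE are all hypothetical choices made for illustration.

```r
# Illustrative sketch only; NOT edgeR's implementation.
# Count mismatched bases between two equal-length strings.
count_mismatches <- function(a, b) {
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Hypothetical helper: try to place one hairpin in one read.
# Returns list(round, shift) on success, or NULL if no match is found.
match_hairpin <- function(read, hairpin, hairpin_start = 37,
                          allow_shifting = FALSE, shifting_base = 3,
                          allow_mismatch = FALSE, max_mismatch = 2,
                          allow_shifted_mismatch = FALSE) {
  len <- nchar(hairpin)

  # Candidate positions: the specified start first, then neighbouring starts.
  shifts <- 0
  if (allow_shifting) {
    shifts <- c(0, seq_len(shifting_base), -seq_len(shifting_base))
  }
  starts <- hairpin_start + shifts
  fits <- starts >= 1 & starts + len - 1 <= nchar(read)
  starts <- starts[fits]
  shifts <- shifts[fits]
  windows <- substring(read, starts, starts + len - 1)

  # Round 1: exact match at the specified (or a shifted) position.
  exact <- which(windows == hairpin)
  if (length(exact) > 0) {
    return(list(round = 1, shift = shifts[exact[1]]))
  }

  # Round 2 (assumed to require allow_mismatch): tolerate a few mismatches.
  if (allow_mismatch) {
    n_mis <- vapply(windows, count_mismatches, integer(1),
                    b = hairpin, USE.NAMES = FALSE)
    ok <- n_mis <= max_mismatch
    # Mismatches at shifted positions count only if explicitly allowed.
    if (!allow_shifted_mismatch) ok <- ok & shifts == 0
    hit <- which(ok)
    if (length(hit) > 0) {
      return(list(round = 2, shift = shifts[hit[1]]))
    }
  }
  NULL
}

# Made-up read with the hairpin starting at position 39 instead of 37.
hairpin <- "ACGTACGTACGTACGTACGTA"
shifted_read <- paste0(strrep("G", 38), hairpin, "G")
res_shifted <- match_hairpin(shifted_read, hairpin, allow_shifting = TRUE)
```

With the made-up read above, the hairpin is missed when allow_shifting is FALSE and found in round 1 at a shift of two bases when it is TRUE.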
The program outputs a DGEList object, with a count matrix indicating the number of times each barcode and hairpin/sgRNA combination could be matched in reads from the input fastq file(s).
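As a rough illustration of the input files, the sketch below writes a tab-delimited barcode file and hairpin file using base R only. The column names 'ID' and 'Sequences' follow the edgeR documentation but are not stated in this excerpt, so treat them as an assumption; all sample names, sequences, and file names are made up. The files are written to a temporary directory here, whereas the function expects them in the current working directory.

```r
# Hypothetical sample barcodes: 5 bases long, consistent with the
# defaults barcodeStart=1 and barcodeEnd=5. The optional 'group' column
# records the experimental group of each sample.
barcodes <- data.frame(
  ID = c("sample1", "sample2"),
  Sequences = c("ACGTA", "TTGCA"),
  group = c("control", "treated"),
  stringsAsFactors = FALSE
)

# Hypothetical hairpins: 21 bases long, consistent with the
# defaults hairpinStart=37 and hairpinEnd=57.
hairpins <- data.frame(
  ID = c("hp1", "hp2"),
  Sequences = c("ACGTACGTACGTACGTACGTA", "TTTTGGGGCCCCAAAATTTTG"),
  stringsAsFactors = FALSE
)

# Write tab-delimited text files (made-up file names).
barcode_path <- file.path(tempdir(), "barcodes.txt")
hairpin_path <- file.path(tempdir(), "hairpins.txt")
write.table(barcodes, barcode_path, sep = "\t", quote = FALSE, row.names = FALSE)
write.table(hairpins, hairpin_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the files back to confirm the layout before passing them on.
check_barcodes <- read.delim(barcode_path, stringsAsFactors = FALSE)
check_hairpins <- read.delim(hairpin_path, stringsAsFactors = FALSE)
```

Any extra columns added to these data frames would, per the description above, be carried into the $samples or $genes data.frames of the resulting DGEList.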
For further examples and data, refer to the Case studies available from http://bioinf.wehi.edu.au/shRNAseq/.