processAmplicons(readfile, readfile2=NULL, barcodefile, hairpinfile, barcodeStart=1, barcodeEnd=5, barcode2Start=NULL, barcode2End=NULL, barcodeStartRev=NULL, barcodeEndRev=NULL, hairpinStart=37, hairpinEnd=57, allowShifting=FALSE, shiftingBase=3, allowMismatch=FALSE, barcodeMismatchBase=1, hairpinMismatchBase=2, allowShiftedMismatch=FALSE, verbose=FALSE)
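
shiftingBase: maximum number of bases by which the hairpin/sgRNA window may be shifted from hairpinStart and hairpinEnd when the program checks for a hairpin/sgRNA match with allowShifting set to TRUE.
barcodeMismatchBase: maximum number of mismatched bases allowed in a barcode sequence when allowMismatch is TRUE.
hairpinMismatchBase: maximum number of mismatched bases allowed in a hairpin/sgRNA sequence when allowMismatch is TRUE.
allowShiftedMismatch: logical, used only when allowShifting and allowMismatch are both TRUE. It indicates whether sequence mismatches are also checked at shifted positions.
verbose: if TRUE, print program progress.

The function returns a DGEList object with the following components:
counts: matrix of read counts for each barcode and hairpin/sgRNA combination, with one row per hairpin/sgRNA and one column per sample.
genes: data.frame of hairpin/sgRNA annotation taken from the hairpin file.
samples: data.frame of sample annotation taken from the barcode file.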
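
The base R sketch below illustrates what barcodeMismatchBase controls. It extracts the barcode window of a read (positions barcodeStart to barcodeEnd, assumed inclusive, using the defaults from the usage line) and accepts the closest barcode that lies within barcodeMismatchBase mismatches. match_barcode() and count_mismatches() are hypothetical helpers, not edgeR functions. The sketch assumes allowMismatch is TRUE, that mismatches are base substitutions only (Hamming distance), and that ties go to the first barcode listed.

```r
# Hypothetical helper, not an edgeR function: number of mismatched bases
# between two equal-length sequences (substitutions only).
count_mismatches <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "", fixed = TRUE)[[1]] != strsplit(b, "", fixed = TRUE)[[1]])
}

# Hypothetical helper: take the barcode window of the read and return the index
# of the closest barcode within barcodeMismatchBase mismatches, or NA if none.
match_barcode <- function(read, barcodes, barcodeStart = 1, barcodeEnd = 5,
                          barcodeMismatchBase = 1) {
  window <- substr(read, barcodeStart, barcodeEnd)
  d <- vapply(barcodes, count_mismatches, integer(1), b = window)
  if (min(d) <= barcodeMismatchBase) unname(which.min(d)) else NA_integer_
}

barcodes <- c(S1 = "ACGTA", S2 = "TTGCA")
match_barcode("ACGTTGGGCCCAAA", barcodes)  # 1: "ACGTT" is one base away from "ACGTA"
match_barcode("CCCCCGGGCCCAAA", barcodes)  # NA: every barcode is 4 bases away
```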
The barcode file and the hairpin/sgRNA file are tab-separated text files. Each has at least two columns, named 'ID' and 'Sequences', which give the sample or hairpin/sgRNA IDs and the barcode or hairpin/sgRNA sequences to be matched. If barcode2Start and barcode2End are specified, a third column 'Sequences2' is expected in the barcode file. If readfile2, barcodeStartRev and barcodeEndRev are specified, another column 'SequencesReverse' is expected in the barcode file. The barcode file may also contain a 'group' column that indicates which experimental group each sample belongs to. Any additional columns in either file are included in the $samples or $genes data.frame, respectively, of the final DGEList object. These files, along with the fastq file(s), are assumed to be in the current working directory.

To compute the count matrix, reads are matched to the given barcodes and hairpins/sgRNAs in two rounds. The first round looks for an exact match to the barcode and hairpin/sgRNA sequences at the specified positions. If allowShifting is TRUE, the program also checks whether a hairpin/sgRNA sequence can be found at a neighbouring position in the read, up to shiftingBase bases away. If no match is found and allowMismatch is TRUE, the program performs a second round of matching that tolerates sequence mismatches. Setting allowShiftedMismatch to TRUE extends this mismatch search to the shifted positions. The maximum numbers of mismatched bases allowed in the barcode and in the hairpin/sgRNA are set by barcodeMismatchBase and hairpinMismatchBase, respectively.
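
The two-round matching can be sketched in base R for the hairpin/sgRNA part of a single read. This is a hypothetical illustration, not edgeR's implementation. match_hairpin() and count_mismatches() are made-up names, and barcode matching is left out for brevity. The sketch assumes three things: shifts are tried symmetrically within plus or minus shiftingBase of hairpinStart, mismatches are substitutions only, and the closest hairpin/sgRNA wins the mismatch round, with ties going to the first one listed.

```r
# Hypothetical sketch of two-round hairpin/sgRNA matching; not edgeR's code.
count_mismatches <- function(a, b) {
  sum(strsplit(a, "", fixed = TRUE)[[1]] != strsplit(b, "", fixed = TRUE)[[1]])
}

match_hairpin <- function(read, hairpins, hairpinStart, hairpinEnd,
                          allowShifting = FALSE, shiftingBase = 3,
                          allowMismatch = FALSE, hairpinMismatchBase = 2,
                          allowShiftedMismatch = FALSE) {
  len <- hairpinEnd - hairpinStart + 1
  # Window of the read starting at 'start'; NA if it falls off either end.
  window_at <- function(start) {
    if (start < 1 || start + len - 1 > nchar(read)) return(NA_character_)
    substr(read, start, start + len - 1)
  }
  # Assumed symmetric shifts, e.g. -3, -2, -1, 1, 2, 3 for shiftingBase = 3.
  shifts <- if (allowShifting) setdiff(-shiftingBase:shiftingBase, 0) else integer(0)

  # Round 1: exact match at the given position, then at shifted positions.
  for (s in c(0, shifts)) {
    w <- window_at(hairpinStart + s)
    hit <- if (is.na(w)) NA_integer_ else match(w, hairpins)
    if (!is.na(hit)) return(hit)
  }

  # Round 2: up to hairpinMismatchBase mismatches, at the given position only,
  # or also at the shifted positions when allowShiftedMismatch is TRUE.
  if (allowMismatch) {
    positions <- if (allowShiftedMismatch) c(0, shifts) else 0
    for (s in positions) {
      w <- window_at(hairpinStart + s)
      if (is.na(w)) next
      d <- vapply(hairpins, count_mismatches, integer(1), b = w)
      if (min(d) <= hairpinMismatchBase) return(unname(which.min(d)))
    }
  }
  NA_integer_  # read could not be matched
}

hairpins <- c(H1 = "ACGTACGT", H2 = "TTTTGGGG")
match_hairpin("NNNACGTACGTNNN", hairpins, 4, 11)                        # 1 (exact)
match_hairpin("NNNNTTTTGGGGNN", hairpins, 4, 11)                        # NA
match_hairpin("NNNNTTTTGGGGNN", hairpins, 4, 11, allowShifting = TRUE)  # 2 (shift +1)
match_hairpin("NNNACGTTCGTNNN", hairpins, 4, 11, allowMismatch = TRUE)  # 1 (1 mismatch)
```

Because allowShiftedMismatch only draws on the shifts generated when allowShifting is TRUE, the sketch ignores it unless both allowShifting and allowMismatch are set. This mirrors the argument description.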
The program outputs a DGEList object whose count matrix gives the number of times each barcode and hairpin/sgRNA combination was matched in reads from the input fastq file(s).
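
The count matrix can be pictured as a cross-tabulation of per-read match results. In this base R sketch, the per-read sample and hairpin/sgRNA IDs are made-up toy values that stand in for the output of the matching step, and NA marks a read that could not be matched. This is only an illustration of the tally, not how edgeR stores intermediate results.

```r
# Toy per-read match results (hypothetical); NA means no match for that part.
sample_ids  <- c("S1", "S1", "S2", "S2", "S2", NA)
hairpin_ids <- c("H1", "H2", "H1", "H1", NA, "H2")

all_samples  <- c("S1", "S2")
all_hairpins <- c("H1", "H2", "H3")

# Rows are hairpins/sgRNAs and columns are samples, as in a DGEList.
# table() drops reads with NA in either ID, and the factor levels keep
# hairpins with no reads (H3) as rows of zeros.
counts <- unclass(table(factor(hairpin_ids, levels = all_hairpins),
                        factor(sample_ids,  levels = all_samples)))
counts
```

processAmplicons() returns its counts already wrapped in a DGEList, together with the $genes and $samples annotation described above.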
For further examples and data, refer to the Case studies available from http://bioinf.wehi.edu.au/shRNAseq/.