align(index,readfile1,readfile2=NULL,type="rna",input_format="gzFASTQ",output_format="BAM",
output_file=paste(readfile1,"subread",output_format,sep="."),nsubreads=10,
TH1=3,TH2=1,maxMismatches=3,nthreads=1,indels=5,complexIndels=FALSE,phredOffset=33,
unique=TRUE,nBestLocations=1,minFragLength=50,maxFragLength=600,PE_orientation="fr",
nTrim5=0,nTrim3=0,readGroupID=NULL,readGroup=NULL,color2base=FALSE,
DP_GapOpenPenalty=-1,DP_GapExtPenalty=0,DP_MismatchPenalty=0,DP_MatchScore=2,
detectSV=FALSE)
subjunc(index,readfile1,readfile2=NULL,input_format="gzFASTQ",output_format="BAM",
output_file=paste(readfile1,"subjunc",output_format,sep="."),nsubreads=14,
TH1=1,TH2=1,maxMismatches=3,nthreads=1,indels=5,complexIndels=FALSE,phredOffset=33,
unique=TRUE,nBestLocations=1,minFragLength=50,maxFragLength=600,PE_orientation="fr",
nTrim5=0,nTrim3=0,readGroupID=NULL,readGroup=NULL,color2base=FALSE,
reportAllJunctions=FALSE)input_format option for more supported formats.readfile2 should be in the same order as their mate files included in readfile1 .NULL by default.rna (or 0; RNA-seq data) and dna (or 1; genomic DNA-seq data such as WGS, WES, ChIP-seq data etc.). Character strings are case insensitive.gzFASTQ by default (this also includes FASTQ, FASTA and gzipped FASTA formats). Other supported formats include SAM and BAM. Character values are case insensitive.BAM by default. Acceptable formats include SAM and BAM.readfile1 added with an suffix string.3 by default. Mis-matches found in soft-clipped bases are not counted.1 by default.5 by default.TRUE, the program will try to detect multiple short indels that occurs concurrently in a small genomic region (indels could be as close as 1bp apart).33 and 64. By default, 33 is used.TRUE by default. Uniquely mapped reads must have one (1) mapping location that has less mis-matched bases than other candidate locations.unique argument takes precedence over nBestLocations argument.fr, ff and rf. Letter f denotes the forward strand and letter r the reverse strand. fr by default (ie. the first read in the pair is on the forward strand and the second read on the reverse strand).0 by default.0 by default.NULL by default.tag:value. This string will be added to the read group (RG) header in the mapping output. NULL by default.TRUE, color-space read bases will be converted to base-space bases in the mapping output. Note that the mapping itself will still be performed at color-space. FALSE by default.-1 by default.0 by default.0 by default.2 by default.Subjunc, BED files including discovered exon-exon junctions are also written to the current working directory.
align function implements the Subread aligner (Liao et al., 2013) that uses a new mapping paradigm called ``seed-and-vote".
Subread is general-purpose aligner that can be used to align both genomic DNA-seq reads and RNA-seq reads.Subjunc is designed for mapping RNA-seq reads.
The major difference between Subjunc and Subread is that Subjunc reports discovered exon-exon junctions and it also performs full alignments for every read including exon-spanning reads.
Subread uses the largest mappable regions in the reads to find their mapping locations.
The seed-and-vote paradigm has been found to be not only more accurate than the conventional seed-and-extend (adopted by most aligners) in read mapping, but it is a lot more efficient (Liao et al., 2013).
Both Subread and Subjunc can be used to align reads generated from major sequencing platforms including Illumina GA/HiSeq, ABI SOLiD, Roche 454 and Ion Torrent sequencers.
Note that to map color-space reads (e.g. SOLiD reads), a color-space index should be built for the reference genome (see buildindex for details).
Subread and Subjunc have adjustable memory usage.
See buildindex function for more details.
Mapping performance is largely determined by the number of subreads extracted from each read nsubreads and the consensus threshold TH1 (also TH2 for paired-end read data).
Default settings are recommended for most of the read mapping tasks.
Subjunc requires donor/receptor sites to be present when detecting exon-exon junctions.
It can detect up to four junction locations in each exon-spanning read.
detectSV option should be used for SV detection in genomic DNA sequencing data.
For RNA-seq data, users may use subjunc with the reportAllJunctions option to detect SVs (and also junctions).
For each library, breakpoints detected from SV events will be saved to a file with suffix name `.breakpoints.txt', which includes chromosomal coordinates of SV breakpoints and numbers of supporting reads.
The BAM/SAM output includes extra fields to describe the complete alignments of breakpoint-containing reads.
For a breakpoint-containing read, mapping of its major sequence segment is described in the main fields of BAM/SAM ouptut whereas mapping of its minor sequence segment, which does not map along with the major segment due to the presence of a breakpoint, is described in the extra fields including `CC'(Chr), `CP'(Position),`CG'(CIGAR) and `CT'(strand).
Note that each breakpoint-containing read occupies only one row in BAM/SAM output.
# Build an index for the artificial sequence included in file 'reference.fa'.
library(Rsubread)
ref <- system.file("extdata","reference.fa",package="Rsubread")
buildindex(basename="./reference_index",reference=ref)
# align a sample read dataset ('reads.txt') to the sample reference
reads <- system.file("extdata","reads.txt.gz",package="Rsubread")
align(index="./reference_index",readfile1=reads,output_file="./Rsubread_alignment.BAM",phredOffset=64)
Run the code above in your browser using DataLab