align(index,readfile1,readfile2=NULL,input_format="gzFASTQ",output_format="BAM",
output_file=paste(readfile1,"subread",output_format,sep="."),nsubreads=10,
TH1=3,TH2=1,maxMismatches=3,nthreads=1,indels=5,phredOffset=33,tieBreakQS=FALSE,
tieBreakHamming=TRUE,unique=TRUE,nBestLocations=1,minFragLength=50,
maxFragLength=600,PE_orientation="fr",nTrim5=0,nTrim3=0,readGroupID=NULL,
readGroup=NULL,color2base=FALSE,DP_GapOpenPenalty=-1,DP_GapExtPenalty=0,
DP_MismatchPenalty=0,DP_MatchScore=2,reportFusions=FALSE)
subjunc(index,readfile1,readfile2=NULL,input_format="gzFASTQ",output_format="BAM",
output_file=paste(readfile1,"subjunc",output_format,sep="."),nsubreads=14,
TH1=1,TH2=1,maxMismatches=3,nthreads=1,indels=5,phredOffset=33,tieBreakQS=FALSE,
tieBreakHamming=TRUE,unique=TRUE,nBestLocations=1,minFragLength=50,
maxFragLength=600,PE_orientation="fr",nTrim5=0,nTrim3=0,readGroupID=NULL,
readGroup=NULL,color2base=FALSE,DNAseq=FALSE,reportAllJunctions=FALSE)input_format option for more supported formats.readfile2 should be in the same order as their mate files included in readfile1 .NULL by default.gzFASTQ by default (this includes FASTA format as well). Acceptable formats include FASTQ (including FASTA), gzFASTQ (gzipped FASTQ or FASTA), SAM and BAM. The character string is case insensitive.BAM by default. Acceptable formats include SAM and BAM.readfile1 added with an suffix string.3 by default. Mis-matches found in soft-clipped bases are not counted.1 by default.5 by default.33 and 64. By default, 33 is used.FALSE by default. Note that the mapping quality score used for tie breaking was calcuated only from the perfectly matched subreads (16mers) extracted from the read, whereas the mapping quality scores included in the mapping output for mapped reads were calcuated from using all the bases in the read. Also note that there may still be more than one best mapping location found after tie breaking using this option.TRUE by default. The distance between the mapped read and the target region in the reference is calcuated using all the bases included in the read. Note that there may still be more than one best mapping location found after tie breaking using this option.TRUE by default. It is recommended that this option is used together with tieBreakHamming or tieBreakQS.unique argument takes precedence over nBestLocations argument.fr, ff and rf. Letter f denotes the forward strand and letter r the reverse strand. fr by default (ie. the first read in the pair is on the forward strand and the second read on the reverse strand).0 by default.0 by default.NULL by default.tag:value. This string will be added to the read group (RG) header in the mapping output. NULL by default.TRUE, color-space read bases will be converted to base-space bases in the mapping output. Note that the mapping itself will still be performed at color-space. FALSE by default.-1 by default.0 by default.0 by default.2 by default.TRUE, align function will detect and report such events. This option should only be applied for genomic DNA sequencing data. For RNA-seq data, users should use subjunc with the reportAllJunctions option for detection of fusions. Discovered fusions will be saved to a file (*.fusions.txt). Detailed mapping results for fusion reads will also be saved to the SAM/BAM output file. Secondary alignments of fusion reads will be saved to the following optional fields: CC(Chr), CP(Position), CG(CIGAR) and CT(strand). Note that each fusion read occupies only one row in the SAM/BAM output file.TRUE, subjunc function will ignore the splicing signals (donor/receptor sites) when searching for junctions. Junctions that occur within the same chromosome or across different chromosomes (chimerism) will all be reported for DNA seqence data.TRUE, subjunc function will report those junctions that do not have the required donor/receptor sites (GT/AG), or cross different chromosomes or are located on different strands within the same chromosome, in addition to the canonical exon-exon junctions.Subjunc, BED files including discovered exon-exon junctions are also written to the current working directory.
align function implements the Subread aligner (Liao et al., 2013) that uses a new mapping paradigm called ``seed-and-vote".
Subread is general-purpose aligner that can be used to align both genomic DNA-seq reads and RNA-seq reads.Subjunc is designed for mapping RNA-seq reads.
The major difference between Subjunc and Subread is that Subjunc reports discovered exon-exon junctions and it also performs full alignments for every read including exon-spanning reads.
Subread uses the largest mappable regions in the reads to find their mapping locations.
The seed-and-vote paradigm has been found to be not only more accurate than the conventional seed-and-extend (adopted by most aligners) in read mapping, but it is a lot more efficient (Liao et al., 2013).
Both Subread and Subjunc can be used to align reads generated from major sequencing platforms including Illumina GA/HiSeq, ABI SOLiD, Roche 454 and Ion Torrent sequencers.
Note that to map color-space reads (e.g. SOLiD reads), a color-space index should be built for the reference genome (see buildindex for details).
Subread and Subjunc have adjustable memory usage.
See buildindex function for more details.
Mapping performance is largely determined by the number of subreads extracted from each read nsubreads and the consensus threshold TH1 (also TH2 for paired-end read data).
Default settings are recommended for most of the read mapping tasks.
Subjunc requires donor/receptor sites to be present when detecting exon-exon junctions.
It can detect up to four junction locations in each exon-spanning read.
# Build an index for the artificial sequence included in file 'reference.fa'.
library(Rsubread)
ref <- system.file("extdata","reference.fa",package="Rsubread")
buildindex(basename="./reference_index",reference=ref)
# align a sample read dataset ('reads.txt') to the sample reference
reads <- system.file("extdata","reads.txt.gz",package="Rsubread")
align(index="./reference_index",readfile1=reads,output_file="./Rsubread_alignment.BAM")
Run the code above in your browser using DataLab