featureCounts(files,

  # annotation
  annot.inbuilt="mm10",
  annot.ext=NULL,
  isGTFAnnotationFile=FALSE,
  GTF.featureType="exon",
  GTF.attrType="gene_id",
  chrAliases=NULL,

  # level of summarization
  useMetaFeatures=TRUE,

  # overlap between reads and features
  allowMultiOverlap=FALSE,
  minOverlap=1,
  largestOverlap=FALSE,
  readExtension5=0,
  readExtension3=0,
  read2pos=NULL,

  # multi-mapping reads
  countMultiMappingReads=FALSE,
  fraction=FALSE,

  # read filtering
  minMQS=0,
  splitOnly=FALSE,
  nonSplitOnly=FALSE,
  primaryOnly=FALSE,
  ignoreDup=FALSE,

  # strandness
  strandSpecific=0,

  # exon-exon junctions
  juncCounts=FALSE,
  genome=NULL,

  # parameters specific to paired end reads
  isPairedEnd=FALSE,
  requireBothEndsMapped=FALSE,
  checkFragLength=FALSE,
  minFragLength=50,
  maxFragLength=600,
  countChimericFragments=TRUE,
  autosort=TRUE,

  # miscellaneous
  nthreads=1,
  maxMOp=10,
  reportReads=FALSE)
hg19, corresponding to the NCBI RefSeq annotations for genomes `mm10', `mm9', `hg38' and `hg19', respectively.
mm10 by default. The in-built annotation has a SAF format (see below).
annot.inbuilt if they are both provided.
annot.ext argument is in GTF format or not.
FALSE by default. This option is only applicable when
exon by default. This argument is only applicable when
gene_id by default. This argument is only applicable when
TRUE, features in the annotation (each row is a feature) will be grouped into meta-features using their values in the "GeneID" column in the SAF-format annotation file or using the "gene_id" attribute in the GTF-format annotation file, and reads will be assigned to the meta-features instead of the features. See below for more details.
integer giving the minimum number of overlapped bases required for assigning a read to a feature (or a meta-feature). For assignment of read pairs (fragments), numbers of overlapping bases from each read in the same pair will be summed. If a negative value is provided, the read will be extended from both ends.
TRUE, a read (or read pair) will be assigned to the feature (or meta-feature) that has the largest number of overlapping bases, if the read (or read pair) overlaps with multiple features (or meta-features).
integer giving the number of bases extended upstream from 5' end of each read.
integer giving the number of bases extended downstream from 3' end of each read.
5 (denoting 5' most base) and
3 (denoting 3' most base). The default value is
NULL. With the default value, the whole read is used for summarization. When
read2pos is set to
3), read summarization will be performed based on the 5' (or 3') most base position.
read2pos can be used together with
readExtension3 parameters to set any desired length for reads.
FALSE by default. If
TRUE, a multi-mapping read will be counted up to N times if it has N reported mapping locations. This function uses the `NH' tag to find multi-mapping reads.
TRUE, a fractional count, 1/n, will be generated for each reported alignment of a multi-mapping read, where n is the total number of alignments reported for that read.
countMultiMappingReads must be set to
integer giving the minimum mapping quality score a read must satisfy in order to be counted. For paired-end reads, at least one end should satisfy this criterion.
FALSE by default. Example split alignments are exon-spanning reads from RNA-seq data.
useMetaFeatures should be set to
allowMultiOverlap should be set to
TRUE, if the purpose of summarization is to assign exon-spanning reads to all their overlapping exons.
TRUE, all primary alignments in a dataset will be counted regardless of whether they are from multi-mapping reads or not (i.e.
FALSE by default. Read duplicates are identified using bit 0x400 in the FLAG field in SAM/BAM files. For paired-end data, the whole fragment (read pair) will be ignored.
integer indicating if strand-specific read counting should be performed. It has three possible values:
TRUE, fragments (templates or read pairs) will be counted instead of individual reads.
TRUE. The fragment length criteria are specified via
integer giving the minimum fragment length for paired-end reads.
integer giving the maximum fragment length for paired-end reads.
maxFragLength are only applicable when
TRUE. Note that when a fragment spans two or more exons, the observed fragment length might be much bigger than the nominal fragment length.
TRUE, reads will be automatically sorted by their names if reads from the same pair are found not to be located next to each other in the input. No read sorting will be performed if there are no such reads found.
integer giving the number of threads used for running this function.
integer giving the maximum number of `M' operations (matches or mis-matches) allowed in a CIGAR string.
10 by default. Both `X' and `=' operations are treated as `M' and adjacent `M' operations are merged in the CIGAR string.
TRUE, read counting results for reads/fragments will be saved to a tab-delimited file that contains four columns: name of read/fragment, status (assigned, or the reason if not assigned), name of target feature/meta-feature, and number of hits if the read/fragment is counted multiple times. The name of the file is the same as the name of the input read file except that a suffix `.featureCounts' is added. Multiple files will be generated if there is more than one input read file.
juncCounts is set to
Length. When read summarization was performed at feature level, each row in the data frame is a feature and columns in the data frame give the annotation information for the features. When read summarization was performed at meta-feature level, each row in the data frame is a meta-feature and columns in the data frame give the annotation information for the features included in each meta-feature except the
Length column. For each meta-feature, the
Length column gives the total length of genomic regions covered by features included in that meta-feature. Note that this length will be less than the sum of lengths of features included in the meta-feature when there are features overlapping with each other. Also note the
GeneID column gives Entrez gene identifiers when the in-built annotations are used.
featureCounts is a general-purpose read summarization function that assigns mapped reads from genomic DNA and RNA sequencing to genomic features (or meta-features).
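The relationship between the Length of a meta-feature and the lengths of its features can be illustrated with a small base R sketch. This is not the function's own implementation: the coordinates are made up, `covered_length` is a hypothetical helper, and Start/End are assumed to be inclusive positions.

```r
# Toy features of one hypothetical meta-feature (made-up coordinates,
# assumed to be inclusive Start/End positions).
features <- data.frame(
  Start = c(100, 150, 400),
  End   = c(200, 250, 450)
)

# Hypothetical helper: number of bases covered by the union of intervals.
covered_length <- function(start, end) {
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]
  total <- 0
  cur_start <- start[1]
  cur_end <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_end + 1) {
      # Overlapping or directly adjacent: extend the current block.
      cur_end <- max(cur_end, end[i])
    } else {
      # Gap found: close the current block and start a new one.
      total <- total + (cur_end - cur_start + 1)
      cur_start <- start[i]
      cur_end <- end[i]
    }
  }
  total + (cur_end - cur_start + 1)
}

# Sum of individual feature lengths: 101 + 101 + 51 = 253
sum_of_lengths <- sum(features$End - features$Start + 1)
# Covered length: 100-250 (151 bases) plus 400-450 (51 bases) = 202
union_length <- covered_length(features$Start, features$End)
```

Because the first two toy features overlap, the covered length (202) is smaller than the sum of feature lengths (253).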
This function takes as input a set of files containing read mapping results output from a read aligner (e.g.
align), and then assigns mapped reads to genomic features.
Both SAM and BAM format input files are accepted.
useMetaFeatures specifies whether the read summarization should be performed at the feature level or at the meta-feature level.
Each entry in the annotation data is a feature, which for example could be an exon.
The featureCounts function creates meta-features by grouping features using the gene identifiers included in the "GeneID" column in the annotation data (or in the "gene_id" attribute in the GTF format annotation file) and then assigns reads to meta-features instead of features.
useMetaFeatures is particularly useful for gene-level expression analysis, because it instructs this function to count reads for genes (meta-features) instead of exons (features).
Note that when meta-features are used in the read summarization, a read that overlaps two or more features belonging to the same meta-feature will be counted only once for that meta-feature.
allowMultiOverlap specifies how those reads, which are found to overlap with more than one feature (or meta-feature), should be assigned.
FALSE, a read overlapping multiple features (or meta-features) will not be assigned to any of them (not counted).
Otherwise, it will be assigned to all of them.
A read overlaps a meta-feature if it overlaps at least one of the features belonging to this meta-feature.
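The assignment rules described above can be sketched in base R as follows. This is a simplified illustration, not the function's actual algorithm: `assign_read` is a hypothetical helper, the annotation is made up, and options such as minOverlap, strandSpecific and largestOverlap are ignored.

```r
# Toy SAF-style annotation; identifiers and coordinates are made up.
saf <- data.frame(
  GeneID = c("geneA", "geneA", "geneB"),
  Chr    = c("chr1", "chr1", "chr1"),
  Start  = c(100, 300, 350),
  End    = c(200, 400, 500),
  stringsAsFactors = FALSE
)

# Hypothetical helper: targets that one read would be assigned to.
assign_read <- function(chr, start, end, saf,
                        useMetaFeatures = TRUE, allowMultiOverlap = FALSE) {
  # A feature is hit if it shares at least one base with the read.
  hit <- saf$Chr == chr & saf$Start <= end & saf$End >= start
  if (useMetaFeatures) {
    # Several hit features of the same gene collapse to one meta-feature.
    targets <- unique(saf$GeneID[hit])
  } else {
    targets <- paste0(saf$GeneID, ":", saf$Start, "-", saf$End)[hit]
  }
  if (length(targets) > 1 && !allowMultiOverlap) {
    return(character(0))  # ambiguous read: not assigned
  }
  targets
}

# Overlaps both geneA exons: one hit at meta-feature level.
spanning_meta <- assign_read("chr1", 180, 320, saf)
# The same read is ambiguous at feature level.
spanning_feat <- assign_read("chr1", 180, 320, saf, useMetaFeatures = FALSE)
# Overlaps geneA and geneB: unassigned unless allowMultiOverlap = TRUE.
shared_strict <- assign_read("chr1", 360, 380, saf)
shared_multi  <- assign_read("chr1", 360, 380, saf, allowMultiOverlap = TRUE)
```

In this toy setting, `spanning_meta` is "geneA", `spanning_feat` and `shared_strict` are empty, and `shared_multi` contains both gene identifiers.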
exon are typically used when summarizing RNA-seq read data, which will yield read counts for genes and exons, respectively.
The in-built annotations for human and mouse genomes (
mm9) provided in this function can be conveniently used for read summarization.
These annotations were downloaded from the NCBI ftp server (ftp://ftp.ncbi.nlm.nih.gov/genomes/) and were then postprocessed by removing redundant chromosomal regions within each gene and combining adjacent CDS and UTR sequences.
The in-built annotations use the SAF annotation format (see below) and their content can be retrieved using the
Users may also choose to provide their own annotation for summarization. If users provide a SAF (Simplified Annotation Format) annotation, the annotation should have the following format:
GeneID	Chr	Start	End	Strand
497097	chr1	3204563	3207049	-
497097	chr1	3411783	3411982	-
497097	chr1	3660633	3661579	-
100503874	chr1	3637390	3640590	-
100503874	chr1	3648928	3648985	-
100038431	chr1	3670236	3671869	-
...
The SAF annotation format has five required columns, including
These columns can be in any order.
More columns can be included in the annotation.
Columns are tab-delimited.
Column names are case insensitive.
GeneID column may contain integers or character strings.
Chromosomal names included in the
Chr column must match those used in the mapping results, otherwise reads will fail to be assigned.
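As an illustration of the SAF layout, the example rows shown above can be built as a data frame in base R and checked before use. The vector `bam_chrs` is a made-up stand-in for the chromosome names found in the mapping results, and the checks are only a suggested sanity test, not part of the function.

```r
# SAF annotation built from the example rows shown above.
saf <- data.frame(
  GeneID = c("497097", "497097", "497097",
             "100503874", "100503874", "100038431"),
  Chr    = "chr1",
  Start  = c(3204563, 3411783, 3660633, 3637390, 3648928, 3670236),
  End    = c(3207049, 3411982, 3661579, 3640590, 3648985, 3671869),
  Strand = "-",
  stringsAsFactors = FALSE
)

# Column names are case insensitive, so compare in lower case.
required <- c("geneid", "chr", "start", "end", "strand")
missing_cols <- setdiff(required, tolower(names(saf)))

# Assumption: chromosome names as they appear in the mapping results.
bam_chrs <- c("chr1", "chr2", "chrX")
unmatched <- setdiff(unique(saf$Chr), bam_chrs)

# The same annotation written as a tab-delimited file.
saf_file <- tempfile(fileext = ".saf")
write.table(saf, saf_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Both `missing_cols` and `unmatched` should be empty before the annotation is passed on. A non-empty `unmatched` typically points to a naming mismatch such as `1` versus `chr1`.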
Users may provide a SAF annotation in the form of a data frame or a file using the
Users may also provide a GTF/GFF format annotation via
But GTF/GFF annotation should only be provided as a file, and
isGTFAnnotationFile should be set to
TRUE when such an annotation is provided.
The featureCounts function uses the `gene_id' attribute in a GTF/GFF annotation to group features to form meta-features when performing read summarization at meta-feature level.
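A call using a GTF annotation might look like the sketch below. The file names are hypothetical placeholders, and the code assumes the function is provided by the Rsubread package. The call is only made when the package and both files are available, so the snippet runs harmlessly otherwise.

```r
# Hypothetical paths; replace them with real files.
bam_file <- "sample.bam"
gtf_file <- "annotation.gtf"

# Arguments for gene-level summarization with a GTF annotation,
# using argument names from the usage shown at the top of this page.
gtf_args <- list(
  files = bam_file,
  annot.ext = gtf_file,
  isGTFAnnotationFile = TRUE,
  GTF.featureType = "exon",
  GTF.attrType = "gene_id",
  useMetaFeatures = TRUE
)

# Assumption: featureCounts is available from the Rsubread package.
if (requireNamespace("Rsubread", quietly = TRUE) &&
    all(file.exists(bam_file, gtf_file))) {
  fc <- do.call(Rsubread::featureCounts, gtf_args)
} else {
  message("Rsubread or the input files are not available; call skipped.")
}
```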
TRUE, fragments (pairs of reads) instead of reads will be counted.
The featureCounts function checks whether reads from the same pair are adjacent to each other (they may not be when, for example, reads were sorted by their mapping locations), and it automatically reorders those reads that belong to the same pair but are not adjacent to each other in the input read file.
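The adjacency check can be pictured with a small base R sketch. It is a conceptual illustration only, not the function's implementation: `mates_adjacent` is a hypothetical helper, the read names are made up, and each fragment is assumed to contribute exactly two records.

```r
# Hypothetical helper: TRUE if every pair of consecutive records shares a name.
mates_adjacent <- function(read_names) {
  n <- length(read_names)
  if (n == 0) return(TRUE)
  if (n %% 2 != 0) return(FALSE)
  first  <- read_names[seq(1, n, by = 2)]
  second <- read_names[seq(2, n, by = 2)]
  all(first == second)
}

# Made-up read names in the order they appear in the input.
name_sorted     <- c("r1", "r1", "r2", "r2", "r3", "r3")
location_sorted <- c("r1", "r2", "r1", "r3", "r2", "r3")

adjacent_by_name     <- mates_adjacent(name_sorted)      # TRUE
adjacent_by_location <- mates_adjacent(location_sorted)  # FALSE

# Sorting by name brings mates back together.
reordered <- location_sorted[order(location_sorted)]
adjacent_after_sort <- mates_adjacent(reordered)         # TRUE
```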