DEGseq: DEGseq: Identify Differentially Expressed Genes from RNA-seq data

Description

This function is used to identify differentially expressed genes from RNA-seq data. It takes uniquely mapped reads from RNA-seq data for the two samples with a gene annotation as input. So users should map the reads (obtained from sequencing libraries of the samples) to the corresponding genome in advance.

Usage

DEGseq(mapResultBatch1, mapResultBatch2, fileFormat="bed", readLength=32,
       strandInfo=FALSE, refFlat, groupLabel1="group1", groupLabel2="group2",
       method=c("LRT", "CTR", "FET", "MARS", "MATR", "FC"), 
       pValue=1e-3, zScore=4, qValue=1e-3, foldChange=4, thresholdKind=1,
       outputDir="none", normalMethod=c("none", "loess", "median"),
       depthKind=1, replicate1="none", replicate2="none",
       replicateLabel1="replicate1", replicateLabel2="replicate2")

Arguments

mapResultBatch1

vector containing uniquely mapping result files for technical replicates of sample1 (or replicate1 when method="CTR").

mapResultBatch2

vector containing uniquely mapping result files for technical replicates of sample2 (or replicate2 when method="CTR").

fileFormat

file format: "bed" or "eland". example of "bed" format: chr12 7 38 readID 2 + example of "eland" format: readID chr12.fa 7 U2 F Note: The field separator character is TAB. And the files must follow the format as one of the examples.

readLength

the length of the reads (only used if fileFormat="eland").

strandInfo

whether the strand information was retained during the cloning of the cDNAs.

"TRUE": retained,
"FALSE": not retained.

refFlat

gene annotation file in UCSC refFlat format. See http://genome.ucsc.edu/goldenPath/gbdDescriptionsOld.html#RefFlat.

groupLabel1

label of group1 on the plots.

groupLabel2

label of group2 on the plots.

method

method to identify differentially expressed genes. Possible methods are:

"LRT": Likelihood Ratio Test (Marioni et al. 2008),
"CTR": Check whether the variation between two Technical Replicates can be explained by the random sampling model (Wang et al. 2009),
"FET": Fisher's Exact Test (Joshua et al. 2009),
"MARS": MA-plot-based method with Random Sampling model (Wang et al. 2009),
"MATR": MA-plot-based method with Technical Replicates (Wang et al. 2009),
"FC": Fold-Change threshold on MA-plot.

pValue

pValue threshold (for the methods: LRT, FET, MARS, MATR). only used when thresholdKind=1.

zScore

zScore threshold (for the methods: MARS, MATR). only used when thresholdKind=2.

qValue

qValue threshold (for the methods: LRT, FET, MARS, MATR). only used when thresholdKind=3 or thresholdKind=4.

thresholdKind

the kind of threshold. Possible kinds are:

1: pValue threshold,
2: zScore threshold,
3: qValue threshold (Benjamini et al. 1995),
4: qValue threshold (Storey et al. 2003),
5: qValue threshold (Storey et al. 2003) and Fold-Change threshold on MA-plot are both required (can be used only whenmethod="MARS").

foldChange

fold change threshold on MA-plot (for the method: FC).

outputDir

the output directory.

normalMethod

the normalization method: "none", "loess", "median" (Yang,Y.H. et al. 2002). recommend: "none".

depthKind

1: take the total number of reads uniquely mapped to genome as the depth for each replicate, 0: take the total number of reads uniquely mapped to all annotated genes as the depth for each replicate. We recommend taking depthKind=1, especially when the genes in annotation file are part of all genes.

replicate1

files containing uniquely mapped reads obtained from replicate batch1 (only used when method="MATR").

replicate2

files containing uniquely mapped reads obtained from replicate batch2 (only used when method="MATR").

replicateLabel1

label of replicate batch1 on the plots (only used when method="MATR").

replicateLabel2

label of replicate batch2 on the plots (only used when method="MATR").

References

Benjamini,Y. and Hochberg,Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289-300.

Jiang,H. and Wong,W.H. (2009) Statistical inferences for isoform expression in RNA-seq. Bioinformatics, 25, 1026-1032.

Bloom,J.S. et al. (2009) Measuring differential gene expression by short read sequencing: quantitative comparison to 2-channel gene expression microarrays. BMC Genomics, 10, 221.

Marioni,J.C. et al. (2008) RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res., 18, 1509-1517.

Storey,J.D. and Tibshirani,R. (2003) Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. 100, 9440-9445. Wang,L.K. and et al. (2010) DEGseq: an R package for identifying differentially expressed genes from RNA-seq data, Bioinformatics 26, 136 - 138. Yang,Y.H. et al. (2002) Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research, 30, e15.

Examples

Run this code

kidneyR1L1 <- system.file("extdata", "kidneyChr21.bed.txt", package="DEGseq")
  liverR1L2  <- system.file("extdata", "liverChr21.bed.txt", package="DEGseq")
  refFlat    <- system.file("extdata", "refFlatChr21.txt", package="DEGseq")
  mapResultBatch1 <- c(kidneyR1L1)  ## only use the data from kidneyR1L1 and liverR1L2
  mapResultBatch2 <- c(liverR1L2)
  outputDir <- file.path(tempdir(), "DEGseqExample")
  DEGseq(mapResultBatch1, mapResultBatch2, fileFormat="bed", refFlat=refFlat,
         outputDir=outputDir, method="LRT")
  cat("outputDir:", outputDir, "")

Run the code above in your browser using DataLab

Description

Usage

Arguments

References

See Also

Examples