stat.DESeq: Analysis: DESeq2 Analysis of pooled CRISPR NGS data

Description

For the DESeq2 analysis, the read counts of all sgRNAs targeting a given gene are first summed to increase the available read count. DESeq2 analysis is then performed, which includes the estimation of size factors, variance stabilization using a parametric fit, and a Wald test for differences in log2 fold change between the untreated and treated data. More information can be found in _Love et al._ [Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2](http://www.ncbi.nlm.nih.gov/pubmed/25516281) _Genome Biology_ 2014.
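To make the size-factor step more concrete, the base-R sketch below reimplements the median-of-ratios idea that DESeq2 uses for normalization by default. It is an illustration on made-up counts, not the code that caRpools or DESeq2 actually runs; the matrix `counts` and the helper `size_factor` are hypothetical names.

```r
# Toy gene-level count matrix: rows are genes, columns are samples.
# All numbers are made up for illustration.
counts <- matrix(
  c(100, 200, 110, 190,
     50, 100,  60, 120,
     10,  20,  10,  20),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("ctrl1", "ctrl2", "treat1", "treat2"))
)

# Per-gene geometric mean across all samples, kept on the log scale.
log_geo_mean <- rowMeans(log(counts))

# Size factor of one sample: median over genes of count / geometric mean.
# Genes with a zero count in any sample have an infinite log mean and are skipped.
size_factor <- function(sample_counts) {
  keep <- is.finite(log_geo_mean) & sample_counts > 0
  exp(median(log(sample_counts[keep]) - log_geo_mean[keep]))
}
size_factors <- apply(counts, 2, size_factor)

# Divide each column by its size factor to obtain normalized counts.
normalized <- sweep(counts, 2, size_factors, "/")
```

In this toy matrix `ctrl2` has exactly twice the counts of `ctrl1`, so its size factor is twice as large and both columns become identical after normalization.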

Usage

stat.DESeq(untreated.list,treated.list,namecolumn=1, fullmatchcolumn=2,
agg.function=sum, extractpattern=expression("^(.+?)_.+"), sorting=FALSE,
sgRNA.pval = 0.01, filename.deseq="data", fitType="parametric", p.adjust="holm")

Arguments

untreated.list

A list of data.frames of untreated control samples, e.g. list(df.control1, df.control2).

treated.list

A list of data.frames of treated samples, e.g. list(df.treated1, df.treated2).

namecolumn

Column in which the target names are located, e.g. namecolumn=1 for the first column.

fullmatchcolumn

Column in which the read counts are located, e.g. fullmatchcolumn=2 for the second column.

agg.function

Function used to aggregate gene data from individual sgRNA data. By default, agg.function=sum (as shown under Usage), but it can be any other function, e.g. mean or median.
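As a rough illustration of what the aggregation step does, the sketch below collapses sgRNA-level counts to one value per gene with base R's `aggregate()`. The data frame `sgrna` and the helper `aggregate_counts` are hypothetical and only mimic the idea; how stat.DESeq aggregates internally is not shown in this page.

```r
# Hypothetical sgRNA-level read counts; names follow a <gene>_<id> pattern.
sgrna <- data.frame(
  name  = c("AAK1_1", "AAK1_2", "AAK1_3", "ABL1_1", "ABL1_2"),
  count = c(20, 35, 18, 40, 37),
  stringsAsFactors = FALSE
)

# Gene label = everything before the first underscore.
sgrna$gene <- sub("^(.+?)_.+", "\\1", sgrna$name, perl = TRUE)

# Collapse sgRNA counts to one value per gene with a chosen function.
aggregate_counts <- function(df, agg.function = sum) {
  aggregate(count ~ gene, data = df, FUN = agg.function)
}

by_sum    <- aggregate_counts(sgrna, sum)     # AAK1 = 73, ABL1 = 77
by_median <- aggregate_counts(sgrna, median)  # AAK1 = 20, ABL1 = 38.5
```

Summing keeps the total read depth per gene, while the median is less sensitive to a single outlier sgRNA.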

extractpattern

Regular expression used to extract the gene name from the sgRNA name. Please make sure that the extracted gene name is accessible by putting its part of the regular expression in brackets (). The default value expression("^(.+?)_.+") captures the gene name (.+?) as the text in front of the first underscore.
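The effect of such a pattern can be checked outside of stat.DESeq before running the analysis. The sketch below applies the default pattern with base R's `sub()` and `grepl()`; the sgRNA names are made up, and using `sub()` is an assumption for illustration rather than a statement about the package internals.

```r
# Made-up sgRNA identifiers, including one without an underscore.
sgrna_names <- c("AAK1_1", "ABL1_GCTAGC_2", "CONTROL")
pattern <- "^(.+?)_.+"

# Replace the whole match with the first captured group (the gene name).
# perl = TRUE is used so that the lazy quantifier .+? is handled as expected.
genes <- sub(pattern, "\\1", sgrna_names, perl = TRUE)

# Flag identifiers the pattern does not match; these are returned unchanged.
matched <- grepl(pattern, sgrna_names, perl = TRUE)
```

Here `genes` is `"AAK1"`, `"ABL1"`, `"CONTROL"`, and `matched` is `TRUE`, `TRUE`, `FALSE`, which makes unmatched identifiers easy to spot.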

sorting

Defines whether the final output is sorted by the calculated p-value. By default, sorting=FALSE will return a table sorted by gene name.

sgRNA.pval

p-value threshold used to count significant sgRNAs for each gene. *Default* 0.01 (as shown under Usage) *Value* (numeric)
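A minimal sketch of counting significant sgRNAs per gene against such a threshold is given below. The p-values are made up, and the strict comparison `<` is an assumption; the page does not state whether the package uses `<` or `<=`.

```r
# Hypothetical sgRNA-level p-values, already labelled with their gene.
sgrna_res <- data.frame(
  gene   = c("AAK1", "AAK1", "AAK1", "ABL1", "ABL1"),
  pvalue = c(0.004, 0.20, 0.009, 0.03, 0.50)
)

# Threshold, named after the stat.DESeq argument.
sgRNA.pval <- 0.01

# Number of sgRNAs per gene whose p-value falls below the threshold.
n_significant <- tapply(sgrna_res$pvalue < sgRNA.pval, sgrna_res$gene, sum)
```

With these numbers, AAK1 has two sgRNAs below the threshold and ABL1 has none.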

filename.deseq

Filename of raw DESeq2 data output. *Default* "data" *Values* (character)

fitType

See `?DESeq2`. *Default* "parametric" *Values* "parametric", "local", "mean"

p.adjust

Method used to adjust p-values for multiple testing. See `?DESeq2`. *Default* "holm" *Values* see `?DESeq2`
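To get a feeling for what the adjustment does, the sketch below compares two correction methods on made-up p-values using `stats::p.adjust()`. That the p.adjust argument accepts the same method names as base R's `p.adjust()` is an assumption here, not something this page confirms.

```r
# Made-up raw p-values for three genes.
pvalues <- c(0.01, 0.04, 0.03)

# Holm: step-down correction controlling the family-wise error rate.
holm <- p.adjust(pvalues, method = "holm")  # 0.03 0.06 0.06

# Benjamini-Hochberg: controls the false discovery rate, less conservative.
bh <- p.adjust(pvalues, method = "BH")      # 0.03 0.04 0.04

# Method names understood by base R.
p.adjust.methods
```

Holm never yields smaller adjusted p-values than Benjamini-Hochberg, so it reports fewer hits at the same cutoff.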

Value

stat.DESeq returns a formal class that contains the gene names together with the calculated p-values. The returned class can be visualized using carpools.hitident (see ?carpools.hitident). The output is formatted as follows:
log2 fold change (MAP): condition untreated vs treated
Wald test p-value: condition untreated vs treated
DataFrame with 813 rows and 6 columns
      baseMean log2FoldChange     lfcSE       stat     pvalue      padj
AAK1  73.90565    -0.23319491 0.2927459 -0.7965779 0.42569619 0.7018234
AATK 159.43350    -0.11312924 0.2740927 -0.4127408 0.67979655 0.8514905
ABI1 131.03013    -0.09915855 0.2693971 -0.3680758 0.71281670 0.8691949
ABL1  77.51711     0.07837768 0.3155477  0.2483862 0.80383562 0.9114121
ABL2 119.22621    -0.49412039 0.2846396 -1.7359507 0.08257254 0.3128525
...        ...            ...       ...        ...        ...       ...
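A result table of this shape can be ranked and filtered with plain data-frame operations. The sketch below rebuilds the five rows shown above as an ordinary data.frame (the object `res` and the 0.05 cutoff are illustrative assumptions) and sorts them by p-value.

```r
# The five example rows from the output above, as a plain data.frame.
res <- data.frame(
  gene           = c("AAK1", "AATK", "ABI1", "ABL1", "ABL2"),
  log2FoldChange = c(-0.23319491, -0.11312924, -0.09915855, 0.07837768, -0.49412039),
  pvalue         = c(0.42569619, 0.67979655, 0.71281670, 0.80383562, 0.08257254),
  padj           = c(0.7018234, 0.8514905, 0.8691949, 0.9114121, 0.3128525),
  stringsAsFactors = FALSE
)

# Rank genes by raw p-value, smallest first.
ranked <- res[order(res$pvalue), ]

# Keep genes passing an adjusted p-value cutoff (0.05 is an arbitrary choice).
hits <- subset(ranked, padj < 0.05)
```

In this excerpt ABL2 ranks first, but no gene passes the adjusted cutoff, so `hits` has zero rows.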

Details

none

Examples

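library(caRpools)

# Load the example data shipped with the package (CONTROL1, CONTROL2, TREAT1, TREAT2)
data(caRpools)

# Gene-level DESeq2 analysis: column 1 holds the sgRNA names, column 2 the read counts
data.deseq = stat.DESeq(untreated.list = list(CONTROL1, CONTROL2),
  treated.list = list(TREAT1, TREAT2), namecolumn=1,
  fullmatchcolumn=2, extractpattern=expression("^(.+?)(_.+)"),
  sorting=FALSE, filename.deseq = "ANALYSIS-DESeq2-sgRNA.tab",
  fitType="parametric")

# Show the first ten rows of the gene-level results
knitr::kable(data.deseq$genes[1:10,])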
