
caRpools (version 0.83)

stat.DESeq: Analysis: DESeq2 Analysis of pooled CRISPR NGS data

Description

For the DESeq2 analysis implementation, the read counts of all sgRNAs for a given gene are first summed to increase the available read count. DESeq2 analysis is then performed, which includes the estimation of size factors, variance stabilization using a parametric fit, and a Wald test for differences in log2 fold change between the untreated and treated data. More information can be found in _Love et al._ [Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2](http://www.ncbi.nlm.nih.gov/pubmed/25516281) _Genome Biology_ 2014.
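The aggregation step described above can be sketched in base R. This is a minimal illustration and not the package's internal code: the toy data frame `sgrna`, its column names, and its read counts are hypothetical, and the pattern is the default `extractpattern` shown in the Usage section.

```r
# Toy sgRNA-level read counts (hypothetical values, for illustration only)
sgrna <- data.frame(
  name  = c("AAK1_1", "AAK1_2", "AATK_1", "AATK_2", "AATK_3"),
  count = c(10, 25, 7, 0, 13),
  stringsAsFactors = FALSE
)

# Extract the gene name: the first capture group of the default pattern
sgrna$gene <- sub("^(.+?)_.+", "\\1", sgrna$name, perl = TRUE)

# Sum the read counts of all sgRNAs that belong to the same gene
gene.counts <- aggregate(count ~ gene, data = sgrna, FUN = sum)
print(gene.counts)
```

Replacing `FUN = sum` with `mean` or `median` corresponds to passing a different `agg.function`.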

Usage

stat.DESeq(untreated.list,treated.list,namecolumn=1, fullmatchcolumn=2,
agg.function=sum, extractpattern=expression("^(.+?)_.+"), sorting=FALSE,
sgRNA.pval = 0.01, filename.deseq="data", fitType="parametric", p.adjust="holm")

Arguments

untreated.list
A list of data.frames of untreated, control samples. e.g. list(df.control1, df.control2)
treated.list
A list of data.frames of treated samples. e.g. list(df.treated1, df.treated2)
namecolumn
Column in which the target names are located, e.g. namecolumn=1 for the first column.
fullmatchcolumn
Column, in which readcounts are located, e.g. fullmatchcolumn=2 for the second column.
agg.function
Function used to aggregate gene data from individual sgRNA data. By default, agg.function=sum (see Usage), but it can be any other function, e.g. mean or median.
extractpattern
Regular expression used to extract the gene name from the sgRNA name. Please make sure that the gene name is accessible by enclosing its part of the regular expression in parentheses (). The default value expression("^(.+?)_.+") captures the gene name with the group (.+?), i.e. everything before the first underscore.
sorting
Defines whether the final output is sorted by the calculated p-value. By default, sorting=FALSE will return a table sorted by gene name.
sgRNA.pval
p-value threshold to count significant sgRNAs for each gene. *Default* 0.01 *Value* (numeric)
filename.deseq
Filename of raw DESeq2 data output. *Default* "data" *Values* (character)
fitType
See `?DESeq2`. *Default* "parametric" *Values* "parametric", "local", "mean"
p.adjust
Method to adjust p-values for multiple testing. See `?DESeq2`. *Default* "holm" *Values* see `?DESeq2`
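To illustrate what the default `p.adjust="holm"` setting means, the sketch below applies the Holm correction with base R's `stats::p.adjust`. It assumes that the argument names one of the methods accepted by that function; this page only points to `?DESeq2`, so treat the link to `stats::p.adjust` as an assumption. The p-values are the five shown in the Value section, so the adjusted values differ from the `padj` column there, which was computed over all 813 genes.

```r
# Raw p-values for five genes, copied from the example output on this page
pvals <- c(AAK1 = 0.42569619, AATK = 0.67979655, ABI1 = 0.71281670,
           ABL1 = 0.80383562, ABL2 = 0.08257254)

# Holm step-down correction for multiple testing (base R, stats package)
padj.holm <- p.adjust(pvals, method = "holm")
print(padj.holm)

# Other methods accepted by p.adjust(), e.g. "BH" or "bonferroni"
print(p.adjust.methods)
```

Adjusted p-values are never smaller than the raw ones, and the result depends on how many tests enter the correction.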

Value

  • stat.DESeq returns a formal class that contains gene names including the calculated p-value. The returned class can be visualized using carpools.hitident (see ?carpools.hitident). The output is formatted as follows:

    log2 fold change (MAP): condition untreated vs treated
    Wald test p-value: condition untreated vs treated
    DataFrame with 813 rows and 6 columns

    |      | baseMean  | log2FoldChange | lfcSE     | stat       | pvalue     | padj      |
    |------|-----------|----------------|-----------|------------|------------|-----------|
    | AAK1 | 73.90565  | -0.23319491    | 0.2927459 | -0.7965779 | 0.42569619 | 0.7018234 |
    | AATK | 159.43350 | -0.11312924    | 0.2740927 | -0.4127408 | 0.67979655 | 0.8514905 |
    | ABI1 | 131.03013 | -0.09915855    | 0.2693971 | -0.3680758 | 0.71281670 | 0.8691949 |
    | ABL1 | 77.51711  | 0.07837768     | 0.3155477 | 0.2483862  | 0.80383562 | 0.9114121 |
    | ABL2 | 119.22621 | -0.49412039    | 0.2846396 | -1.7359507 | 0.08257254 | 0.3128525 |
    | ...  | ...       | ...            | ...       | ...        | ...        | ...       |
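As a rough illustration of working with such a table, the sketch below rebuilds the five rows shown above as a plain data.frame and orders them by p-value, which is the ordering `sorting=TRUE` is described as producing. The data.frame `res` and the column name `gene` are hypothetical, and how the object returned by stat.DESeq exposes this table is not verified here.

```r
# Five rows copied from the example output (a subset of the columns)
res <- data.frame(
  gene           = c("AAK1", "AATK", "ABI1", "ABL1", "ABL2"),
  log2FoldChange = c(-0.23319491, -0.11312924, -0.09915855,
                     0.07837768, -0.49412039),
  pvalue         = c(0.42569619, 0.67979655, 0.71281670,
                     0.80383562, 0.08257254),
  padj           = c(0.7018234, 0.8514905, 0.8691949,
                     0.9114121, 0.3128525),
  stringsAsFactors = FALSE
)

# Order genes from the smallest to the largest raw p-value
res.sorted <- res[order(res$pvalue), ]
print(res.sorted)

# Keep genes below an adjusted p-value cutoff (0.05 is an arbitrary choice)
hits <- res.sorted[res.sorted$padj < 0.05, ]
print(nrow(hits))
```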

Details

none

Examples

data(caRpools)
data.deseq = stat.DESeq(untreated.list = list(CONTROL1, CONTROL2),
  treated.list = list(TREAT1,TREAT2), namecolumn=1,
  fullmatchcolumn=2, extractpattern=expression("^(.+?)(_.+)"),
  sorting=FALSE, filename.deseq = "ANALYSIS-DESeq2-sgRNA.tab",
  fitType="parametric")
  
knitr::kable(data.deseq$genes[1:10,])
