BatchQC: Batch Effects Quality Control

Sequencing and microarray samples are often collected or processed in multiple batches or at different times. This frequently introduces technical biases that can lead to incorrect results in downstream analysis. BatchQC is a software tool that streamlines batch preprocessing and evaluation by providing interactive diagnostics, visualizations, and statistical analyses to explore the extent to which batch variation impacts the data. BatchQC diagnostics help determine whether batch adjustment is needed and how the correction should be applied before proceeding with a downstream analysis. Moreover, BatchQC interactively applies several common batch effect correction approaches to the data, so the user can quickly see the benefits of each method.

BatchQC is developed as a Shiny App. The output is organized into multiple tabs, and each tab features an important part of the batch effect analysis and visualization of the data. The BatchQC interface has the following analysis groups: Summary, Differential Expression, Median Correlations, Heatmaps, Circular Dendrogram, PCA Analysis, Shape, ComBat and SVA.

The package includes:

  1. Summary and Sample Diagnostics
  2. Differential Expression Plots and Analysis using LIMMA
  3. Principal Component Analysis and plots to check batch effects
  4. Heatmap plot of gene expressions
  5. Median Correlation Plot
  6. Circular Dendrogram clustered and colored by batch and condition
  7. Shape Analysis for the distribution curve based on HTShape package
  8. Batch Adjustment using ComBat
  9. Surrogate Variable Analysis using sva package
  10. Function to generate simulated RNA-Seq data

batchQC is the pipeline function that generates the BatchQC report and launches the Shiny App in interactive mode. It combines all the functions into one step.
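
As an illustration of that one-step call, the sketch below builds batch and condition labels for a simulated design and passes them to batchQC. The argument names follow the package's 1.x vignette as best recalled and should be checked against ?batchQC and ?rnaseq_sim before use; the design sizes and the report file name are illustrative assumptions.

```r
# Illustrative design: 3 batches x 2 conditions x 10 samples each.
nbatch <- 3
ncond <- 2
npercond <- 10

# One label per sample, ordered batch by batch, condition within batch.
batch <- rep(seq_len(nbatch), each = ncond * npercond)
condition <- rep(rep(seq_len(ncond), each = npercond), times = nbatch)

# The Shiny app blocks the session, so only launch it interactively.
if (interactive() && requireNamespace("BatchQC", quietly = TRUE)) {
  library(BatchQC)
  # Simulated counts; argument names are assumed from the 1.x vignette.
  data.matrix <- rnaseq_sim(ngenes = 50, nbatch = nbatch, ncond = ncond,
                            npercond = npercond, seed = 1234)
  # Report file name and directory are illustrative choices.
  batchQC(data.matrix, batch = batch, condition = condition,
          report_file = "batchqc_report.html", report_dir = ".",
          view_report = FALSE, interactive = TRUE)
}
```

The batch and condition vectors must have one entry per column of the data matrix, in the same sample order.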

Installation

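To begin, install Bioconductor, then run the following to install BatchQC and all of its dependencies automatically. The one exception is pandoc, which you have to install manually as described below. Note that biocLite is the legacy Bioconductor installer and was replaced by the BiocManager package as of Bioconductor 3.8, so these commands apply to older R installations.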

source("http://bioconductor.org/biocLite.R")
biocLite("BatchQC")
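
On R 3.5 or later, installation goes through BiocManager instead. A minimal sketch, assuming the current Bioconductor release still carries BatchQC; the version installed this way may differ from the 1.0.17 interface documented here.

```r
# Install the BiocManager helper from CRAN if it is not already present.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install BatchQC and its R dependencies from Bioconductor.
# pandoc is a system tool and still has to be installed separately.
BiocManager::install("BatchQC")
```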

Install pandoc by following the instructions at http://pandoc.org/installing.html

RStudio also provides pandoc binaries for Windows, Linux and Mac at the following location: https://s3.amazonaws.com/rstudio-buildtools/pandoc-1.13.1.zip

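If all went well, you should now be able to load BatchQC and open its vignettes:

# Load the package; library() stops with an error if it is not installed
library(BatchQC)
# List all BatchQC vignettes in the browser
browseVignettes("BatchQC")
# Open the individual vignettes by name
vignette('BatchQCIntro', package='BatchQC')
vignette('BatchQC_examples', package='BatchQC')
vignette('BatchQC_usage_advanced', package='BatchQC')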

If you want to install the latest development version of BatchQC from GitHub, use devtools as follows:

require(devtools)
install_github("mani2012/BatchQC", build_vignettes=TRUE)

If you want to manually install the BatchQC dependencies, run the following:

source("http://bioconductor.org/biocLite.R")
biocLite(c('MCMCpack', 'limma', 'preprocessCore', 'sva', 'devtools', 'corpcor', 
'matrixStats', 'shiny', 'ggvis', 'd3heatmap', 'reshape2', 'moments', 
'rmarkdown', 'pander', 'gplots'))

Troubleshooting with Installation

Please make sure you have installed pandoc by following the instructions from http://pandoc.org/installing.html. Otherwise, you may get an error such as the following:

* creating vignettes ... ERROR
Warning in engine$weave(file, quiet = quiet, encoding = enc) :
  Pandoc (>= 1.12.3) and/or pandoc-citeproc is not available. Please install 
  both.
Error: processing vignette 'BatchQCIntro.Rmd' failed with diagnostics:
It seems you should call rmarkdown::render() instead of knitr::knit2html() 
because BatchQCIntro.Rmd appears to be an R Markdown v2 document.
Execution halted
Error: Command failed (1)
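
Before rebuilding the vignettes, it can help to confirm that R can actually find pandoc. A minimal sketch using base R's Sys.which and, when the rmarkdown package is installed, rmarkdown::pandoc_available; the 1.12.3 threshold is taken from the error message above.

```r
# Look for a pandoc executable on the PATH; "" means none was found.
pandoc_path <- unname(Sys.which("pandoc"))
if (nzchar(pandoc_path)) {
  message("pandoc found at: ", pandoc_path)
} else {
  message("pandoc is not on the PATH")
}

# rmarkdown also knows about the pandoc bundled with RStudio,
# so ask it as well when the package is available.
if (requireNamespace("rmarkdown", quietly = TRUE)) {
  pandoc_ok <- rmarkdown::pandoc_available("1.12.3")
  message("pandoc >= 1.12.3 visible to rmarkdown: ", pandoc_ok)
}
```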

To generate PDF vignettes on Linux, you need to install texlive and lmodern as follows:

sudo apt-get install texlive
sudo apt-get install lmodern

If you do not have permission to install into the default R library location, you may have to set up a local library directory. You may also need to load R version 3.3.0 or higher.

export R_LIBS="/my_own_local_directory/R_libs"
module load R/R-3.3.0

Then install packages into that directory, for example:

install.packages("devtools", repos="http://cran.r-project.org", lib="/my_own_local_directory/R_libs")
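
The same local library can also be set up from inside an R session rather than through the shell. A sketch using base R's .libPaths; the directory below is a placeholder under tempdir() so that the snippet runs anywhere, and should be replaced with your own path, such as the R_libs directory above.

```r
# Placeholder location; substitute your own directory here.
local_lib <- file.path(tempdir(), "R_libs")
dir.create(local_lib, recursive = TRUE, showWarnings = FALSE)

# Put the local library first so installs and loads prefer it.
.libPaths(c(local_lib, .libPaths()))

# install.packages() now installs into the first library by default, e.g.:
# install.packages("devtools", repos = "https://cran.r-project.org")
print(.libPaths()[1])
```

Setting the library path this way lasts only for the current session; the R_LIBS environment variable shown above makes it persistent.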

Version: 1.0.17
License: GPL (>= 2)
Maintainer: Solaiappan Manimaran
Last Published: February 15th, 2017

Functions in BatchQC (1.0.17)

batchqc_corscatter
Produce Median Correlation plot

getShinyInputSVAf
Getter function to get the shinyInputSVAf option

pcRes
Compute variance of each principal component and how they correlate with batch and condition

plotPC
Plot first 2 principal components

batchQC_filter_genes
Returns a dataset after filtering genes of zero variance across batch and condition combinations

batchQC_fsva_adjusted
Use frozen surrogate variable analysis to remove the surrogate variables inferred from sva

batchQC
Checks for presence of batch effect and creates an HTML report with information including whether the batch needs to be adjusted

BatchQCout-class
The BatchQC output class to output BatchQC results

getShinyInput
Getter function to get the shinyInput option

setShinyInputOrig
Setter function to set the shinyInputOrig option

example_batchqc_data
Batch and Condition indicator for signature data captured when activating different growth pathway genes in human mammary epithelial cells.

setShinyInputSVA
Setter function to set the shinyInputSVA option

batchQC_svregress_adjusted
Regress the surrogate variables out of the expression data

batchQC_sva
Estimate the surrogate variables using the 2 step approach proposed by Leek and Storey 2007

getShinyInputSVAr
Getter function to get the shinyInputSVAr option

gnormalize
Perform Genewise Normalization of the given data matrix

batchqc_heatmap
Produce heatmap plots for the given data

batchQC_num.sv
Returns the number of surrogate variables to estimate in the model using a permutation based procedure

combatPlot
Adjust for batch effects using an empirical Bayes framework. ComBat allows users to adjust for batch effects in datasets where the batch covariate is known, using methodology described in Johnson et al. 2007. It uses either parametric or non-parametric empirical Bayes frameworks for adjusting data for batch effects. Users are returned an expression matrix that has been corrected for batch effects. The input data are assumed to be cleaned and normalized before batch effect removal.

batchtest
Performs a test to check whether batch needs to be adjusted

batchqc_circosplot
Produce Circos plot

batchQC_analyze
Checks for presence of batch effect and reports whether the batch needs to be adjusted

batchqc_pca_svd
Performs PCA svd variance decomposition and produces plot of the first two principal components

batchqc_pc_explained_variation
Returns explained variation for each principal component

getShinyInputCombat
Getter function to get the shinyInputCombat option

batchqc_correlation
Produce correlation heatmap plot

batchQC_shapeVariation
Perform Mean and Variance batch variation analysis

batchQC_condition_adjusted
Returns adjusted data after removing the variation across conditions

batchqc_pca
Performs principal component analysis and produces plot of the first two principal components

log2CPM
Compute log2(counts per million reads) and library size for each sample

makeSVD
Compute singular value decomposition

setShinyInput
Setter function to set the shinyInput option

setShinyInputCombat
Setter function to set the shinyInputCombat option

setShinyInputSVAf
Setter function to set the shinyInputSVAf option

setShinyInputSVAr
Setter function to set the shinyInputSVAr option

batchqc_explained_variation
Returns a list of explained variation by batch and condition combinations

getShinyInputSVA
Getter function to get the shinyInputSVA option

getShinyInputOrig
Getter function to get the shinyInputOrig option

protein_example_data
Batch and Condition indicator for protein expression data

rnaseq_sim
Generate simulated count data with batch effects for ngenes
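
To make two of these descriptions concrete, the base-R sketch below computes log2 counts per million and the share of variance explained by each principal component on a toy count matrix. It illustrates the ideas behind log2CPM and batchqc_pc_explained_variation under assumed conventions (a 0.5 pseudocount, genes in rows, samples in columns). It is not the package's own code, and both helper names are hypothetical.

```r
set.seed(1)
# Toy count matrix: 100 genes (rows) x 6 samples (columns).
counts <- matrix(rpois(600, lambda = 50), nrow = 100, ncol = 6)
# Mimic a batch effect: the first 20 genes are inflated in samples 4-6.
counts[1:20, 4:6] <- counts[1:20, 4:6] * 3L

# Hypothetical helper: log2 counts per million with an assumed 0.5 pseudocount.
log2_cpm_sketch <- function(x, pseudocount = 0.5) {
  lib_size <- colSums(x)
  # sweep() divides each column by its own library size.
  cpm <- sweep(x + pseudocount, 2, lib_size, "/") * 1e6
  list(y = log2(cpm), lib.size = lib_size)
}

# Hypothetical helper: fraction of total variance explained by each PC.
pc_explained_sketch <- function(y) {
  pca <- prcomp(t(y), center = TRUE, scale. = FALSE)  # samples as rows
  pca$sdev^2 / sum(pca$sdev^2)
}

res <- log2_cpm_sketch(counts)
explained <- pc_explained_sketch(res$y)
print(round(explained, 3))
```

The batch effect here is applied to a subset of genes only, because a uniform scaling of every gene in a sample would be removed by the library-size normalization itself.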