Adjust for batch effects using an empirical Bayes framework
ComBat adjusts for batch effects in datasets where the batch covariate is known, using the methodology described in Johnson et al. 2007. It uses either a parametric or a non-parametric empirical Bayes framework to make the adjustment, and returns an expression matrix that has been corrected for batch effects. The input data are assumed to be cleaned and normalized before batch effect removal.
ComBat(dat, batch, mod=NULL, par.prior = TRUE, prior.plots = FALSE)
- dat: Genomic measure matrix (dimensions probe x sample), for example an expression matrix
- batch: Batch covariate (only one batch variable is allowed)
- mod: Model matrix for the outcome of interest and other covariates besides batch
- par.prior: (Optional) TRUE indicates parametric adjustments will be used, FALSE indicates non-parametric adjustments will be used
- prior.plots: (Optional) TRUE gives prior plots, with black as a kernel estimate of the empirical batch effect density and red as the parametric estimate
- mean.only: (Optional) FALSE by default. If TRUE, ComBat only corrects the mean of the batch effect (no scale adjustment)
data: A probe x sample genomic measure matrix, adjusted for batch effects.
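The `mod` argument is an ordinary R design matrix, so it can be built with base R's `model.matrix`. The following is a minimal sketch under stated assumptions: the `pheno` data frame and its `condition` column are hypothetical and stand in for whatever sample annotation is available. It contrasts an intercept-only design with one that includes an outcome of interest.

```r
# Hypothetical sample annotation: six samples, two batches, two conditions.
pheno <- data.frame(
  sample    = paste0("s", 1:6),
  batch     = factor(c("A", "A", "A", "B", "B", "B")),
  condition = factor(c("ctrl", "treated", "ctrl", "treated", "ctrl", "treated"))
)

# Intercept-only design: no outcome or covariate is included in the model.
mod_null <- model.matrix(~ 1, data = pheno)

# Design that includes the outcome of interest. Batch is left out of the
# formula because it is supplied separately through the `batch` argument.
mod_cond <- model.matrix(~ condition, data = pheno)

dim(mod_null)
colnames(mod_cond)
```

Either matrix could then be passed as `mod`, with `pheno$batch` passed as `batch`. Which design is appropriate depends on whether there is an outcome whose signal should be modeled alongside batch.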
## Correction of Batch Effects in Proteomics Data Using ComBat

*This is an excerpt from some code I used to prepare some proteomics data for hierarchical cluster analysis; the data was showing strong grouping tendencies associated with two separate batch preparations / mass spec analyses of the samples. "Ion counts" in this context is approximately analogous to e.g. expression level in an RNA context.*

### Prepare data

ComBat requires two inputs:

- a metadata `data.frame`
- ion counts data in a `matrix`

For the **metadata** `data.frame`, we simply need:

- a column of samples
- a column enumerating to which batch they belong (only two in this case)

The two separate batches were distinguishable by whether or not the sample name contained the pattern 'bis':

```
cb.df.mdata <- cbind.data.frame(
  # exclude uid column, c(1)
  "sample" = colnames(df.sdat.avgd.cleaned[, -c(1)]),
  # sample names containing 'bis' are assigned to batch_A, all others to batch_B
  "batch" = ifelse(grepl('bis', colnames(df.sdat.avgd.cleaned[, -c(1)])),
                   'batch_A', 'batch_B'))
```

For the **sample data** (ion counts, in this case) `matrix`, the format is:

- features in rows
- samples in columns

Convert to a matrix, dropping the uid column (first column, c(1)) and using it as the row names instead:

```
cb.mtx.sdata <- as.matrix(df.sdat.avgd.cleaned[, -c(1)])
rownames(cb.mtx.sdata) <- df.sdat.avgd.cleaned$uid
```

### Create Model & Apply the ComBat Algorithm

In this case I am only correcting for the batch effects, so the model contains just an intercept; however, see documentation for further explanation of how to define the model.

```
library(sva)  # provides ComBat

# intercept-only model: no covariates besides batch
cb.corr.model <- model.matrix(~1, data = cb.df.mdata)

cb.corr.counts <- ComBat(dat = cb.mtx.sdata,
                         batch = cb.df.mdata$batch,
                         mod = cb.corr.model,
                         par.prior = TRUE,
                         prior.plots = FALSE)
```
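
Since the motivation here was batch-driven grouping in a hierarchical cluster analysis, one possible sanity check is to cluster the samples and cross-tabulate cluster membership against batch. The sketch below is illustrative only: it uses simulated data rather than the original ion counts, and `cluster_vs_batch` is a hypothetical helper, not part of `sva`.

```
# Simulated stand-in for a feature x sample matrix (not the original data):
# 50 features, 6 samples, with batch_B shifted upwards by 5 units.
set.seed(1)
batch <- rep(c("batch_A", "batch_B"), each = 3)
mat <- matrix(rnorm(50 * 6), nrow = 50,
              dimnames = list(paste0("f", 1:50), paste0("s", 1:6)))
mat[, batch == "batch_B"] <- mat[, batch == "batch_B"] + 5

# Hypothetical helper: cluster samples (columns) and cross-tabulate the
# top-level clusters against batch membership.
cluster_vs_batch <- function(m, batch, k = 2) {
  hc <- hclust(dist(t(m)))  # samples are columns, so transpose first
  table(cluster = cutree(hc, k = k), batch = batch)
}

tab <- cluster_vs_batch(mat, batch)
print(tab)
```

Running the same helper on `cb.mtx.sdata` and then on `cb.corr.counts`, each with `cb.df.mdata$batch`, gives a rough before/after comparison. A table in which each cluster contains samples from only one batch suggests that batch still dominates the grouping.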