Learn R Programming

MetaQC (version 0.1.10-1)

MetaQC: MetaQC: Objective Quality Control and Inclusion/Exclusion Criteria for Genomic Meta-Analysis

Description

MetaQC implements our proposed quantitative quality control measures: (1) internal homogeneity of co-expression structure among studies (internal quality control; IQC); (2) external consistency of co-expression structure correlating with pathway database (external quality control; EQC); (3) accuracy of differentially expressed gene detection (accuracy quality control; AQCg) or pathway identification (AQCp); (4) consistency of differential expression ranking in genes (consistency quality control; CQCg) or pathways (CQCp). (See the reference for detailed explanation.) For each quality control index, the p-values from statistical hypothesis testing are minus log transformed and PCA biplots were applied to assist visualization and decision. Results generate systematic suggestions to exclude problematic studies in microarray meta-analysis and potentially can be extended to GWAS or other types of genomic meta-analysis. The identified problematic studies can be scrutinized to identify technical and biological causes (e.g. sample size, platform, tissue collection, preprocessing etc) of their bad quality or irreproducibility for final inclusion/exclusion decision.

Usage

MetaQC(DList, GList, isParallel = FALSE, nCores = NULL, 
	useCache = TRUE, filterGenes = TRUE, 
	maxNApctAllowed=.3, cutRatioByMean=.4, cutRatioByVar=.4, minNumGenes=5,
	verbose = FALSE, resp.type = c("Twoclass", "Multiclass", "Survival"))

Arguments

DList
Either a list of all data matrices (Case 1) or a list of lists (Case 2); The first case is simplified input data structure only for two classes comparison. Each data name should be set as the name of each list element. Each data should be a numeric matrix
GList
The location of a file which has sets of gene symbol lists such as gmt files. By default, the gmt file will be converted to list object and saved with the same name with ".rda". Alternatively, a list of gene sets is allowed; the name of each element of th
isParallel
Whether to use multiple cores in parallel for fast computing. By default, it is false.
nCores
When isParallel is true, the number of cores can be set. By default, all cores in the machine are used in the unix-like machine, and 2 cores are used in windows.
useCache
Whether imported gmt file should be saved for the next use. By default, it is true.
filterGenes
Whether to use gene filtering (recommended).
maxNApctAllowed
Filtering out genes which have missing values more than specified ratio (Default .3). Applied if filterGenes is TRUE.
cutRatioByMean
Filtering out specified ratio of genes which have least expression value (Default .4). Applied if filterGenes is TRUE.
cutRatioByVar
Filtering out specified ratio of genes which have least sample wise expression variance (Default .4). Applied if filterGenes is TRUE.
minNumGenes
Mininum number of genes in a pathway. A pathway which has members smaller than the specified value will be removed.
verbose
Whether to print out logs.
resp.type
The type of response variable. Three options are: "Twoclass" (unpaired), "Multiclass", "Survival." By default, Twoclass is used

Value

  • A proto R object. Use RunQC function to run QC procedure. Use Plot function to plot PCA figure. Use Print function to view various information. See examples below.

References

Dongwan D. Kang, Etienne Sibille, Naftali Kaminski, and George C. Tseng. (Nucleic Acids Res. 2012) MetaQC: Objective Quality Control and Inclusion/Exclusion Criteria for Genomic Meta-Analysis.

See Also

runQC

Examples

Run this code
requireAll(c("proto", "foreach"))

   ## Toy Example
    data(brain) #already hugely filtered
    #Two default gmt files are automatically downloaded, 
	#otherwise it is required to locate it correctly.
    #Refer to http://www.broadinstitute.org/gsea/downloads.jsp
    brainQC <- MetaQC(brain, "c2.cp.biocarta.v3.0.symbols.gmt", 
						filterGenes=FALSE, verbose=TRUE)
	#B is recommended to be >= 1e4 in real application					
    runQC(brainQC, B=1e2, fileForCQCp="c2.all.v3.0.symbols.gmt")
    brainQC
    plot(brainQC)

    ## For parallel computation with only 2 cores
	## R >= 2.11.0 in windows to use parallel computing
    brainQC <- MetaQC(brain, "c2.cp.biocarta.v3.0.symbols.gmt", 
			filterGenes=FALSE, verbose=TRUE, isParallel=TRUE, nCores=2)
    #B is recommended to be >= 1e4 in real application
    runQC(brainQC, B=1e2, fileForCQCp="c2.all.v3.0.symbols.gmt")
    plot(brainQC)

    ## For parallel computation with all cores
	## In windows, only 2 cores are used if not specified explicitly
    brainQC <- MetaQC(brain, "c2.cp.biocarta.v3.0.symbols.gmt", 
			filterGenes=FALSE, verbose=TRUE, isParallel=TRUE)
	#B is recommended to be >= 1e4 in real application					
    runQC(brainQC, B=1e2, fileForCQCp="c2.all.v3.0.symbols.gmt")
    plot(brainQC)

	## Real Example which is used in the paper
	#download the brainFull file 
	#from https://github.com/downloads/donkang75/MetaQC/brainFull.rda
	load("brainFull.rda")
    brainQC <- MetaQC(brainFull, "c2.cp.biocarta.v3.0.symbols.gmt", filterGenes=TRUE, 
			verbose=TRUE, isParallel=TRUE)
    runQC(brainQC, B=1e4, fileForCQCp="c2.all.v3.0.symbols.gmt") #B was 1e5 in the paper
    plot(brainQC)
    
	## Survival Data Example
	#download Breast data 
	#from https://github.com/downloads/donkang75/MetaQC/Breast.rda
	load("Breast.rda")
    breastQC <- MetaQC(Breast, "c2.cp.biocarta.v3.0.symbols.gmt", filterGenes=FALSE, 
			verbose=TRUE, isParallel=TRUE, resp.type="Survival")
    runQC(breastQC, B=1e4, fileForCQCp="c2.all.v3.0.symbols.gmt") 
    breastQC
    plot(breastQC)

Run the code above in your browser using DataLab