bootstrap.enrichment.test: Bootstrap celltype enrichment test

Description

bootstrap.enrichment.test takes a gene list and a single-cell transcriptome dataset and, for each cell type, estimates the probability of enrichment and the fold change.

Usage

bootstrap.enrichment.test(sct_data = NA, mouse.hits = NA, mouse.bg = NA, human.hits = NA, human.bg = NA, reps = 100, sub = FALSE, geneSizeControl = FALSE)

Arguments

sct_data

List generated using read_celltype_data

mouse.hits

Array of MGI gene symbols containing the target gene list. Not required if geneSizeControl=TRUE

mouse.bg

Array of MGI gene symbols containing the background gene list. Not required if geneSizeControl=TRUE

human.hits

Array of HGNC gene symbols containing the target gene list. Not required if geneSizeControl=FALSE

human.bg

Array of HGNC gene symbols containing the background gene list. Not required if geneSizeControl=FALSE

reps

Number of random gene lists to generate (default=100 but should be over 10000 for publication quality results)

sub

a logical indicating whether to analyse sub-cell type annotations (TRUE) or cell-type annotations (FALSE). Default is FALSE.

geneSizeControl

a logical indicating whether you want to control for GC content and transcript length. Recommended if the gene list originates from genetic studies. Default is FALSE. If set to TRUE then human gene lists should be used rather than mouse.
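
The reps recommendation above follows from how a bootstrap p-value is resolved. As a minimal sketch, assuming the p-value is the fraction of random lists that score at least as high as the target list, the smallest non-zero p-value that can be observed is 1/reps. The variable names below are illustrative and are not part of the package.

```r
# Illustrative only: smallest non-zero empirical p-value for a given reps,
# assuming p = (number of random lists >= target) / reps
reps_options <- c(100, 1000, 10000)
p_resolution <- 1 / reps_options
names(p_resolution) <- reps_options
print(p_resolution)
```

With reps = 100, any p-value below 0.01 is reported as 0, which is why a larger number of random lists is advised for publication.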

Value

A list containing three elements:

results: dataframe in which each row gives the statistics (p-value, fold change and number of standard deviations from the mean) associated with the enrichment of the stated cell type in the gene list
hit.cells: vector containing the summed proportion of expression in each cell type for the target list
bootstrap_data: matrix in which each row represents the summed proportion of expression in each cell type for one of the random lists
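
To show how the three elements relate, the sketch below derives a p-value, fold change and number of standard deviations from the mean using toy values that stand in for hit.cells and bootstrap_data. The numbers and cell type names are invented for illustration, and the formulas are an assumption about a typical bootstrap summary; the package's own calculation may differ in detail (for example, in how ties are counted).

```r
# Toy stand-ins for the returned elements (hypothetical values, not EWCE output)
set.seed(1)
cell_types <- c("astrocyte", "interneuron", "microglia")

# bootstrap_data: one row per random gene list, one column per cell type
bootstrap_data <- matrix(runif(100 * 3, min = 0.2, max = 0.4), nrow = 100,
                         dimnames = list(NULL, cell_types))

# hit.cells: summed proportion of expression for the target list
hit_cells <- c(astrocyte = 0.30, interneuron = 0.55, microglia = 0.25)

boot_mean <- colMeans(bootstrap_data)
boot_sd <- apply(bootstrap_data, 2, sd)

# Assumed empirical p-value: fraction of random lists scoring >= the target list
p <- colMeans(sweep(bootstrap_data, 2, hit_cells, FUN = ">="))

# Assumed fold change: target score relative to the bootstrap mean
fold_change <- hit_cells / boot_mean

# Number of standard deviations between the target score and the bootstrap mean
sd_from_mean <- (hit_cells - boot_mean) / boot_sd

summary_stats <- data.frame(CellType = cell_types, p = p,
                            fold_change = fold_change,
                            sd_from_mean = sd_from_mean)
print(summary_stats)
```

In this toy case the interneuron score lies above every bootstrap value, so its empirical p-value is 0 at this number of random lists.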

Examples

# Load the single cell data
data(celltype_data)

# Set the parameters for the analysis
reps=100 # <- Use 100 bootstrap lists so it runs quickly; for publishable analysis use >10000
subCellStatus=0 # <- 0 = cell-type level annotations; set to 1 for sub-cell-type annotations (e.g. Interneuron type 3)
if(subCellStatus==1){subCellStatus=TRUE;cellTag="SubCells"}
if(subCellStatus==0){subCellStatus=FALSE;cellTag="FullCells"}

# Load the gene list and get human orthologs
data("example_genelist")
data("mouse_to_human_homologs")
m2h = unique(mouse_to_human_homologs[,c("HGNC.symbol","MGI.symbol")])
mouse.hits = unique(m2h[m2h$HGNC.symbol %in% example_genelist,"MGI.symbol"])
human.hits = unique(m2h[m2h$HGNC.symbol %in% example_genelist,"HGNC.symbol"])
human.bg = unique(setdiff(m2h$HGNC.symbol,human.hits))
mouse.bg  = unique(setdiff(m2h$MGI.symbol,mouse.hits))

# Bootstrap significance testing, without controlling for transcript length and GC content
full_results = bootstrap.enrichment.test(sct_data=celltype_data,mouse.hits=mouse.hits,
  mouse.bg=mouse.bg,reps=reps,sub=subCellStatus)

# Bootstrap significance testing controlling for transcript length and GC content
full_results = bootstrap.enrichment.test(sct_data=celltype_data,human.hits=human.hits,
  human.bg=human.bg,reps=reps,sub=subCellStatus,geneSizeControl=TRUE)
