
BgeeDB (version 1.0.2)

loadTopAnatData: Retrieve data from Bgee to perform GO-like enrichment of anatomical terms, mapped to genes by expression patterns.

Description

This function loads a mapping from genes to anatomical structures based on calls of expression in anatomical structures. It also loads the structure of the anatomical ontology.

Usage

loadTopAnatData(species, datatype = c("rna_seq", "affymetrix", "est", "in_situ"), calltype = "expressed", confidence = "all", stage = NULL, host = "http://bgee.org", pathToData = getwd())

Arguments

species
A numeric value giving the NCBI taxonomic ID of the species to be used. The species must be one of the species in Bgee v13, which include:
  • 6239 (Caenorhabditis elegans)
  • 7227 (Drosophila melanogaster)
  • 7955 (Danio rerio)
  • 8364 (Xenopus tropicalis)
  • 9031 (Gallus gallus)
  • 9258 (Ornithorhynchus anatinus)
  • 9544 (Macaca mulatta)
  • 9593 (Gorilla gorilla)
  • 9597 (Pan paniscus)
  • 9598 (Pan troglodytes)
  • 9606 (Homo sapiens)
  • 9823 (Sus scrofa)
  • 9913 (Bos taurus)
  • 10090 (Mus musculus)
  • 10116 (Rattus norvegicus)
  • 13616 (Monodelphis domestica)
  • 28377 (Anolis carolinensis)

See the listBgeeSpecies() function to get an up-to-date list of species.

datatype
A character vector indicating the data type(s) to be used, chosen among:
  • "rna_seq"
  • "affymetrix"
  • "est"
  • "in_situ"

By default, all data types are included: c("rna_seq","affymetrix","est","in_situ"). Including a data type that is not present in Bgee for a given species has no effect.

calltype
A character string indicating the type of expression calls to be used for enrichment. Only calls of significant presence of expression ("expressed") are currently implemented. Over-expression calls, based on differential expression analysis, will be implemented in the future.
confidence
A character indicating if only high quality expression calls should be retrieved. Options are "all" or "high_quality". Default is "all".
stage
A character string giving the ID of the targeted developmental stage for the analysis. Developmental stages can be chosen from the developmental stage ontology used in Bgee (available at https://github.com/obophenotype/developmental-stage-ontologies). If a stage ID is given, the expression patterns mapped to this stage and to all of its child developmental stages (substages) will be retrieved. Default is NULL, meaning that expression patterns of genes are retrieved regardless of the stage of expression. This is equivalent to specifying stage="UBERON:0000104" (life cycle, the root of the stage ontology). The most useful stages (going no deeper than level 3 of the ontology) include:
  • UBERON:0000068 (embryo stage)
    • UBERON:0000106 (zygote stage)
    • UBERON:0000107 (cleavage stage)
    • UBERON:0000108 (blastula stage)
    • UBERON:0000109 (gastrula stage)
    • UBERON:0000110 (neurula stage)
    • UBERON:0000111 (organogenesis stage)
    • UBERON:0007220 (late embryonic stage)
    • UBERON:0004707 (pharyngula stage)

  • UBERON:0000092 (post-embryonic stage)
    • UBERON:0000069 (larval stage)
    • UBERON:0000070 (pupal stage)
    • UBERON:0000066 (fully formed stage)
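To illustrate what "this stage and all of its substages" means, here is a minimal R sketch that collects a stage together with every stage nested below it. The child-to-parent map is a toy assumption written by hand from the nesting of the list above (with UBERON:0000104, life cycle, as the root); it is not downloaded from Bgee, the real ontology is larger and deeper, and the helper get_substages() is hypothetical, not part of BgeeDB.

```r
# Toy child -> parent map, assumed from the nesting of the stage list above.
# Not retrieved from Bgee; the real developmental stage ontology is larger.
stage_parent <- c(
  "UBERON:0000068" = "UBERON:0000104",  # embryo stage -> life cycle
  "UBERON:0000092" = "UBERON:0000104",  # post-embryonic stage -> life cycle
  "UBERON:0000106" = "UBERON:0000068",  # zygote stage
  "UBERON:0000107" = "UBERON:0000068",  # cleavage stage
  "UBERON:0000108" = "UBERON:0000068",  # blastula stage
  "UBERON:0000109" = "UBERON:0000068",  # gastrula stage
  "UBERON:0000110" = "UBERON:0000068",  # neurula stage
  "UBERON:0000111" = "UBERON:0000068",  # organogenesis stage
  "UBERON:0007220" = "UBERON:0000068",  # late embryonic stage
  "UBERON:0004707" = "UBERON:0000068",  # pharyngula stage
  "UBERON:0000069" = "UBERON:0000092",  # larval stage
  "UBERON:0000070" = "UBERON:0000092",  # pupal stage
  "UBERON:0000066" = "UBERON:0000092"   # fully formed stage
)

# Hypothetical helper: return a stage plus all of its substages, by
# repeatedly adding the children of the stages collected so far.
get_substages <- function(stage, parent_map) {
  selected <- stage
  repeat {
    children <- names(parent_map)[parent_map %in% selected]
    added <- setdiff(children, selected)
    if (length(added) == 0) break
    selected <- c(selected, added)
  }
  selected
}

# Post-embryonic stage plus its larval, pupal and fully formed substages
get_substages("UBERON:0000092", stage_parent)
```

Calling get_substages("UBERON:0000104", stage_parent) returns every stage in the toy map, which mirrors why stage = NULL and stage = "UBERON:0000104" are described as equivalent.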

host
URL of the Bgee webservice. Change host to access development or archived versions of Bgee. Default is "http://bgee.org", to access the current Bgee release.
pathToData
Path to the directory where the data files are, or will be, stored. Default is the working directory.

Value

A list of 3 elements:
  • A gene2anatomy list, mapping genes to anatomical structures based on expression calls.
  • An organ.names data frame, giving the names corresponding to UBERON IDs.
  • An organ.relationships list, giving the relationships between anatomical structures in the UBERON ontology (based on parent-child "is_a" and "part_of" relationships).
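The sketch below builds a small mock object with the shape described above and shows how the three elements relate, for example by counting the genes expressed in each structure. It assumes that the list elements are named gene2anatomy, organ.names and organ.relationships, that gene2anatomy maps each gene ID to a character vector of UBERON IDs, and that organ.names has an ID column and a name column. The column names id and name and the gene IDs geneA to geneC are hypothetical; inspect a real object with str() before relying on these assumptions.

```r
# Mock object with the ASSUMED shape of loadTopAnatData() output.
# Gene IDs are hypothetical; the organ.names column names are assumptions.
mockTopAnatData <- list(
  gene2anatomy = list(
    geneA = c("UBERON:0000955", "UBERON:0002107"),  # brain, liver
    geneB = "UBERON:0000955",                       # brain
    geneC = "UBERON:0000948"                        # heart
  ),
  organ.names = data.frame(
    id   = c("UBERON:0000955", "UBERON:0002107", "UBERON:0000948"),
    name = c("brain", "liver", "heart"),
    stringsAsFactors = FALSE
  ),
  organ.relationships = list(
    "UBERON:0000955" = "UBERON:0001017"  # brain part_of central nervous system
  )
)

# Invert gene2anatomy: for each anatomical structure, list the expressed genes.
g2a <- mockTopAnatData$gene2anatomy
anat2gene <- split(rep(names(g2a), lengths(g2a)),
                   unlist(g2a, use.names = FALSE))

# Count expressed genes per structure and attach human-readable names.
counts <- data.frame(id = names(anat2gene),
                     n_genes = unname(lengths(anat2gene)),
                     stringsAsFactors = FALSE)
counts <- merge(counts, mockTopAnatData$organ.names, by = "id")
counts
```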

Details

The expression calls come from Bgee (http://bgee.org), which integrates different expression data types (RNA-seq, Affymetrix microarrays, ESTs, and in situ hybridization) in multiple animal species. Expression patterns are based exclusively on curated "normal", healthy expression data (e.g., no gene knock-out, no treatment, no disease), to provide a reference of normal gene expression.

Anatomical structures are identified using IDs from the Uberon ontology (browsable at http://www.ontobee.org/ontology/UBERON). The mapping from genes to anatomical structures includes only the evidence of expression in these specific structures, and not the expression in their substructures (i.e., expression data are not propagated). The retrieval of propagated expression data will likely be implemented in the future; meanwhile, propagation can be obtained using specialized packages such as topGO (see the topAnat function).
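Because the mapping is not propagated, a gene with an expression call in the brain is not automatically counted as expressed in the central nervous system. A minimal base-R sketch of propagation follows, assuming that organ.relationships is a named list mapping each structure ID to the IDs of its direct parents (through "is_a" or "part_of"). The data and the helpers get_ancestors() and propagate_annotations() are hypothetical illustrations, not taken from topGO or BgeeDB.

```r
# ASSUMED structure: names are child structures, values are direct parents.
organ_relationships <- list(
  "UBERON:0000955" = "UBERON:0001017",  # brain part_of central nervous system
  "UBERON:0001017" = "UBERON:0001016"   # central nervous system part_of nervous system
)

# Non-propagated calls for hypothetical genes.
gene2anatomy <- list(
  geneA = "UBERON:0000955",  # brain only
  geneB = "UBERON:0002107"   # liver; no parent listed in this toy data
)

# Return a structure together with all of its ancestors (breadth-first).
get_ancestors <- function(id, relationships) {
  found <- id
  frontier <- id
  while (length(frontier) > 0) {
    parents <- unlist(relationships[intersect(frontier, names(relationships))],
                      use.names = FALSE)
    frontier <- setdiff(parents, found)  # skip already-visited structures
    found <- c(found, frontier)
  }
  found
}

# Extend each gene's annotations to all ancestors of its structures.
propagate_annotations <- function(gene2anatomy, relationships) {
  lapply(gene2anatomy, function(ids) {
    unique(unlist(lapply(ids, get_ancestors, relationships = relationships)))
  })
}

propagated <- propagate_annotations(gene2anatomy, organ_relationships)
propagated$geneA  # brain, central nervous system, nervous system
```

Tracking visited structures with setdiff() keeps the loop finite even if the relationship graph contains a cycle.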

Examples

# Retrieve mouse (NCBI taxon 10090) RNA-seq expression calls from Bgee
library(BgeeDB)
myTopAnatData <- loadTopAnatData(species = "10090", datatype = "rna_seq")
