GenomicFeatures (version 1.18.7)

makeTranscriptDbFromBiomart: Make a TxDb object from annotations available on a BioMart database

Description

The makeTranscriptDbFromBiomart function allows the user to make a TxDb object from transcript annotations available on a BioMart database.

Usage

makeTranscriptDbFromBiomart(biomart="ensembl",
                            dataset="hsapiens_gene_ensembl",
                            transcript_ids=NULL,
                            circ_seqs=DEFAULT_CIRC_SEQS,
                            filters="",
                            id_prefix="ensembl_",
                            host="www.biomart.org",
                            port=80,
                            miRBaseBuild=NA)

getChromInfoFromBiomart(biomart="ensembl",
                        dataset="hsapiens_gene_ensembl",
                        id_prefix="ensembl_",
                        host="www.biomart.org",
                        port=80)

Arguments

biomart
which BioMart database to use. Get the list of all available BioMart databases with the listMarts function from the biomaRt package. See the details section below for a list of BioMart databases with compatible transcript annotations.
dataset
which dataset to use from the BioMart database. For example: "hsapiens_gene_ensembl", "mmusculus_gene_ensembl", "dmelanogaster_gene_ensembl", "celegans_gene_ensembl", "scerevisiae_gene_ensembl", etc. in the ensembl database. See the examples section below for how to discover which datasets are available in a given BioMart database.
transcript_ids
optionally, only retrieve transcript annotation data for the specified set of transcript ids. If this is used, then the meta information displayed for the resulting TxDb object will say 'Full dataset: no'. Otherwise it will say 'Full dataset: yes'.
circ_seqs
a character vector listing the chromosomes that should be marked as circular.
filters
Additional filters to use in the BioMart query. Must be a named list, for example filters=as.list(c(source="entrez")).
host
The host URL of the BioMart. Defaults to www.biomart.org.
port
The port to use in the HTTP communication with the host.
id_prefix
Specifies the prefix used in BioMart attributes. For example, some BioMarts may have an attribute specified as "ensembl_transcript_id" whereas others have the same attribute specified as "transcript_id". Defaults to "ensembl_".
miRBaseBuild
a string specifying which build information from mirbase.db to use for microRNAs. The supported values can be listed by calling supportedMiRBaseBuildValues. Defaults to NA, which deactivates the microRNAs accessor.
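
As an illustration of the named list expected by the filters argument, here is a minimal sketch in base R. The filter name "source" and the value "entrez" are simply the ones quoted in the argument description; whether a given filter exists depends on the BioMart database being queried.

```r
## Two equivalent ways to build the named list expected by 'filters'.
## The filter name and value are illustrative only.
filters_a <- as.list(c(source="entrez"))
filters_b <- list(source="entrez")

## Both are lists whose single element is named "source".
names(filters_a)
identical(filters_a, filters_b)
```

Either object could then be supplied as filters=filters_a in a call to makeTranscriptDbFromBiomart.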

Value

TxDb object.

Details

makeTranscriptDbFromBiomart is a convenience function that feeds data from a BioMart database to the lower level makeTranscriptDb function. See ?makeTranscriptDbFromUCSC for a similar function that feeds data from the UCSC source.

The listMarts function from the biomaRt package can be used to list all public BioMart databases. Not all databases returned by this function contain datasets that are compatible with (i.e. understood by) makeTranscriptDbFromBiomart. Here is a list of databases whose datasets are known to be compatible (updated on Sep 24, 2014):

  • All the datasets in the main Ensembl database: use biomart="ensembl".

  • All the datasets in the Ensembl Fungi database: use biomart="fungi_mart_XX" where XX is the release version of the database e.g. "fungi_mart_22".
  • All the datasets in the Ensembl Metazoa database: use biomart="metazoa_mart_XX" where XX is the release version of the database e.g. "metazoa_mart_22".
  • All the datasets in the Ensembl Plants database: use biomart="plants_mart_XX" where XX is the release version of the database e.g. "plants_mart_22".
  • All the datasets in the Ensembl Protists database: use biomart="protists_mart_XX" where XX is the release version of the database e.g. "protists_mart_22".
  • All the datasets in the Gramene Mart: use biomart="ENSEMBL_MART_PLANT".

Not all these datasets have CDS information.
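
The versioned names in the list above follow a simple pattern, which can be sketched in base R as below. The helper ensembl_genomes_mart is hypothetical (it is not part of GenomicFeatures or biomaRt), and release 22 is only the example version quoted in the list.

```r
## Hypothetical helper, not part of GenomicFeatures or biomaRt: build a
## versioned mart name of the form "<division>_mart_<release>".
ensembl_genomes_mart <- function(division, release) {
    ## Restrict 'division' to the four divisions named in the list above.
    division <- match.arg(division,
                          c("fungi", "metazoa", "plants", "protists"))
    paste0(division, "_mart_", release)
}

mart_name <- ensembl_genomes_mart("fungi", 22)
mart_name
```

The resulting string would be passed as the biomart argument. Because the release number changes over time, it is worth checking the name against the output of listMarts for the chosen host first.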

See Also

listMarts, useMart, listDatasets, DEFAULT_CIRC_SEQS, makeTranscriptDbFromUCSC, makeTranscriptDbFromGFF, makeTranscriptDb, supportedMiRBaseBuildValues

Examples

    ## Discover which datasets are available in the "ensembl" BioMart
    ## database:
    library("biomaRt")
    head(listDatasets(useMart("ensembl")))
    
    ## Retrieving an incomplete transcript dataset for Human from the
    ## "ensembl" BioMart database:
    transcript_ids <- c(
        "ENST00000013894",
        "ENST00000268655",
        "ENST00000313243",
        "ENST00000435657",
        "ENST00000384428",
        "ENST00000478783"
    )
    txdb <- makeTranscriptDbFromBiomart(transcript_ids=transcript_ids)
    txdb  # note that these annotations match the GRCh37 genome assembly
    
    ## To use another mirror, set the host argument.  Note that a mirror
    ## may name its marts differently; biomaRt can list the names used
    ## by this host:
    listMarts(host="uswest.ensembl.org")
    ## The name passed to the "biomart" argument must therefore be
    ## changed to match:
    try(
        txdb <- makeTranscriptDbFromBiomart(biomart="ENSEMBL_MART_ENSEMBL",
                                            transcript_ids=transcript_ids,
                                            host="uswest.ensembl.org")
    )
    txdb
    