biomartr

Genomic Data Retrieval with R

Motivation:

This package grew out of my own frustration with automating the genomic data retrieval process to create computationally reproducible scripts for large-scale genomics studies. Since I couldn't find easy-to-use and fully reproducible software libraries that would allow others and me to write transparent, easily reproducible code, I sat down and tried to implement a framework that would enable anyone to automate the genomic data retrieval process. Personally, I strongly support and believe in reproducible research, and I truly hope that this package will be useful to others as well and that it helps to promote reproducible research in genomics studies.

I happily welcome anyone who wishes to contribute to this project :) Just drop me an email.

Short package description:

The rapidly growing number of sequenced genomes allows us to perform a new type of biological research. Using a comparative approach, these genomes provide us with new insights into how biological information is encoded at the molecular level and how this information changes over evolutionary time.

The first step of any genome-based study, however, is to retrieve genomes from databases. To automate the retrieval process on a meta-genomic scale, the biomartr package provides useful interface functions for genomic sequence retrieval and functional annotation retrieval. The major aim of biomartr is to facilitate computational reproducibility and large-scale handling of genomic data for (meta-)genomic analyses.

In detail, biomartr aims to provide users with an easy-to-use framework to obtain genome, proteome, CDS, GFF (annotation), genome assembly quality, and metagenome project data. Furthermore, an interface to the Ensembl Biomart database allows users to retrieve functional annotation for genomic loci. Users can download entire databases such as NCBI RefSeq, NCBI nr, NCBI nt, and NCBI Genbank, as well as ENSEMBL and ENSEMBLGENOMES, with only one command.

Hence, the biomartr package is designed to achieve the highest degree of computational reproducibility in genomics research.

Citation

Please cite the following paper when using biomartr for your own research. This will allow me to continue working on this software tool and will motivate me to extend its functionality and usability in the coming years. Many thanks in advance :)

Drost HG, Paszkowski J. Biomartr: genomic data retrieval with R. Bioinformatics (2017) 33(8): 1216-1217. doi:10.1093/bioinformatics/btw821.

Platforms

Find biomartr also at OmicTools.

Frequently Asked Questions (FAQs)

Please find all FAQs here.

Discussions and Bug Reports

I would be very happy to learn more about potential improvements of the concepts and functions provided in this package.

Furthermore, if you find bugs or need additional (more flexible) functionality in parts of this package, please let me know:

twitter: HajkDrost or email

For Bug Reports: Please open an issue.

Tutorials

Getting Started with biomartr:

Users can also read the tutorials within R (or RStudio):

# load the biomartr package
library(biomartr)

# look for all tutorials (vignettes) available in the biomartr package
# this will open your web browser
browseVignettes("biomartr")

Installation

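# install biomartr 0.5.1
# (biocLite() is the legacy Bioconductor installer; newer Bioconductor
# releases use BiocManager::install("biomartr") instead)
source("https://bioconductor.org/biocLite.R")
biocLite('biomartr')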

Install Developer Version

Some bug fixes or new functionality will not be available on CRAN yet, but in the developer version here on GitHub. To download and install the most recent version of biomartr run:

# install the current version of biomartr on your system
source("http://bioconductor.org/biocLite.R")
biocLite("HajkD/biomartr")

NEWS

The current status of the package as well as a detailed history of the functionality of each version of biomartr can be found in the NEWS section.

Genomic Data Retrieval

Meta-Genome Retrieval

  • meta.retrieval() : Perform Meta-Genome Retrieval from NCBI of species belonging to the same kingdom of life or to the same taxonomic subgroup
  • meta.retrieval.all() : Perform Meta-Genome Retrieval from NCBI of the entire kingdom of life
  • getMetaGenomes() : Retrieve metagenomes from NCBI Genbank
  • getMetaGenomeAnnotations() : Retrieve annotation *.gff files for metagenomes from NCBI Genbank
  • listMetaGenomes() : List available metagenomes on NCBI Genbank
  • getMetaGenomeSummary() : Helper function to retrieve the assembly_summary.txt file from NCBI genbank metagenomes
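
As a rough sketch of how the functions above fit together, the example below lists the available kingdoms and subgroups before starting a bulk download. The argument values ("refseq", "bacteria", "Gammaproteobacteria") are illustrative assumptions, and argument names may differ between package versions. The meta.retrieval() call downloads many genomes and can run for a long time, so it is left commented out.

```r
library(biomartr)

# kingdoms of life available for NCBI RefSeq
kingdoms <- getKingdoms(db = "refseq")
print(kingdoms)

# taxonomic subgroups within one kingdom (queries NCBI, needs internet access)
groups <- getGroups(db = "refseq", kingdom = "bacteria")
head(groups)

# bulk download of all genomes of one subgroup; large and slow, so commented out
# meta.retrieval(kingdom = "bacteria", group = "Gammaproteobacteria",
#                db = "refseq", type = "genome")
```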

Genome Retrieval

  • listGenomes() : List all genomes available on NCBI and ENSEMBL servers
  • listKingdoms() : List the number of available species per kingdom of life on NCBI and ENSEMBL servers
  • listGroups() : List the number of available species per group on NCBI and ENSEMBL servers
  • getKingdoms() : Retrieve available kingdoms of life
  • getGroups() : Retrieve available groups for a kingdom of life
  • is.genome.available() : Check genome availability on NCBI and ENSEMBL servers
  • getGenome() : Download a specific genome stored on NCBI and ENSEMBL servers
  • getProteome() : Download a specific proteome stored on NCBI and ENSEMBL servers
  • getCDS() : Download a specific CDS file (genome) stored on NCBI and ENSEMBL servers
  • getRNA() : Download a specific RNA file stored on NCBI and ENSEMBL servers
  • getGFF() : Genome Annotation Retrieval from NCBI (*.gff) and ENSEMBL (*.gff3) servers
  • getGTF() : Genome Annotation Retrieval (*.gtf) from ENSEMBL servers
  • getRepeatMasker() : Repeat Masker TE Annotation Retrieval
  • getAssemblyStats() : Genome Assembly Stats Retrieval from NCBI
  • getKingdomAssemblySummary() : Helper function to retrieve the assembly_summary.txt files from NCBI for all kingdoms
  • getMetaGenomeSummary() : Helper function to retrieve the assembly_summary.txt files from NCBI genbank metagenomes
  • getSummaryFile() : Helper function to retrieve the assembly_summary.txt file from NCBI for a specific kingdom
  • getENSEMBLInfo() : Retrieve ENSEMBL info file
  • getGENOMEREPORT() : Retrieve GENOME_REPORTS file from NCBI
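
A minimal sketch of a typical retrieval workflow, assuming the RefSeq database and yeast as the query organism (both chosen here only for illustration): first check that a genome is available, then download it. The download folder name is also an assumption.

```r
library(biomartr)

# check whether a genome exists before downloading (needs internet access)
available <- is.genome.available(db = "refseq",
                                 organism = "Saccharomyces cerevisiae")

if (isTRUE(available)) {
  # download the genome; the returned value is the path to the downloaded file
  genome_path <- getGenome(
    db       = "refseq",
    organism = "Saccharomyces cerevisiae",
    path     = file.path("_ncbi_downloads", "genomes")
  )
  print(genome_path)
}
```

Storing the returned file path in a variable makes it easy to pass the file to one of the import functions afterwards.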

Import Downloaded Files

  • read_genome() : Import genomes as Biostrings or data.table object
  • read_proteome() : Import proteome as Biostrings or data.table object
  • read_cds() : Import CDS as Biostrings or data.table object
  • read_gff() : Import GFF file
  • read_rna() : Import RNA file
  • read_rm() : Import Repeat Masker output file
  • read_assemblystats() : Import Genome Assembly Stats File
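
The import functions work on files downloaded with the retrieval functions. The sketch below assumes a yeast genome from RefSeq (an illustrative choice) and imports it as a Biostrings object; the download folder is likewise an assumption.

```r
library(biomartr)

# download a genome and keep the path to the downloaded file
genome_path <- getGenome(
  db       = "refseq",
  organism = "Saccharomyces cerevisiae",
  path     = file.path("_ncbi_downloads", "genomes")
)

# import as a Biostrings object (the alternative named above is data.table)
yeast_genome <- read_genome(file = genome_path, obj.type = "Biostrings")
yeast_genome
```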

Database Retrieval

  • listNCBIDatabases() : Retrieve a List of Available NCBI Databases for Download
  • download.database() : Download an NCBI database to your local hard drive
  • download.database.all() : Download a complete NCBI database such as NCBI nr to your local hard drive
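
A hedged sketch of database retrieval, assuming the NCBI nr database as the target. Note that the function index for this release names the listing function listDatabases(), so the name to use may depend on the installed version. The download calls are commented out because the files are very large.

```r
library(biomartr)

# list the files that make up the NCBI nr database (needs internet access)
nr_files <- listNCBIDatabases(db = "nr")
head(nr_files)

# download a single file into the folder "nr" (can be several GB)
# download.database(db = nr_files[1], path = "nr")

# download every file of the database (very large)
# download.database.all(db = "nr", path = "nr")
```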

BioMart Queries

  • biomart() : Main function to query the BioMart database
  • getMarts() : Retrieve All Available BioMart Databases
  • getDatasets() : Retrieve All Available Datasets for a BioMart Database
  • getAttributes() : Retrieve All Available Attributes for a Specific Dataset
  • getFilters() : Retrieve All Available Filters for a Specific Dataset
  • organismBM() : Function for organism specific retrieval of available BioMart marts and datasets
  • organismAttributes() : Function for organism specific retrieval of available BioMart attributes
  • organismFilters() : Function for organism specific retrieval of available BioMart filters
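
The query functions above are usually called in sequence: find a mart, pick a dataset, inspect its attributes, then run the query. The sketch below assumes the Ensembl gene mart, the human gene dataset, and the gene symbol GUCA2A, all chosen for illustration only; every call needs internet access.

```r
library(biomartr)

# 1. which BioMart databases (marts) are available?
marts <- getMarts()
head(marts)

# 2. which datasets does one mart offer?
datasets <- getDatasets(mart = "ENSEMBL_MART_ENSEMBL")
head(datasets)

# 3. which attributes can be retrieved from one dataset?
attrs <- getAttributes(mart    = "ENSEMBL_MART_ENSEMBL",
                       dataset = "hsapiens_gene_ensembl")
head(attrs)

# 4. run the query for one gene, filtering by its HGNC symbol
result <- biomart(
  genes      = "GUCA2A",
  mart       = "ENSEMBL_MART_ENSEMBL",
  dataset    = "hsapiens_gene_ensembl",
  attributes = c("start_position", "end_position", "description"),
  filters    = "hgnc_symbol"
)
result
```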

Performing Gene Ontology queries

Gene Ontology

  • getGO() : Function to retrieve GO terms for a given set of genes
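
A short sketch of a Gene Ontology query, assuming a human gene identified by its HGNC symbol (the gene GUCA2A and the filter name are illustrative assumptions). The call queries Ensembl BioMart and needs internet access.

```r
library(biomartr)

# retrieve GO terms for one human gene, identified by its HGNC symbol
go_terms <- getGO(
  organism = "Homo sapiens",
  genes    = "GUCA2A",
  filters  = "hgnc_symbol"
)
go_terms
```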

Download Developer Version On Windows Systems

# On Windows, this won't work - see ?build_github_devtools
install_github("HajkD/biomartr", build_vignettes = TRUE, dependencies = TRUE)

# When working with Windows, first you need to install
# Rtools (a standalone toolchain installer from CRAN, not an R package)

# Afterwards you can install devtools -> install.packages("devtools")
# and then you can run:

devtools::install_github("HajkD/biomartr", build_vignettes = TRUE, dependencies = TRUE)

# and then call it from the library
library("biomartr", lib.loc = "C:/Program Files/R/R-3.1.1/library")

Troubleshooting on Windows Machines

  • Install biomartr on a Win 8 laptop: solution (thanks to Andres Romanowski)

Code of conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Install

install.packages('biomartr')

Monthly Downloads

1,407

Version

0.5.9000

License

GPL-2

Maintainer

Hajk-Georg Drost

Last Published

December 2nd, 2023

Functions in biomartr (0.5.9000)

  • download.database : Download an NCBI Database to Your Local Hard Drive
  • is.genome.available : Check Genome Availability
  • getKingdomAssemblySummary : Retrieve and summarise the assembly_summary.txt files from NCBI for all kingdoms
  • biomart : Main BioMart Query Function
  • getMetaGenomes : Retrieve metagenomes from NCBI Genbank
  • listDatabases : Retrieve a List of Available NCBI Databases for Download
  • getProteome : Proteome Retrieval
  • getGTF : Genome Annotation Retrieval (GTF)
  • getGenome : Genome Retrieval
  • biomartr-package : Genomic Data Retrieval
  • organismFilters : Retrieve Ensembl Biomart filters for a query organism
  • getRepeatMasker : Repeat Masker Retrieval
  • organismAttributes : Retrieve Ensembl Biomart attributes for a query organism
  • getSummaryFile : Helper function to retrieve the assembly_summary.txt file from NCBI
  • read_assemblystats : Import Genome Assembly Stats File
  • listGenomes : List All Available Genomes
  • organismBM : Retrieve Ensembl Biomart marts and datasets for a query organism
  • listGroups : List number of available genomes in each group
  • read_cds : Import CDS as Biostrings or data.table object
  • getCDS : Coding Sequence Retrieval
  • getDatasets : Retrieve All Available Datasets for a BioMart Database
  • read_gff : Import GFF File
  • read_genome : Import Genome Assembly as Biostrings or data.table object
  • refseqOrganisms : Retrieve All Organism Names Stored on refseq
  • read_proteome : Import Proteome as Biostrings or data.table object
  • getMetaGenomeAnnotations : Retrieve annotation *.gff files for metagenomes from NCBI Genbank
  • getMetaGenomeSummary : Retrieve the assembly_summary.txt file from NCBI genbank metagenomes
  • read_rm : Import Repeat Masker output file
  • read_rna : Import RNA as Biostrings or data.table object
  • getAttributes : Retrieve All Available Attributes for a Specific Dataset
  • getAssemblyStats : Genome Assembly Stats Retrieval
  • getGFF : Genome Annotation Retrieval (GFF3)
  • getGO : Gene Ontology Query
  • listKingdoms : List number of available genomes in each kingdom of life
  • listMetaGenomes : List available metagenomes on NCBI Genbank
  • getENSEMBLInfo : Retrieve ENSEMBL info file
  • getENSEMBLGENOMESInfo : Retrieve ENSEMBLGENOMES info file
  • getMarts : Retrieve information about available Ensembl Biomart databases
  • getKingdoms : Retrieve available kingdoms of life
  • getRNA : RNA Sequence Retrieval
  • getReleases : Retrieve available database releases or versions of ENSEMBL and ENSEMBLGENOMES
  • meta.retrieval : Perform Meta-Genome Retrieval
  • meta.retrieval.all : Perform Meta-Genome Retrieval of all organisms in all kingdoms of life
  • download.database.all : Download all elements of an NCBI database
  • getGroups : Retrieve available groups for a kingdom of life
  • getFilters : Retrieve All Available Filters for a Specific Dataset
  • getGENOMEREPORT : Retrieve NCBI GENOME_REPORTS file