
avidaR: an R library to perform complex queries on a semantic database of digital organisms (avidaDB)

Introduction.

Digital evolution is a branch of artificial life in which self-replicating computer programs—digital organisms—mutate and evolve within a user-defined computational environment. In spite of its value in biology, we still lack an up-to-date and comprehensive database on digital organisms resulting from Avida evolution experiments. Therefore, we have developed an ontology-based semantic database—avidaDB—and an R package—avidaR—that provides users of the R programming language with an easy-to-use tool for performing complex queries without specific knowledge of SPARQL or RDF. avidaR can be used to do research on robustness, evolvability, complexity, phenotypic plasticity, gene regulatory networks, and genomic architecture by retrieving the genomes, phenotypes, and transcriptomes of more than a million digital organisms available on avidaDB. avidaR is already accepted on CRAN (i.e., a comprehensive collection of R packages contributed by the R community) and will make biologists better equipped to embrace the field of digital evolution. avidaR was developed by the Computational Biology Lab of the Doñana Biological Station (EBD), a research institute of the Spanish National Research Council (CSIC) based in Seville (Spain).

Installation.

avidaR depends on the following packages:

You can install avidaR from CRAN:

install.packages("avidaR")

or from our GitLab repository to get the latest version:

devtools::install_gitlab("fortunalab/avidaR")
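In scripts that may run on machines where avidaR is not yet installed, the CRAN installation can be made conditional. The following is a minimal sketch; ensure_package is a hypothetical helper name used only for illustration and is not part of avidaR.

```r
# Hypothetical helper (not part of avidaR): install a package only when it
# is not already available, then report whether it can be loaded.
ensure_package <- function(pkg, installer = utils::install.packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    installer(pkg)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# Intended use (downloads from CRAN if avidaR is missing):
# ensure_package("avidaR")
```

The installer argument defaults to install.packages() and exists only so that a different installation function can be swapped in.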

Usage.

avidaR can be loaded as follows:

library("avidaR")

Connect to avidaDB.

avidaDB is a semantic database (or triple-store) on the genomes and transcriptomes of more than a million digital organisms, stored as RDF data. It can be queried with the SPARQL query language. avidaR can connect to triple-stores that support the RDF4J server REST API, such as GraphDB. Since avidaDB is implemented in GraphDB, you can establish either a basic connection (with no password, or with basic HTTP username and password authentication) or a connection secured with an API access token.

avidaR provides a triplestore_access class to manage access options and retrieve data through the database server API. To access the entire database, first create the triplestore object and then call its set_access_options() method as follows:

# create object of class triplestore_access
avidaDB <- triplestore_access$new()

# set access options to avidaDB
avidaDB$set_access_options(
  url = "https://graphdb.fortunalab.org",
  user = "public_avida",
  password = "public_avida",
  repository = "avidaDB"
)
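avidaR builds and sends the HTTP requests itself, so nothing below is needed in normal use. As background, servers that implement the RDF4J REST API conventionally expose the SPARQL endpoint of each repository under /repositories/<repository>. The sketch below shows how the url and repository values would combine under that convention. sparql_endpoint and sparql_get_url are hypothetical helper names, not avidaR functions, and the query is a generic illustration that assumes nothing about the avidaDB ontology.

```r
# Hypothetical helpers (not part of avidaR) illustrating the RDF4J REST
# convention: <server url>/repositories/<repository id>
sparql_endpoint <- function(url, repository) {
  # drop any trailing slash so the path separator is not doubled
  paste0(sub("/+$", "", url), "/repositories/", repository)
}

# Build the URL of a GET request that carries a SPARQL query in the
# percent-encoded 'query' parameter.
sparql_get_url <- function(url, repository, query) {
  paste0(
    sparql_endpoint(url, repository),
    "?query=", utils::URLencode(query, reserved = TRUE)
  )
}

endpoint <- sparql_endpoint("https://graphdb.fortunalab.org/", "avidaDB")
print(endpoint)

# Generic illustrative query only
request_url <- sparql_get_url(
  "https://graphdb.fortunalab.org", "avidaDB",
  "SELECT * WHERE { ?s ?p ?o } LIMIT 5"
)
print(request_url)
```

A real request would also have to send the credentials; that step is left to avidaR here.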

Get data from avidaDB.

The following function can be used to get the genome sequence of a single genome (e.g., genome_id = 1):

get_genome_seq_from_genome_id(genome_id = 1, triplestore = avidaDB)

or to get the genome sequences of multiple genomes at once:

get_genome_seq_from_genome_id(genome_id = c(1, 2, 3), triplestore = avidaDB)
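Because these functions query a remote server, a call can fail when the network or the database is unavailable. One defensive pattern, sketched here under the assumption that such failures surface as ordinary R errors, is to wrap the call in tryCatch(). safe_query is a hypothetical helper and not part of avidaR.

```r
# Hypothetical wrapper (not part of avidaR): run a query function and
# return NULL with a warning instead of stopping the script on error.
safe_query <- function(query_fun, ...) {
  tryCatch(
    query_fun(...),
    error = function(e) {
      warning("query failed: ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# Intended use with the avidaDB connection object (requires network access):
# seqs <- safe_query(get_genome_seq_from_genome_id,
#                    genome_id = c(1, 2, 3), triplestore = avidaDB)
# if (!is.null(seqs)) str(seqs)
```

Returning NULL lets the calling script decide whether to retry, skip, or stop.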

To get more details about a specific function, use the R help command by typing the name of the function preceded by the symbol ?:

?get_genome_seq_from_genome_id

List of available functions grouped by the target entity:

Get the genome of a digital organism:

  • get_genome_id_from_logic_operation()
  • get_genome_id_from_phenotype_id()
  • get_genome_id_from_transcriptome_id()
  • get_genome_id_from_genome_seq()
  • get_genome_seq_from_genome_id()

Get the phenotype encoded by the genome of a digital organism:

  • get_phenotype_id_from_logic_operation()
  • get_phenotype_id_from_genome_id()
  • get_phenotype_id_from_genome_seq()
  • get_phenotype_id_from_transcriptome_id()

Get the logic operations (i.e., traits) defining the phenotype of a digital organism:

  • get_logic_operation_from_phenotype_id()

Get the transcriptome executed by a digital organism:

  • get_transcriptome_id_from_logic_operation()
  • get_transcriptome_id_from_genome_id()
  • get_transcriptome_id_from_genome_seq()
  • get_transcriptome_id_from_phenotype_id()
  • get_transcriptome_seq_from_transcriptome_id()

Get the tandem repeat contained in the transcriptome of a digital organism:

  • get_tandem_id_from_logic_operation()
  • get_tandem_id_from_genome_id()
  • get_tandem_id_from_genome_seq()
  • get_tandem_id_from_phenotype_id()
  • get_tandem_seq_from_tandem_id()

Get data provenance:

  • get_experiment_id_from_organism_id()
  • get_doi_from_experiment_id()
  • get_docker_image_from_experiment_id()

Miscellaneous functions:

  • get_db_summary()
  • instruction_set()
  • get_genome_id_of_wild_type_organisms()
  • get_mutant_at_pos()
  • convert_org_into_seq()
  • convert_seq_into_org()
  • plot_transcriptome()
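Most of the query functions listed above follow the naming pattern get_<target>_from_<source>. As a small illustration of that convention (a sketch, not an avidaR feature), the names can be split into their target and source parts to look up which function maps one entity to another. The vector below copies a subset of the names from the lists above; find_query_function is a hypothetical helper.

```r
# A subset of function names copied from the lists above
fns <- c(
  "get_genome_id_from_phenotype_id",
  "get_genome_id_from_transcriptome_id",
  "get_genome_id_from_genome_seq",
  "get_genome_seq_from_genome_id",
  "get_phenotype_id_from_genome_id",
  "get_transcriptome_id_from_genome_id",
  "get_tandem_id_from_genome_id",
  "get_doi_from_experiment_id"
)

# Split each name into its <target> and <source> parts
m <- regmatches(fns, regexec("^get_(.+)_from_(.+)$", fns))
lookup <- data.frame(
  fun    = vapply(m, `[`, character(1), 1),
  target = vapply(m, `[`, character(1), 2),
  src    = vapply(m, `[`, character(1), 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: which function returns `target` given `src`?
find_query_function <- function(target, src) {
  lookup$fun[lookup$target == target & lookup$src == src]
}

print(find_query_function("genome_seq", "genome_id"))

# Everything in this subset that can be retrieved starting from a genome id
print(lookup$target[lookup$src == "genome_id"])
```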

Source code

avidaR was developed by Raúl Ortega.


Version

1.2.1

License

MIT + file LICENSE

Maintainer

Raúl Ortega

Last Published

June 21st, 2024

Functions in avidaR (1.2.1)

  • get_tandem_seq_from_tandem_id: Get the tandem repeat sequence from tandem repeat
  • get_logic_operation_from_phenotype_id: Get the logic operations computed by a digital organism whose genome encodes a specific phenotype
  • get_transcriptome_id_from_genome_id: Get transcriptome from genome
  • triplestore_access: Class to manage triplestore access options
  • plot_transcriptome: Get a plot of the transcriptome as a chord diagram
  • get_tandem_id_from_logic_operation: Get tandem repeat from logic operations
  • get_mutant_at_pos: Get single-point mutants of wild-type organisms
  • get_tandem_id_from_phenotype_id: Get tandem repeat from phenotype
  • instruction_set: Get the genetic language of Avida
  • get_phenotype_id_from_genome_seq: Get phenotype from genome sequence
  • logic_operation: Get the list of logic operations that a digital organism can compute
  • get_transcriptome_id_from_genome_seq: Get transcriptome from genome sequence
  • get_transcriptome_id_from_logic_operation: Get transcriptome from logic operations
  • get_genome_seq_from_genome_id: Get genome sequence from genome
  • get_transcriptome_id_from_phenotype_id: Get transcriptome from phenotype
  • get_transcriptome_seq_from_transcriptome_id: Get transcriptome sequence from transcriptome
  • get_genome_id_from_genome_seq: Get genome from genome sequence
  • get_genome_id_from_phenotype_id: Get genome from phenotype
  • get_doi_from_experiment_id: Get DOI from experiment
  • get_genome_id_from_logic_operation: Get genome from logic operations
  • get_tandem_id_from_genome_id: Get tandem repeat from genome
  • get_db_summary: Get database summary
  • get_docker_image_from_experiment_id: Get docker image from experiment
  • get_genome_id_from_transcriptome_id: Get genome from transcriptome
  • get_experiment_id_from_organism_id: Get experiment from organism
  • convert_org_into_seq: Converts a digital organism file into a genome instruction sequence
  • get_phenotype_id_from_logic_operation: Get phenotype from logic operations
  • convert_seq_into_org: Converts a genome instruction sequence into a digital organism file
  • get_phenotype_id_from_genome_id: Get phenotype from genome
  • get_genome_id_of_wild_type_organisms: Get genomes of wild-type organisms
  • get_phenotype_id_from_transcriptome_id: Get phenotype from transcriptome
  • get_tandem_id_from_genome_seq: Get tandem repeat from genome sequence