gggenomes

A grammar of graphics for comparative genomics

gggenomes is a versatile graphics package for comparative genomics. It extends the popular R visualization package ggplot2 with dedicated plot functions for genes, syntenic regions, etc., and with verbs that manipulate the plot, for example to quickly zoom in on gene neighborhoods.

A realistic use case comparing six viral genomes

gggenomes makes it easy to combine data and annotations from different sources into one comprehensive and elegant plot. Here we compare the genomic architecture of 6 viral genomes initially described in Hackl et al., "Endogenous virophages populate the genomes of a marine heterotrophic flagellate".

library(gggenomes)

# to inspect the example data shipped with gggenomes
data(package="gggenomes")

gggenomes(
  genes = emale_genes, seqs = emale_seqs, links = emale_ava,
  feats = list(emale_tirs, ngaros=emale_ngaros, gc=emale_gc)) |> 
  add_sublinks(emale_prot_ava) |>
  sync() + # synchronize genome directions based on links
  geom_feat(position="identity", size=6) +
  geom_seq() +
  geom_link(data=links(2)) +
  geom_bin_label() +
  geom_gene(aes(fill=name)) +
  geom_gene_tag(aes(label=name), nudge_y=0.1, check_overlap = TRUE) +
  geom_feat(data=feats(ngaros), alpha=.3, size=10, position="identity") +
  geom_feat_note(aes(label="Ngaro-transposon"), data=feats(ngaros),
      nudge_y=.1, vjust=0) +
  geom_wiggle(aes(z=score, linetype="GC-content"), feats(gc),
      fill="lavenderblush4", position=position_nudge(y=-.2), height = .2) +
  scale_fill_brewer("Genes", palette="Dark2", na.value="cornsilk3")
  
ggsave("emales.png", width=8, height=4)

For a reproducible recipe that traces the full evolution of an earlier version of this plot (built with an older version of gggenomes), starting from a mere set of contigs and including the bioinformatics analysis workflow, have a look at "From a few sequences to a complex map in minutes".

Motivation & concept

Visualization is a cornerstone of both exploratory analysis and science communication. Bioinformatics workflows, unfortunately, tend to generate a plethora of data products, often in adventurous formats, which makes it quite difficult to integrate and co-visualize the results. Instead of trying to cater to all these different formats explicitly, gggenomes embraces the simple tidyverse-inspired credo:

  • Any data set can be transformed into one (or a few) tidy data tables
  • Any data set in a tidy data table can be easily and elegantly visualized

As a result, gggenomes helps bridge the gap between data generation, visual exploration, interpretation and communication, thereby accelerating biological research.
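To make the credo concrete, here is a minimal sketch that builds a plot from three small hand-written tidy tables. The toy values and the names toy_seqs, toy_genes and toy_links are invented for illustration; the column names (seq_id, length, start, end, seq_id2, start2, end2) are assumed to be what gggenomes expects for sequence, gene and link tracks.

```r
library(gggenomes)
library(tibble)

# Hypothetical toy data: two sequences and their lengths in bp
toy_seqs <- tibble(
  seq_id = c("a", "b"),
  length = c(600, 550)
)

# Three genes, each placed on one of the sequences above
toy_genes <- tibble(
  seq_id = c("a", "a", "b"),
  start  = c(50, 350, 80),
  end    = c(250, 500, 450)
)

# One link connecting a region on "a" with a region on "b"
toy_links <- tibble(
  seq_id  = "a", start  = 50, end  = 250,
  seq_id2 = "b", start2 = 80, end2 = 300
)

# Combine the tables into one plot and add one layer per track
p <- gggenomes(genes = toy_genes, seqs = toy_seqs, links = toy_links) +
  geom_seq() +        # sequence backbones
  geom_seq_label() +  # sequence names
  geom_gene() +       # genes as arrows
  geom_link()         # region-to-region connections

p
```

Any file format that can be reshaped into tables like these can, in principle, be plotted the same way.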

Under the hood, gggenomes uses a light-weight track system to accommodate a mix of related data sets, essentially implementing ggplot2 with multiple tidy tables instead of just one. The data in the different tables are tied together through a global genome layout that is automatically computed from the input and defines the positions of genomic sequences (chromosomes/contigs) and their associated features in the plot.
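The track system can be inspected on a plot object. The sketch below uses the example data shipped with the package together with track_info() and get_seqs() from the function index; the exact columns they return are not shown here, and the expectation that the laid-out seqs table carries plot coordinates (such as x, xend and y) is an assumption.

```r
library(gggenomes)

# Build a plot object from several example tables shipped with gggenomes
p <- gggenomes(
  genes = emale_genes,
  seqs  = emale_seqs,
  feats = list(emale_tirs, ngaros = emale_ngaros)
)

# Each input table becomes a named track; list them with their types
track_info(p)

# The seqs track after layout: one row per sequence, with the
# positions computed by the global genome layout
seq_layout <- get_seqs(p)
head(seq_layout)
```

Because every track is positioned relative to this shared layout, verbs that change the layout affect all tracks consistently.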

Inspiration

gggenomes draws inspiration from some brilliant packages, in particular:

Installation

gggenomes is available as a stable release on CRAN (from v1.0.1). The latest development version is available on GitHub.

# Install from CRAN
install.packages("gggenomes") 

# optionally install ggtree to plot genomes next to trees
# https://bioconductor.org/packages/release/bioc/html/ggtree.html
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ggtree")

# Install latest developmental version from github
devtools::install_github("thackl/gggenomes")

Version: 1.1.2
License: MIT + file LICENSE
Maintainer: Thomas Hackl
Last Published: November 14th, 2025
Monthly Downloads: 618

Functions in gggenomes (1.1.2)

  • emale_seqs: Sequence index of 6 EMALE genomes (endogenous virophages)
  • focus: Show features and regions of interest
  • ex: Get path to gggenomes example files
  • flip_strand: Flip strand
  • drop_link_layout: Drop a link layout
  • drop_seq_layout: Drop a seq layout
  • flip: Flip bins and sequences
  • emale_gc: Relative GC-content along 6 EMALE genomes
  • emale_tirs: Terminal inverted repeats of 6 EMALE genomes
  • emale_genes: Gene annotations of 6 EMALE genomes (endogenous virophages)
  • emale_prot_ava: All-versus-all alignments of 6 EMALE proteomes
  • emale_ngaros: Integrated Ngaro retrotransposons of 6 EMALE genomes
  • geom_seq_label: Draw seq labels
  • geom_gene_label: Draw feat/link labels
  • geom_seq: Draw seqs
  • geom_link: Draw links between genomes
  • geom_bin_label: Draw bin labels
  • geom_seq_break: Decorate truncated sequences
  • geom_variant: Draw place of mutation
  • geom_feat: Draw feats
  • geom_feat_text: Add text to genes, features, etc.
  • geom_gene: Draw gene models
  • has_vars: Check if variables exist in object
  • geom_coverage: Draw wiggle ribbons or lines
  • get_seqs: Get/set the seqs track
  • gggenomes: Plot genomes, features and synteny maps
  • in_range: Do numeric values fall into specified ranges?
  • layout: Re-layout a genome layout
  • if_reverse: Vectorised if_else based on strandedness
  • is_reverse: Check whether strand is reverse
  • introduce: Introduce non-existing columns
  • layout_genomes: Layout genomes
  • position_strand: Stack features
  • position_variant: Plot types of mutations with different offsets
  • read_context: Read files in different contexts
  • layout_seqs: Layout sequences
  • read_bed: Read a BED file
  • read_blast: Read BLAST tab-separated output
  • feats: Use tracks inside and outside geom_* calls
  • pick: Pick bins and seqs by name or position
  • read_alitv: Read AliTV .json file
  • qw: Create a vector from unquoted words
  • read_tracks: Read files in various standard formats (FASTA, GFF3, GBK, BED, BLAST, ...) into track tables
  • scale_color_variant: Default colors and shapes for mutation types
  • read_paf: Read a .paf file (minimap/minimap2)
  • read_seq_len: Read sequence index
  • reexports: Objects exported from other packages
  • require_vars: Require variables in an object
  • read_vcf: Read a VCF file
  • read_gbk: Read GenBank files
  • scale_x_bp: X-scale for genomic data
  • read_gff3: Read features from GFF3 (and with some limitations GFF2/GTF) files
  • swap_if: Swap values of two columns based on a condition
  • swap_query: Swap query and subject in blast-like feature tables
  • strand_int: Convert strand to integer
  • write_gff3: Write a gff3 file from a tidy table
  • shift: Shift bins left/right
  • vars_track: Tidyselect track variables
  • set_class: Modify object class attributes
  • width: The width of a range
  • split_by: Split by key preserving order
  • strand_chr: Convert strand to character
  • strand_lgl: Convert strand to logical
  • theme_gggenomes_clean: gggenomes default theme
  • unnest_exons: Unnest exons
  • track_info: Basic info on tracks in a gggenomes object
  • track_ids: Named vector of track ids and types
  • add_seqs: Add seqs
  • as_links: Compute a layout for link data
  • check_strand: Check strand
  • as_sublinks: Compute a layout for links linking feats
  • as_subfeats: Compute a layout for subfeat data
  • as_seqs: Compute a layout for sequence data
  • as_feats: Compute a layout for feat data
  • GeomFeatText: Geom for feature text
  • align: Align genomes relative to target genes, feats, seqs, etc.
  • add_feats: Add different types of tracks
  • combine_strands: Combine strands
  • emale_ava: All-versus-all whole genome alignments of 6 EMALE genomes
  • def_names: Default column names and types for defined formats
  • emale_cogs: Clusters of orthologs of 6 EMALE proteomes
  • def_formats: Defined file formats and extensions
  • dim.gggenomes_layout: ggplot2::facet_null checks data with empty(df) using dim. This causes an error because dim(gggenomes_layout) is undefined. Returns the dim of the primary table instead.
  • drop_layout: Drop a genome layout
  • drop_feat_layout: Drop feature layout