prodigal: Gene predictions using Prodigal

Description

Finds coding genes in a genome using the Prodigal software.

Usage

prodigal(genome.file, prot.file = NULL, nuc.file = NULL,
  closed.ends = TRUE, motif.scan = FALSE)

Arguments

genome.file

Name of a FASTA formatted file with all the DNA sequences for a genome (chromosomes, plasmids, contigs etc.).

prot.file

If specified, amino acid sequence of each protein is written to this FASTA file.

nuc.file

If specified, nucleotide sequence of each predicted gene is written to this FASTA file.

closed.ends

Logical, if TRUE genes are not allowed to run off the ends of a sequence (default TRUE).

motif.scan

Logical, if TRUE forces motif scan instead of Shine-Dalgarno trainer (default FALSE).

Value

A gff.table with the metadata for all predicted genes (see readGFF). If prot.file is specified, a FASTA formatted file with the predicted protein sequences is also produced. If nuc.file is specified, a similar file with the nucleotide sequences is also produced.

Details

This function sets up a call to the external software Prodigal (Hyatt et al., 2010), which must be installed on the system. Prodigal is designed to find coding genes in prokaryote genomes. It runs fast and has obtained very good results in tests among the automated gene finders. The options used as default here are believed to be the best for pan-genomic analyses.
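
As a rough illustration of what setting up such a call may involve, the sketch below assembles a Prodigal command line from arguments named like those above. The helper build_prodigal_cmd and its gff.file argument are hypothetical and not part of this package, and the actual call made by prodigal() may differ. The flags used (-i, -o, -f, -q, -a, -d, -c, -n) are standard Prodigal command line options.

```r
# Hypothetical helper, NOT part of micropan: shows one way the arguments of
# prodigal() could map onto Prodigal command line flags.
build_prodigal_cmd <- function(genome.file, gff.file, prot.file = NULL,
                               nuc.file = NULL, closed.ends = TRUE,
                               motif.scan = FALSE) {
  args <- c("-i", shQuote(genome.file),  # input FASTA file
            "-o", shQuote(gff.file),     # output file with gene coordinates
            "-f", "gff",                 # write the output in GFF format
            "-q")                        # run quietly
  if (!is.null(prot.file)) args <- c(args, "-a", shQuote(prot.file))  # protein translations
  if (!is.null(nuc.file))  args <- c(args, "-d", shQuote(nuc.file))   # gene nucleotide sequences
  if (closed.ends) args <- c(args, "-c")  # do not allow genes to run off edges
  if (motif.scan)  args <- c(args, "-n")  # bypass Shine-Dalgarno trainer, force motif scan
  paste(c("prodigal", args), collapse = " ")
}

cmd <- build_prodigal_cmd("genome.fasta", "genes.gff", prot.file = "proteins.fasta")
print(cmd)
# system(cmd)  # would only work if the prodigal executable is on the PATH
```

Building the command as a string first makes it easy to inspect before handing it to system(). File names are passed through shQuote so that paths containing spaces survive the shell.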

References

Hyatt, D., Chen, G., LoCascio, P.F., Land, M.L., Larimer, F.W., Hauser, L.J. (2010). Prodigal: prokaryotic gene recognition and translation initiation site identification, BMC Bioinformatics, 11:119.

Examples

# NOT RUN {
# This example requires the external Prodigal software
# Using a genome file in this package
xpth <- file.path(path.package("micropan"),"extdata")
genome.file <- file.path(xpth,"Example_genome.fasta.xz")

# We need to uncompress it first...
tf <- tempfile(fileext=".xz")
s <- file.copy(genome.file,tf)
tf <- xzuncompress(tf)

# Calling Prodigal, and writing all predicted proteins to a file as well
prot.file <- tempfile(fileext=".fasta")
gff.table <- prodigal(tf,prot.file)

# Reading protein file as well
proteins <- readFasta(prot.file)

# ...and cleaning...
s <- file.remove(tf,prot.file)
# }