
Finds coding genes in a genome using the Prodigal software.
prodigal(genome.file, prot.file = NULL, nuc.file = NULL,
closed.ends = TRUE, motif.scan = FALSE)
genome.file: Name of a FASTA formatted file with all the DNA sequences for a genome (chromosomes, plasmids, contigs etc.).
prot.file: If specified, the amino acid sequence of each predicted protein is written to this FASTA file.
nuc.file: If specified, the nucleotide sequence of each predicted gene is written to this FASTA file.
closed.ends: Logical, if TRUE genes are not allowed to run off edges (default TRUE).
motif.scan: Logical, if TRUE forces motif scan instead of Shine-Dalgarno trainer (default FALSE).
A gff.table with the metadata for all predicted genes (see readGFF). If prot.file is specified, a FASTA formatted file with the predicted protein sequences is also produced. If nuc.file is specified, a similar file with nucleotide sequences is also produced.
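As an illustration of working with the returned table, the sketch below builds a small hand-made stand-in and summarises it with base R. The column names follow the nine standard GFF3 fields and are an assumption here, as are all the values; the table actually returned by prodigal may name its columns differently, so check with colnames() first.

```r
# Toy stand-in for a gff.table (hypothetical values, assumed GFF3-style column names)
gff.table <- data.frame(
  Seqid      = c("contig_1", "contig_1", "contig_2"),
  Source     = "Prodigal",
  Type       = "CDS",
  Start      = c(101, 950, 12),
  End        = c(700, 1600, 455),
  Score      = c(55.2, 71.9, 30.4),
  Strand     = c("+", "-", "+"),
  Phase      = 0,
  Attributes = c("ID=1_1", "ID=1_2", "ID=2_1"),
  stringsAsFactors = FALSE
)

# Gene length in bases; GFF coordinates are 1-based and inclusive
gff.table$Length <- gff.table$End - gff.table$Start + 1

# Number of predicted genes on each genome sequence
genes.per.seq <- table(gff.table$Seqid)

# Keep only genes of at least 500 bases
long.genes <- gff.table[gff.table$Length >= 500, ]
```

The same steps should carry over to a real result once the column names have been confirmed.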
This function sets up a call to the software Prodigal (Hyatt et al, 2010), which must be installed separately. Prodigal is designed to find coding genes in prokaryotic genomes. It runs fast and has performed very well in comparisons of automated gene finders. The options used as default here are believed to be the best for pan-genomic analyses.
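To give an idea of what setting up such a call can involve, here is a minimal sketch that assembles a Prodigal command line from arguments mirroring those of this function. The helper build_prodigal_cmd and its gff.file argument are hypothetical, and this is not necessarily how micropan does it internally. The flags used (-i, -o, -f, -a, -d, -c, -n) are options of the Prodigal command-line program.

```r
# Hypothetical helper, for illustration only: builds the command string
# without running it, so Prodigal does not need to be installed.
build_prodigal_cmd <- function(genome.file, gff.file,
                               prot.file = NULL, nuc.file = NULL,
                               closed.ends = TRUE, motif.scan = FALSE) {
  args <- c("-i", shQuote(genome.file),   # input FASTA file
            "-o", shQuote(gff.file),      # output file
            "-f", "gff")                  # write output in GFF format
  if (!is.null(prot.file)) {
    args <- c(args, "-a", shQuote(prot.file))  # protein translations
  }
  if (!is.null(nuc.file)) {
    args <- c(args, "-d", shQuote(nuc.file))   # nucleotide sequences of genes
  }
  if (closed.ends) args <- c(args, "-c")  # genes may not run off edges
  if (motif.scan)  args <- c(args, "-n")  # bypass Shine-Dalgarno trainer, force motif scan
  paste("prodigal", paste(args, collapse = " "))
}

cmd <- build_prodigal_cmd("genome.fasta", "genes.gff",
                          prot.file = "proteins.fasta")
print(cmd)
# Running it would require Prodigal on the PATH, e.g. system(cmd)
```

Building the string separately from executing it makes it easy to inspect exactly which options will be passed before any external software is invoked.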
Hyatt, D., Chen, G., LoCascio, P.F., Land, M.L., Larimer, F.W., Hauser, L.J. (2010). Prodigal: prokaryotic gene recognition and translation initiation site identification, BMC Bioinformatics, 11:119.
# NOT RUN {
# This example requires the external Prodigal software
# Using a genome file in this package
xpth <- file.path(path.package("micropan"),"extdata")
genome.file <- file.path(xpth,"Example_genome.fasta.xz")
# We need to uncompress it first...
tf <- tempfile(fileext=".xz")
s <- file.copy(genome.file,tf)
tf <- xzuncompress(tf)
# Calling Prodigal, and writing all predicted proteins to a file as well
prot.file <- tempfile(fileext=".fasta")
gff.table <- prodigal(tf,prot.file)
# Reading protein file as well
proteins <- readFasta(prot.file)
# ...and cleaning...
s <- file.remove(tf,prot.file)
# }