
Finding coding genes in genomic DNA using the Prodigal software.
findGenes(
  genome,
  prodigal.exe = "prodigal",
  faa.file = "",
  ffn.file = "",
  proc = "single",
  trans.tab = 11,
  mask.N = FALSE,
  bypass.SD = FALSE
)
A GFF-table (see readGFF for details) with one row for each detected coding gene.
genome: A table with columns Header and Sequence, containing the genome sequence(s).
prodigal.exe: Command to run the external software prodigal on the system (text).
faa.file: If provided, prodigal will output all proteins to this fasta-file (text).
ffn.file: If provided, prodigal will output all DNA sequences to this fasta-file (text).
proc: Either "single" or "meta", see below.
trans.tab: Either 11 or 4 (see below).
mask.N: Turn on masking of N's (logical).
bypass.SD: Bypass Shine-Dalgarno filter (logical).
Lars Snipen and Kristian Hovde Liland.
The external software Prodigal is used to scan through a prokaryotic genome to detect the protein
coding genes. The text in prodigal.exe must contain the exact command to invoke prodigal on the system.
In addition to the standard output from this function, FASTA files with protein and/or DNA sequences may
be produced directly by providing filenames in faa.file and ffn.file.
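Since findGenes depends on an external program, it can help to confirm that the command is visible to R before calling it. The following is a minimal sketch using base R only; the variable names prodigal.cmd and prodigal.path are illustrative and not part of the package.

```r
# Command as it would be passed to the prodigal.exe argument
prodigal.cmd <- "prodigal"

# Sys.which() returns the full path to the executable,
# or typically an empty string if it is not found on the PATH
prodigal.path <- unname(Sys.which(prodigal.cmd))

prodigal.found <- !is.na(prodigal.path) && nzchar(prodigal.path)
if (prodigal.found) {
  message("Found prodigal at: ", prodigal.path)
} else {
  message("Could not find '", prodigal.cmd, "'; install Prodigal or adjust prodigal.exe")
}
```

If the program is installed under another name or outside the PATH, the full command would be supplied through prodigal.exe instead.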
The proc argument specifies whether the input data should be treated as a single genome
(the default) or as a metagenome. In the latter case the sequences in genome are (un-binned) contigs.
The translation table is 11 by default (the bacterial, archaeal and plant plastid code), but table 4 should be used for Mycoplasma etc.
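To see why the choice of table matters, recall that TGA is a stop codon in table 11 but codes for tryptophan in table 4. The sketch below is an illustration in base R, not how Prodigal works internally; the function first_stop and the list stop.codons are hypothetical helpers written for this example.

```r
# Stop codons under the two translation tables supported by findGenes
stop.codons <- list(
  "11" = c("TAA", "TAG", "TGA"),
  "4"  = c("TAA", "TAG")        # TGA codes for Trp in table 4
)

# Index of the first in-frame stop codon, reading from position 1
first_stop <- function(dna, trans.tab = 11) {
  starts <- seq(1, nchar(dna) - 2, by = 3)
  codons <- substring(dna, starts, starts + 2)
  hit <- which(codons %in% stop.codons[[as.character(trans.tab)]])
  if (length(hit) == 0) NA_integer_ else hit[1]
}

orf <- "ATGTGAAAATAA"            # codons: ATG TGA AAA TAA
first_stop(orf, trans.tab = 11)  # stops at codon 2 (TGA)
first_stop(orf, trans.tab = 4)   # reads through TGA, stops at codon 4 (TAA)
```

Using table 11 on a Mycoplasma genome would therefore tend to truncate genes at internal TGA codons.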
Setting mask.N = TRUE will prevent genes from having runs of N inside. Setting bypass.SD = TRUE
turns off the search for a Shine-Dalgarno motif.
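The options described above can be combined in a single call. The following is a guarded sketch that only runs when both the microseq package and the prodigal program are available. It treats the small example genome shipped with the package as if it were a set of contigs, purely for illustration; the names can.run and faa.out are assumptions of this example.

```r
# Only attempt the call if the package and the external program are both present
prodigal.path <- unname(Sys.which("prodigal"))
can.run <- requireNamespace("microseq", quietly = TRUE) &&
  !is.na(prodigal.path) && nzchar(prodigal.path)

if (can.run) {
  library(microseq)
  genome.file <- system.file("extdata", "small.fna", package = "microseq")
  contigs <- readFasta(genome.file)

  # Temporary file that will receive the predicted protein sequences
  faa.out <- tempfile(fileext = ".faa")

  gff.tbl <- findGenes(
    contigs,
    faa.file = faa.out,   # also write proteins to a FASTA file
    proc = "meta",        # treat the input as metagenome contigs
    mask.N = TRUE         # do not allow runs of N inside genes
  )
  print(head(gff.tbl))
}
```

The proteins written to faa.out could then be read back with readFasta.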
readGFF, gff2fasta.
if (FALSE) {
  # This example requires the external prodigal software
  # Using a genome file in this package.
  genome.file <- file.path(path.package("microseq"), "extdata", "small.fna")

  # Searching for coding sequences, this is Mycoplasma (trans.tab = 4)
  genome <- readFasta(genome.file)
  gff.tbl <- findGenes(genome, trans.tab = 4)

  # Retrieving the sequences
  cds.tbl <- gff2fasta(gff.tbl, genome)

  # You may use the pipe operator (dplyr provides %>% and filter())
  library(dplyr)
  library(ggplot2)
  readFasta(genome.file) %>%
    findGenes(trans.tab = 4) %>%
    filter(Score >= 50) %>%
    ggplot() +
    geom_histogram(aes(x = Score), bins = 25)
}