findGenes: Finding coding genes

Description

Finding coding genes in genomic DNA using the Prodigal software.

Usage

findGenes(genome.file, faa.file = "", ffn.file = "", proc = "single",
  trans.tab = 11, mask.N = FALSE, bypass.SD = FALSE)

Arguments

genome.file

A fasta-formatted file with the genome sequence(s).

faa.file

If provided, Prodigal writes the protein sequences of all detected genes to this fasta-file (text).

ffn.file

If provided, Prodigal writes the DNA sequences of all detected genes to this fasta-file (text).

proc

Either "single" or "meta", see below.

trans.tab

Either 11 or 4 (see below).

mask.N

Turn on masking of runs of N's, see below (logical).

bypass.SD

Bypass the Shine-Dalgarno filter, see below (logical).

Value

A GFF-table (see readGFF for details) with one row for each detected coding gene.
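As a rough illustration of working with such a table, the sketch below builds a small toy table by hand and summarises it. The column names (Seqid, Type, Start, End, Strand) are an assumption based on the standard GFF fields, and the coordinates are made up for the example; check readGFF for the actual column names.

```r
# Toy GFF-like table with made-up coordinates (NOT real findGenes() output).
# Column names are assumed to follow the standard GFF fields.
gff.tbl <- data.frame(
  Seqid  = c("contig_1", "contig_1", "contig_2"),
  Type   = "CDS",
  Start  = c(101, 900, 12),
  End    = c(700, 1502, 341),
  Strand = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

# Gene length in bases; GFF coordinates are 1-based and inclusive
gene.length <- gff.tbl$End - gff.tbl$Start + 1

# Number of genes on each strand
strand.count <- table(gff.tbl$Strand)

print(gene.length)
print(strand.count)
```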

Details

The external software Prodigal is used to scan through a prokaryotic genome to detect the protein coding genes. This free software can be installed from https://github.com/hyattpd/Prodigal.
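Since findGenes depends on this external program, it can be useful to check that it is available before calling the function. The following is a minimal sketch, assuming the executable is named prodigal and is on the system PATH; your installation may differ.

```r
# Look for the prodigal executable on the system PATH (name is an assumption).
# Sys.which() returns an empty string when the program is not found.
prodigal.path <- Sys.which("prodigal")
prodigal.ok <- nzchar(prodigal.path)

if (prodigal.ok) {
  # Ask Prodigal to report its version
  system2(prodigal.path, "-v")
} else {
  message("Prodigal was not found on the PATH, findGenes() will not work.")
}
```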

In addition to the standard output from this function, fasta-files with protein and/or DNA sequences may be produced directly by providing filenames in faa.file and ffn.file.
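A call that also writes the two fasta-files might look like the sketch below. It uses temporary files for the output, and it only runs the gene finding when both the microseq package and a prodigal executable (name assumed) are found.

```r
# Output files for proteins and gene DNA sequences (temporary files here)
faa.file <- tempfile(fileext = ".faa")
ffn.file <- tempfile(fileext = ".ffn")

# The example genome shipped with microseq; "" if the package is not installed
genome.file <- system.file("extdata", "small_genome.fasta", package = "microseq")

# Only run when microseq and the external Prodigal software are available
if (nzchar(genome.file) && nzchar(Sys.which("prodigal"))) {
  gff.tbl <- microseq::findGenes(genome.file,
                                 faa.file = faa.file,
                                 ffn.file = ffn.file)
  # Read the sequences that Prodigal wrote
  proteins <- microseq::readFasta(faa.file)
  genes <- microseq::readFasta(ffn.file)
}
```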

The argument proc specifies whether the input data should be treated as a single genome (default) or as a metagenome.

The translation table is 11 by default (the bacterial, archaeal and plant plastid code), but table 4 should be used for Mycoplasma and other organisms using that genetic code.

Setting mask.N = TRUE prevents genes from containing runs of N. Setting bypass.SD = TRUE turns off the search for a Shine-Dalgarno motif.
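To make the options above more concrete, the sketch below shows how they could be translated into Prodigal command-line flags. The helper prodigalArgs is hypothetical and not part of microseq, and it is not necessarily how findGenes builds its command. The flags are those of Prodigal version 2.6, and other versions may differ.

```r
# Hypothetical helper, NOT part of microseq: it only illustrates how the
# findGenes() arguments could map to Prodigal (v2.6) command-line flags.
prodigalArgs <- function(genome.file, gff.file, faa.file = "", ffn.file = "",
                         proc = "single", trans.tab = 11,
                         mask.N = FALSE, bypass.SD = FALSE) {
  proc <- match.arg(proc, c("single", "meta"))
  stopifnot(trans.tab %in% c(11, 4))
  # Input, output in GFF format, procedure and translation table
  cmd.args <- c("-i", genome.file, "-o", gff.file, "-f", "gff",
                "-p", proc, "-g", as.character(trans.tab))
  if (mask.N) cmd.args <- c(cmd.args, "-m")                     # mask runs of N
  if (bypass.SD) cmd.args <- c(cmd.args, "-n")                  # bypass Shine-Dalgarno
  if (nzchar(faa.file)) cmd.args <- c(cmd.args, "-a", faa.file) # proteins
  if (nzchar(ffn.file)) cmd.args <- c(cmd.args, "-d", ffn.file) # gene DNA
  cmd.args
}

# A metagenome run with translation table 4 and masking of N's
# (the file names are placeholders)
cmd.args <- prodigalArgs("genome.fasta", "genes.gff",
                         proc = "meta", trans.tab = 4, mask.N = TRUE)
cat("prodigal", cmd.args, "\n")
```

With findGenes itself, the corresponding call would be findGenes(genome.file, proc = "meta", trans.tab = 4, mask.N = TRUE).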

Examples

# NOT RUN {
# This example requires the external prodigal software
# Using a genome file in this package.
xpth <- file.path(path.package("microseq"),"extdata")
genome.file <- file.path(xpth,"small_genome.fasta")

# Searching for coding sequences, and inspecting the first rows
gff.tbl <- findGenes(genome.file)
head(gff.tbl)

# Retrieving the sequences
genome <- readFasta(genome.file)
cds <- gff2fasta(gff.tbl, genome)
# }