gclink (version 1.1)

orf_locate: Parse ORF Coordinates from Prodigal FASTA Headers

Description

Extracts ORF identifiers, start/end positions and strand orientation directly from the FASTA headers produced by Prodigal. The resulting table is ready for downstream gene-cluster analyses.
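The header layout this parsing relies on can be shown with a minimal base-R sketch. It makes two assumptions. First, fields are separated by " # ", as in the example header under Arguments. Second, Prodigal writes the strand as 1 (forward) or -1 (reverse). The helper `parse_prodigal_header()` is hypothetical and is not part of gclink.

```r
# Hypothetical helper, not part of gclink: split one Prodigal header into its fields.
# Assumes the layout ORF_id # start # end # strand # attributes, separated by " # ".
parse_prodigal_header <- function(header) {
  fields <- strsplit(header, " # ", fixed = TRUE)[[1]]
  if (length(fields) < 4) {
    stop("header does not look like a Prodigal header: ", header)
  }
  list(
    orf_id = fields[1],
    start  = as.integer(fields[2]),
    end    = as.integer(fields[3]),
    strand = as.integer(fields[4])  # 1 = forward, -1 = reverse
  )
}

# Example header taken from the Arguments section
hdr <- "Kuafubacteriaceae--GCA_016703535.1---JADJBV010000001.1_1 # 74 # 1018 # 1 # ..."
str(parse_prodigal_header(hdr))
```

Everything after the fourth field (the "..." attributes) is ignored, because only the identifier and coordinates are needed to place an ORF.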

Usage

orf_locate(in_seq_data = seq_data)

Value

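A data frame of ORF identifiers with their start and end positions and strand orientation, parsed from the Prodigal headers.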

Arguments

in_seq_data

A data frame with two columns:

SeqName

ORF identifier: the full Prodigal FASTA header, in the form ORF_id # start # end # strand # ... (without the leading ">", matching the example below).

Sequence

ORF sequence.

Example SeqName: "Kuafubacteriaceae--GCA_016703535.1---JADJBV010000001.1_1 # 74 # 1018 # 1 # ..."

The data frame can be built from a Prodigal FASTA file as follows:

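library(magrittr)  # provides the %>% pipe
# Read the Prodigal FASTA; record names keep the full headers
seq_data <- Biostrings::readBStringSet("Prodigal.fasta", format = "fasta", nrec = -1L, skip = 0L, seek.first.rec = FALSE, use.names = TRUE) %>%
  data.frame(Sequence = .) %>%
  tibble::rownames_to_column("SeqName")  # move the headers from row names into SeqName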
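To experiment without a FASTA file, you can build a small in_seq_data table by hand with the two documented columns and parse all its headers in one pass. This is a minimal base-R sketch, not the orf_locate implementation. The second header, the placeholder sequences and the output column names (ORF_id, start, end, strand) are all hypothetical. The " # " separator is assumed from the example header above.

```r
# Toy input with the two documented columns; the second row and both
# sequences are hypothetical placeholders for illustration only.
seq_data <- data.frame(
  SeqName = c(
    "Kuafubacteriaceae--GCA_016703535.1---JADJBV010000001.1_1 # 74 # 1018 # 1 # ...",
    "Kuafubacteriaceae--GCA_016703535.1---JADJBV010000001.1_2 # 1100 # 1450 # -1 # ..."
  ),
  Sequence = c("ATGAAA", "ATGTCC"),
  stringsAsFactors = FALSE
)

# Hypothetical sketch (not gclink's code): split every header on " # " and keep
# the first four fields as ORF id, start, end and strand.
parts <- strsplit(seq_data$SeqName, " # ", fixed = TRUE)
orf_table <- data.frame(
  ORF_id = vapply(parts, `[`, character(1), 1),
  start  = as.integer(vapply(parts, `[`, character(1), 2)),
  end    = as.integer(vapply(parts, `[`, character(1), 3)),
  strand = as.integer(vapply(parts, `[`, character(1), 4)),
  stringsAsFactors = FALSE
)
orf_table
```

Because start and end are stored as integers, ORF length follows directly as end - start + 1 (945 bp for the first row).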