Extracts ORF identifiers, start/end positions and strand orientation directly from the FASTA headers produced by Prodigal. The resulting table is ready for downstream gene-cluster analyses.
orf_locate(in_seq_data = seq_data)
A data frame
A data frame with two columns:
SeqName: ORF identifier (Prodigal format: >ORF_id # start # end # strand # ...).
Sequence: ORF sequence.
Example:
"Kuafubacteriaceae--GCA_016703535.1---JADJBV010000001.1_1 # 74 # 1018 # 1 # ..."
The input can be imported from a Prodigal FASTA file using:
# The %>% pipe comes from magrittr (it is also re-exported by dplyr)
library(magrittr)
seq_data <- Biostrings::readBStringSet("Prodigal.fasta", format = "fasta", nrec = -1L, skip = 0L, seek.first.rec = FALSE, use.names = TRUE) %>%
  data.frame(Sequence = .) %>%
  tibble::rownames_to_column("SeqName")
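To make the header layout concrete, the following base R sketch splits a Prodigal-style SeqName on its " # " separators. It is only an illustration of the format described above, not the implementation of orf_locate; the helper name parse_prodigal_header and the output column names (ORF, start, end, strand) are hypothetical and may differ from what orf_locate returns.

```r
# Hypothetical helper (not part of the package): splits Prodigal headers
# of the form "ORF_id # start # end # strand # ..." into separate columns.
parse_prodigal_header <- function(seq_name) {
  # Drop a leading ">" in case the header still carries it.
  seq_name <- sub("^>", "", seq_name)
  # Fields are separated by "#" with optional surrounding whitespace.
  fields <- strsplit(seq_name, "\\s*#\\s*")
  # Pick the i-th field of every header (NA if the field is missing).
  pick <- function(i) vapply(fields, `[`, character(1), i)
  data.frame(
    ORF    = pick(1),
    start  = as.integer(pick(2)),
    end    = as.integer(pick(3)),
    strand = as.integer(pick(4)),  # Prodigal writes 1 (forward) or -1 (reverse)
    stringsAsFactors = FALSE
  )
}

header <- "Kuafubacteriaceae--GCA_016703535.1---JADJBV010000001.1_1 # 74 # 1018 # 1 # ..."
parsed <- parse_prodigal_header(header)
print(parsed)
```

The helper is vectorised over its input, so under the same assumptions it could be applied to a whole column, for example parse_prodigal_header(seq_data$SeqName).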