gclink (version 1.1)

orf_extract: Extract ORF and Genome Information from BLAST or BLASTP Results

Description

This function parses BLASTP result tables to extract structured genome, contig, ORF, and gene information from the query and subject identifiers. It is designed for downstream analyses requiring explicit separation of genome, contig, and ORF identifiers from concatenated BLAST headers.

Usage

orf_extract(bin_genes = blastp_df)
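
As a minimal sketch of the call above, the following builds a one-row input with the two required columns and passes it to orf_extract. The name blastp_df is only a placeholder for your own BLASTP table, and the identifiers are the examples quoted in the Arguments section. The call is guarded so that the snippet still runs when gclink is not installed.

```r
# Minimal input: one BLASTP hit with the two required columns.
# The identifier strings are the examples from the Arguments section.
blastp_df <- data.frame(
  qaccver = paste0(
    "p__Myxococcota--c__Kuafubacteria--o__Kuafubacteriales--",
    "f__Kuafubacteriaceae--GCA_016703535.1---JADJBV010000001.1_150"
  ),
  saccver = "bchC_Methyloversatilis_sp_RAC08_BSY238_2447_METR",
  stringsAsFactors = FALSE
)

# Call the function only if the gclink package is available.
if (requireNamespace("gclink", quietly = TRUE)) {
  result <- gclink::orf_extract(bin_genes = blastp_df)
  str(result)
}
```

A real BLASTP table would normally carry further columns (for example identity and e-value); according to the Arguments section only qaccver and saccver are required.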

Value

The original data frame with six additional columns:

genome

Genome identifier extracted from qaccver.

contig

Contig identifier extracted from qaccver.

orf

Full ORF identifier extracted from qaccver.

genome_contig

Concatenated genome and contig IDs (genome---contig).

gene

Gene symbol extracted from saccver.

orf_position

Numeric ORF position extracted from the ORF identifier.

Arguments

bin_genes

A data frame of BLASTP results containing at least two standard columns: qaccver and saccver.

The qaccver column must contain the genome name followed by the ORF name (the contig name plus an ORF position suffix), joined by the separator "---". For example, for the qaccver "p__Myxococcota--c__Kuafubacteria--o__Kuafubacteriales--f__Kuafubacteriaceae--GCA_016703535.1---JADJBV010000001.1_150", the genome name is "p__Myxococcota--c__Kuafubacteria--o__Kuafubacteriales--f__Kuafubacteriaceae--GCA_016703535.1", the contig name is "JADJBV010000001.1", the ORF name is "JADJBV010000001.1_150", and the orf_position is 150.

The saccver column must contain the gene name and may also contain additional gene information, joined by the separator "_". For example, for the saccver "bchC_Methyloversatilis_sp_RAC08_BSY238_2447_METR", the gene name is "bchC" and the gene information is "Methyloversatilis_sp_RAC08_BSY238_2447_METR", which indicates the source of the gene.
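
The identifier conventions described above can be illustrated with a short base R sketch. The helper parse_blast_ids is hypothetical and is not part of gclink; it only mirrors the splitting rules stated here (genome before "---", ORF name after it, ORF position as the trailing number, gene name before the first "_") and may differ from how orf_extract is actually implemented.

```r
# Hypothetical helper (not part of gclink): splits identifiers following
# the conventions described in the Arguments section.
parse_blast_ids <- function(qaccver, saccver) {
  # Text before the "---" separator is the genome name.
  genome <- sub("---.*$", "", qaccver)
  # Text after the "---" separator is the full ORF name.
  orf <- sub("^.*---", "", qaccver)
  # The contig name is the ORF name without its trailing "_<digits>".
  contig <- sub("_[0-9]+$", "", orf)
  # The ORF position is the number after the last "_" in the ORF name.
  orf_position <- as.numeric(sub("^.*_", "", orf))
  # The gene name is the text before the first "_" in saccver.
  gene <- sub("_.*$", "", saccver)

  data.frame(
    genome = genome,
    contig = contig,
    orf = orf,
    genome_contig = paste(genome, contig, sep = "---"),
    gene = gene,
    orf_position = orf_position,
    stringsAsFactors = FALSE
  )
}

# Apply the helper to the example identifiers from this page.
parsed <- parse_blast_ids(
  qaccver = paste0(
    "p__Myxococcota--c__Kuafubacteria--o__Kuafubacteriales--",
    "f__Kuafubacteriaceae--GCA_016703535.1---JADJBV010000001.1_150"
  ),
  saccver = "bchC_Methyloversatilis_sp_RAC08_BSY238_2447_METR"
)
parsed$contig        # "JADJBV010000001.1"
parsed$gene          # "bchC"
parsed$orf_position  # 150
```

Because the taxonomy levels in the genome name are joined by "--" and the genome is separated from the ORF name by "---", the sketch assumes that three consecutive hyphens occur exactly once in each qaccver.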