
geneviewer (version 0.1.10)

protein_blast: Perform Protein BLAST Analysis Within Specified Clusters

Description

This function conducts a BLAST analysis for protein sequences within specified clusters. It generates all possible protein combinations between a query cluster and other clusters, performs pairwise alignments, calculates sequence identity and similarity, and filters results based on a minimum identity threshold.
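The workflow described above (pair, align, score, filter) can be illustrated with a small base R sketch. This is not geneviewer's implementation: `percent_identity()` is a hypothetical helper that assumes the two sequences are already aligned and of equal length, whereas the package performs real pairwise alignments. The sequences are invented, and treating the threshold as inclusive (`>=`) is also an assumption.

```r
# Toy illustration of the workflow: pair, score, filter.
# NOT geneviewer's code; percent_identity() is a hypothetical helper that
# assumes both sequences are already aligned and have equal length.
percent_identity <- function(a, b) {
  a_chars <- strsplit(a, "")[[1]]
  b_chars <- strsplit(b, "")[[1]]
  stopifnot(length(a_chars) == length(b_chars))
  100 * mean(a_chars == b_chars)
}

# Invented sequences for a query cluster and one other cluster
query_proteins <- c(q1 = "MKTAYIAK", q2 = "MSLLTEVE")
other_proteins <- c(o1 = "MKTAYLAK", o2 = "GGGGGGGG")

# Every query protein paired with every protein from the other cluster
pairs <- expand.grid(
  query = names(query_proteins),
  target = names(other_proteins),
  stringsAsFactors = FALSE
)

# Score each pair
pairs$identity <- mapply(
  function(q, t) percent_identity(query_proteins[[q]], other_proteins[[t]]),
  pairs$query, pairs$target,
  USE.NAMES = FALSE
)

# Keep only pairs that reach the minimum identity (assumed inclusive)
min_identity <- 30
hits <- pairs[pairs$identity >= min_identity, ]
print(hits)
```

Only the q1/o1 pair survives the filter here, since the two sequences differ at a single position out of eight.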

Usage

protein_blast(
  data,
  query,
  id = "protein_id",
  start = "start",
  end = "end",
  cluster = "cluster",
  genes = NULL,
  identity = 30,
  parallel = TRUE
)

Value

A modified version of the input `data` dataframe, including additional columns for BLAST results (identity, similarity).
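As a rough sketch of how the returned dataframe might be used downstream, the example below filters and sorts on the added `identity` column. The `blast_result` object is a mock stand-in with invented values, not real output of `protein_blast()`; only the `identity` and `similarity` column names come from the description above, and the cutoff of 50 is arbitrary.

```r
# Mock stand-in for the dataframe returned by protein_blast();
# all values are invented for illustration.
blast_result <- data.frame(
  protein_id = c("p1", "p2", "p3"),
  cluster    = c("cluster A", "cluster B", "cluster B"),
  identity   = c(100, 62.4, 35.1),
  similarity = c(100, 75.0, 48.9),
  stringsAsFactors = FALSE
)

# Keep proteins outside the query cluster with identity of at least 50
keep <- blast_result$cluster != "cluster A" & blast_result$identity >= 50
strong_hits <- blast_result[keep, ]

# Sort the remaining hits from highest to lowest identity
strong_hits <- strong_hits[order(-strong_hits$identity), ]
print(strong_hits)
```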

Arguments

data

A dataframe, or a character vector giving the path to .gbk files, which are then read and processed. A dataframe must contain columns for unique protein identifiers, cluster identifiers, protein sequences, and the start and end positions of each gene.

query

The name of the query cluster to be used for BLAST comparisons.

id

The name of the column that contains the unique protein identifiers. Defaults to "protein_id".

start

The name of the column specifying the start positions of genes. Defaults to "start".

end

The name of the column specifying the end positions of genes. Defaults to "end".

cluster

The name of the column specifying the cluster names. Defaults to "cluster".

genes

An optional vector of gene identifiers to include in the analysis. Defaults to NULL.

identity

Minimum identity a BLAST hit must reach to be kept in the results. Defaults to 30.

parallel

Logical indicating whether to use parallel processing for alignments. Defaults to TRUE.
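To make the `data` requirements concrete, here is a minimal sketch of a dataframe input. The sequences and coordinates are invented, and the name of the sequence column (`translation`) is an assumption, since this page does not state which column name the function expects. The call itself is wrapped in `if (FALSE)` because it needs geneviewer and its alignment dependencies to be installed.

```r
# Invented example input; the sequence column name "translation" is an
# assumption and may differ from what protein_blast() actually expects.
clusters <- data.frame(
  protein_id  = c("A_1", "A_2", "B_1", "B_2"),
  cluster     = c("cluster A", "cluster A", "cluster B", "cluster B"),
  start       = c(1, 400, 1, 350),
  end         = c(300, 900, 320, 800),
  translation = c("MKTAYIAKQR", "MSLLTEVETY", "MKTAYLAKQR", "MATLLTEVET"),
  stringsAsFactors = FALSE
)

# Check that the expected columns are present before calling the function
required <- c("protein_id", "cluster", "start", "end", "translation")
missing_cols <- setdiff(required, names(clusters))
stopifnot(length(missing_cols) == 0)

if (FALSE) {
  # Column arguments are spelled out to show how they map onto the dataframe
  result <- geneviewer::protein_blast(
    clusters,
    query    = "cluster A",
    id       = "protein_id",
    start    = "start",
    end      = "end",
    cluster  = "cluster",
    identity = 30,
    parallel = FALSE
  )
}
```

If the dataframe uses different column names, the `id`, `start`, `end`, and `cluster` arguments are where those names would be passed.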

Examples

if (FALSE) {
  # Folder containing the .gbk files to read
  path_to_folder <- "path/to/gbk/folder/"

  # Compare every other cluster against "cluster A"
  data_updated <- protein_blast(
    path_to_folder,
    id = "protein_id",
    query = "cluster A",
    identity = 30
  )
}
