VIProDesign (version 0.1.0)

phyl_tree_cluster_dbscan: Perform DBSCAN Clustering on a Phylogenetic Tree

Description

This function applies the DBSCAN clustering algorithm to a set of protein sequences, groups them into clusters using a distance cutoff, and removes the sequences identified as outliers.

Usage

phyl_tree_cluster_dbscan(input_obj, cutoff, nmin)

Value

This function returns an `AAStringSet` object containing the input protein sequences with outliers removed.

Arguments

input_obj

An `AAStringSet` object containing protein sequences.

cutoff

A numeric value specifying the distance cutoff for clustering.

nmin

An integer specifying the minimum number of points required to form a cluster (the DBSCAN minimum-points parameter, commonly called minPts).

Details

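The function uses the DBSCAN algorithm to cluster sequences based on their phylogenetic distances. In DBSCAN, a point that does not fall within any dense cluster is labeled as noise; sequences labeled this way are treated as outliers and excluded from the final output.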
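To make the clustering step concrete, here is a minimal base-R sketch of DBSCAN on a precomputed distance matrix. It is not the package's implementation: the helper `simple_dbscan()`, its argument names `eps` and `min_pts`, and the toy sequences are all hypothetical, and a normalized edit distance from `utils::adist()` stands in for the phylogenetic distance that the function actually uses.

```r
# Toy sequences as a plain character vector (hypothetical data).
seqs <- c(
  s1 = "MKTIIALSYIFCLVFADYKDDDDK",
  s2 = "MKTIIALSYIFCLVFADYKDDDD",
  s3 = "MKTIIALSYIFCLVFADYKDDDE",
  s4 = "MKTIIALSYIFCLVFADYKDDDA",
  s5 = "WWWWWPPPPPGGGGGHHHHH"
)

# Stand-in distance (an assumption, not a phylogenetic distance):
# Levenshtein edit distance divided by the length of the longer sequence.
len <- nchar(seqs)
d <- utils::adist(seqs) / outer(len, len, pmax)

# Minimal DBSCAN on a distance matrix.
# Returns an integer vector: 0 = noise (outlier), 1..k = cluster id.
simple_dbscan <- function(d, eps, min_pts) {
  n <- nrow(d)
  # Neighborhood of each point within eps (includes the point itself).
  neighbors <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))
  is_core <- lengths(neighbors) >= min_pts
  labels <- integer(n)
  cluster <- 0L
  for (i in seq_len(n)) {
    # Start a new cluster only from an unlabeled core point.
    if (labels[i] != 0L || !is_core[i]) next
    cluster <- cluster + 1L
    labels[i] <- cluster
    queue <- neighbors[[i]]
    while (length(queue) > 0) {
      j <- queue[1]
      queue <- queue[-1]
      if (labels[j] != 0L) next
      labels[j] <- cluster
      # Only core points extend the cluster further.
      if (is_core[j]) queue <- c(queue, neighbors[[j]])
    }
  }
  labels
}

labels <- simple_dbscan(d, eps = 0.5, min_pts = 3)

# Drop noise points, mirroring the "outliers removed" behavior described above.
kept <- seqs[labels != 0L]
```

In this toy data, `s1` to `s4` differ by a single residue and form one cluster, while `s5` shares no residues with the others, sits at distance 1 from all of them, and is dropped as noise. A smaller cutoff or a larger minimum-points value makes the clustering stricter and removes more sequences.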

Examples

# Example usage:
library(Biostrings)
library(VIProDesign)

# Create an AAStringSet object with the sequences
seqs <- AAStringSet(c(
  seq1 = "MKTIIALSYIFCLVFADYKDDDDK",
  seq2 = "MKTIIALSYIFCLVFADYKDLLKDDDD",
  seq3 = "MKTIIALSYIFCLVFADEELYKDDDD",
  seq4 = "MKTIEIALSYIFCLVFADYKDDDD",
  seq5 = "MKTIIKLAAASYIFCLVFADYKDDDD",
  seq6 = "MKTIIALSKIPFCLVFADYKDDDD",
  seq7 = "MKTIIALSYIFIQEERTCLVFADYKDDDD"
))

# Perform DBSCAN clustering and remove outliers
no_outliers <- phyl_tree_cluster_dbscan(seqs, cutoff = 0.5, nmin = 5)