bioseq (version 0.1.4)

seq_cluster: Cluster sequences by similarity

Description

Cluster sequences by similarity

Usage

seq_cluster(x, threshold = 0.05, method = "complete")

Value

An integer vector with group memberships.

Arguments

x

a DNA, RNA or AA vector of sequences to be clustered.

threshold

Threshold value, in the range [0, 1].

method

the clustering method (see details).

Details

The function uses the dist.dna and dist.aa functions from the ape package to compute pairwise distances among sequences, and hclust for clustering.

Computing a full pairwise distance matrix can be computationally expensive. It is recommended to use this function only on datasets of moderate size.
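The distance-then-cluster pipeline described above can be sketched in base R. This is only an illustration under stated assumptions, not the actual bioseq implementation: the helper p_distance is hypothetical, a simple proportion of mismatches stands in for the ape distance functions, and cutting the tree with cutree at height 0.15 is an assumed way of applying a threshold.

```r
# Hypothetical helper (not part of bioseq or ape): proportion of differing
# positions between two aligned sequences, skipping positions with a gap.
p_distance <- function(a, b) {
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  keep <- a != "-" & b != "-"
  mean(a[keep] != b[keep])
}

# Toy aligned sequences, made up for illustration.
seqs <- c("ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "TTTTACGGGG")

# Full pairwise distance matrix: n * n comparisons, which is why the
# approach becomes expensive for large datasets.
n <- length(seqs)
d <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- p_distance(seqs[i], seqs[j])
  }
}

# Hierarchical clustering, then cut the tree at an assumed height.
hc <- hclust(as.dist(d), method = "complete")
groups <- cutree(hc, h = 0.15)
groups
```

In this toy example the first three sequences differ by at most one position in ten and fall into one group, while the fourth sequence forms its own group.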

Supported methods are:

  • "single" (= Nearest Neighbour Clustering)

  • "complete" (= Farthest Neighbour Clustering)

  • "average" (= UPGMA)

  • "mcquitty" (= WPGMA)
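To give a feel for how the choice of method can change the result, the following sketch applies the four linkage methods to a small, made-up distance matrix using hclust and cutree from base R. The distances and the cut height of 0.05 are assumptions chosen for illustration; the code does not call seq_cluster itself.

```r
# Made-up distances among three items forming a "chain":
# item 1 is close to item 2, item 2 is close to item 3,
# but items 1 and 3 are further apart.
d <- as.dist(matrix(c(0.00, 0.03, 0.07,
                      0.03, 0.00, 0.04,
                      0.07, 0.04, 0.00), nrow = 3))

methods <- c("single", "complete", "average", "mcquitty")

# Cluster with each method and cut every tree at the same height.
memberships <- lapply(methods, function(m) {
  cutree(hclust(d, method = m), h = 0.05)
})
names(memberships) <- methods
memberships
```

With these values, single linkage joins all three items into one group because each item has a neighbour within 0.05, whereas the other three methods keep item 3 separate because its distance to the cluster of items 1 and 2 exceeds the cut height.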

See Also

Function seq_consensus to compute consensus and representative sequences for clusters.

Other aggregation operations: seq_consensus()

Examples

x <- c("-----TACGCAGTAAAAGCTACTGATG",
       "CGTCATACGCAGTAAAAACTACTGATG",
       "CTTCATACGCAGTAAAAACTACTGATG",
       "CTTCATATGCAGTAAAAACTACTGATG",
       "CTTCATACGCAGTAAAAACTACTGATG",
       "CGTCATACGCAGTAAAAGCTACTGATG",
       "CTTCATATGCAGTAAAAGCTACTGACG")
x <- dna(x)
seq_cluster(x)