
bioseq (version 0.1.1)

seq_cluster: Cluster sequences by similarity

Description

Cluster sequences by similarity

Usage

seq_cluster(x, threshold = 0.05, method = "complete")

Arguments

x

a DNA, RNA or AA vector of sequences to be clustered.

threshold

Threshold value (range in [0, 1]).

method

the clustering method (see details).

Value

An integer vector with group memberships.
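
A membership vector of this kind pairs naturally with base R grouping tools. The sketch below is illustrative only: the `membership` values are hypothetical and written by hand rather than produced by seq_cluster, and plain character strings stand in for a bioseq vector.

```r
# Toy sequences as plain character strings (stand-ins for a DNA vector).
seqs <- c("ACGT", "ACGA", "TTTT", "ACGT")

# Hypothetical group memberships, in the shape seq_cluster() is
# documented to return: one integer per input sequence.
membership <- c(1L, 1L, 2L, 1L)

# Group the sequences by cluster; the result is a named list.
clusters <- split(seqs, membership)

# Count how many sequences fall in each cluster.
sizes <- table(membership)
```

Here `clusters` holds one element per cluster, named by the cluster number, and `sizes` gives the number of sequences in each.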

Details

The function uses the dist.dna and dist.aa functions from the ape package to compute pairwise distances among sequences, and hclust for clustering.

Computing a full pairwise distance matrix can be computationally expensive. It is recommended to use this function only for datasets of moderate size.
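
The distance-then-cluster pipeline described above can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes a simple proportion-of-differing-sites distance in place of ape's dist.dna, and it assumes that the threshold is applied as a cut height on the tree via cutree. The sequences and the helper `p_dist` are hypothetical.

```r
# Toy aligned sequences of equal length (hypothetical data).
seqs <- c(a = "ACGTACGTAC",
          b = "ACGTACGTAC",
          c = "ACGTACGTAT",
          d = "TTGTACGAAT")

# Split each sequence into a vector of single characters.
chars <- strsplit(seqs, "")

# Hypothetical helper: proportion of sites that differ between two
# aligned sequences (a stand-in for ape::dist.dna).
p_dist <- function(s1, s2) mean(s1 != s2)

# Fill the full pairwise matrix: n * n comparisons, which is why
# large inputs become expensive.
n <- length(chars)
d <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- p_dist(chars[[i]], chars[[j]])
  }
}

# Hierarchical clustering on the distance matrix.
tree <- hclust(as.dist(d), method = "complete")

# Assumption: the threshold acts as the height at which the tree is cut.
groups <- cutree(tree, h = 0.15)
```

With these toy data, sequences a, b and c differ at no more than one site in ten and end up in one group, while d forms a group of its own.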

Supported methods are:

  • "single" (= Nearest Neighbour Clustering)

  • "complete" (= Farthest Neighbour Clustering)

  • "average" (= UPGMA)

  • "mcquitty" (= WPGMA)
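
The choice of method can change the number of clusters obtained at a given threshold. The sketch below uses hclust directly on three hypothetical points placed on a line, again assuming the threshold is applied as a cut height; it is meant to show the behaviour of the linkage methods, not the internals of seq_cluster.

```r
# Three hypothetical items on a line; pairwise distances are
# 0.03 (s1-s2), 0.04 (s2-s3) and 0.07 (s1-s3).
pos <- c(s1 = 0, s2 = 0.03, s3 = 0.07)
d <- dist(pos)

methods <- c("single", "complete", "average", "mcquitty")

# For each linkage method, cut the tree at height 0.05 and count
# the resulting clusters.
n_clusters <- sapply(methods, function(m) {
  tree <- hclust(d, method = m)
  length(unique(cutree(tree, h = 0.05)))
})
```

Single linkage chains s3 onto the first pair through its nearest neighbour (0.04) and yields one cluster, whereas the other three methods join s3 at a height above 0.05 and yield two.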

See Also

Function seq_consensus to compute consensus and representative sequences for clusters.

Other aggregation operations: seq_consensus

Examples

# NOT RUN {
x <- c("-----TACGCAGTAAAAGCTACTGATG",
       "CGTCATACGCAGTAAAAACTACTGATG",
       "CTTCATACGCAGTAAAAACTACTGATG",
       "CTTCATATGCAGTAAAAACTACTGATG",
       "CTTCATACGCAGTAAAAACTACTGATG",
       "CGTCATACGCAGTAAAAGCTACTGATG",
       "CTTCATATGCAGTAAAAGCTACTGACG")
x <- dna(x)
seq_cluster(x)

# }
