clst (version 1.20.0)

maxDists: Select a maximally diverse set of items given a distance matrix.

Description

Given a square matrix of pairwise distances, return indices of N objects with a maximal sum of pairwise distances.
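
To make the objective concrete, the following minimal sketch computes the sum of pairwise distances for a chosen set of indices, using base R only. The helper name sum_pairwise and the toy data are assumptions made for illustration; they are not part of the package.

```r
# Toy data: five points on a line; dist() gives their pairwise distances.
pts <- c(0, 1, 2, 9, 10)
dmat <- as.matrix(dist(pts))

# Hypothetical helper (not part of the package): sum of pairwise
# distances among the objects selected by idx.
sum_pairwise <- function(mat, idx) {
  sub <- mat[idx, idx, drop = FALSE]
  # Each pair appears twice in a symmetric matrix; count it once.
  sum(sub[upper.tri(sub)])
}

sum_pairwise(dmat, c(1, 3, 5))  # spread-out set: 2 + 10 + 8 = 20
sum_pairwise(dmat, c(1, 2, 3))  # tightly clustered set: 1 + 2 + 1 = 4
```

A more diverse selection scores higher under this measure, which is the quantity the description says is maximized.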

Usage

maxDists(mat, idx = NA, N = 1,
         exclude = rep(FALSE, nrow(mat)),
         include.center = TRUE)

Arguments

mat
square distance matrix
idx
starting indices; if missing, starts with the object with the maximum median distance to all other objects.
N
total number of selections, including any starting indices supplied in idx (so N - length(idx) additional objects are selected).
exclude
logical vector, one element per row of mat, indicating elements to exclude from the calculation.
include.center
if TRUE, includes the "most central" element (i.e., the one with the smallest median of pairwise distances to all other elements).
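
One way to picture how these arguments interact is a greedy selection: seed with the object that has the largest median distance, then repeatedly add the remaining object with the largest summed distance to those already picked. The sketch below is an assumption for illustration only. The function greedy_diverse is hypothetical, it is not the implementation used by maxDists, and it ignores idx and include.center.

```r
# Hypothetical greedy sketch; NOT the implementation used by maxDists().
greedy_diverse <- function(mat, N, exclude = rep(FALSE, nrow(mat))) {
  candidates <- which(!exclude)
  # Seed: the candidate with the largest median distance to the candidates
  # (each row's median includes its own zero self-distance).
  med <- apply(mat[candidates, candidates, drop = FALSE], 1, median)
  picked <- candidates[which.max(med)]
  while (length(picked) < N && length(picked) < length(candidates)) {
    remaining <- setdiff(candidates, picked)
    # Gain of each remaining candidate: summed distance to the current picks.
    gain <- rowSums(mat[remaining, picked, drop = FALSE])
    # Ties are broken in favor of the lowest index (which.max behavior).
    picked <- c(picked, remaining[which.max(gain)])
  }
  picked
}

# Toy data: five points on a line.
toy <- as.matrix(dist(c(0, 1, 2, 9, 10)))
greedy_diverse(toy, N = 3)
# Exclude the fifth object from consideration.
greedy_diverse(toy, N = 3, exclude = c(FALSE, FALSE, FALSE, FALSE, TRUE))
```

A greedy pass like this does not guarantee a globally maximal sum; it is only meant to show how N and exclude constrain the selection.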

Value

A vector of indices corresponding to the margin of mat.

See Also

findOutliers

Examples

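library(ape)
library(clstutils)

## load the example alignment and its annotation
data(seqs)
data(seqdat)

## keep only the Enterococcus faecium sequences
efaecium <- seqdat$tax_name == 'Enterococcus faecium'
seqdat <- subset(seqdat, efaecium)
seqs <- seqs[efaecium,]

## pairwise distance matrix (raw proportion of differing sites)
dmat <- ape::dist.dna(seqs, pairwise.deletion=TRUE, as.matrix=TRUE, model='raw')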

## find a maximally diverse set without first identifying outliers
picked <- maxDists(dmat, N=10)
picked
prettyTree(nj(dmat), groups=ifelse(seq_len(nrow(dmat)) %in% picked,'picked','not picked'))

## restrict selected elements to non-outliers
outliers <- findOutliers(dmat, cutoff=0.015)
picked <- maxDists(dmat, N=10, exclude=outliers)
picked
prettyTree(nj(dmat),
           groups=ifelse(seq_len(nrow(dmat)) %in% picked,'picked','not picked'),
           X = outliers)