| maxDists {clstutils} | R Documentation |
Given a square matrix of pairwise distances, return indices of N objects with a maximal sum of pairwise distances.
maxDists(mat, idx = NA, N = 1,
exclude = rep(FALSE, nrow(mat)),
include.center = TRUE)
mat |
square distance matrix |
idx |
starting indices; if missing, starts with the object with the maximum median distance to all other objects. |
N |
total number of objects to select, including any supplied in idx (so N - length(idx) additional objects are chosen). |
exclude |
logical vector, one entry per row of mat, indicating elements to exclude from the calculation. |
include.center |
if TRUE, includes the "most central" element (i.e., the one with the smallest median of pairwise distances to all other elements). |
A vector of indices corresponding to the margin of mat.
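The idea of picking objects with a large sum of pairwise distances can be illustrated with a small greedy sketch in base R. This is not the clstutils implementation: `greedy_max_dists` is a hypothetical helper, and both the seeding rule (largest median distance to the other candidates) and the greedy step (largest summed distance to the objects already picked) are assumptions made for illustration.

```r
# Hypothetical helper for illustration only; NOT the clstutils implementation.
greedy_max_dists <- function(mat, N = 1, exclude = rep(FALSE, nrow(mat))) {
  stopifnot(is.matrix(mat), nrow(mat) == ncol(mat),
            length(exclude) == nrow(mat))
  candidates <- which(!exclude)
  stopifnot(length(candidates) >= 2, N >= 1, N <= length(candidates))

  # Seed: the candidate with the largest median distance to the other
  # candidates (the diagonal is dropped via setdiff).
  med <- sapply(candidates,
                function(i) median(mat[i, setdiff(candidates, i)]))
  picked <- candidates[which.max(med)]

  # Greedy step: add the candidate with the largest summed distance
  # to everything picked so far, until N objects are selected.
  while (length(picked) < N) {
    remaining <- setdiff(candidates, picked)
    gain <- rowSums(mat[remaining, picked, drop = FALSE])
    picked <- c(picked, remaining[which.max(gain)])
  }
  picked
}

# Toy distance matrix: four points on a line, the last one far away.
x <- c(0, 1, 3, 10)
toy <- as.matrix(dist(x))

greedy_max_dists(toy, N = 2)
greedy_max_dists(toy, N = 2, exclude = c(FALSE, FALSE, FALSE, TRUE))
```

In the first call the distant point (index 4) is chosen as the seed; in the second it is excluded, so the selection is made among the first three points only.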
Check the candidate sequences for outliers (for example, mislabeled sequences) before selecting: because outliers are far from everything else, they are all but guaranteed to be included in a maximally diverse set of elements.
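The warning above can be seen on toy data in base R. The sketch below uses made-up values, a hypothetical `cutoff`, and a simple stand-in outlier rule (distance to the most central object above the cutoff); it is an assumption for illustration and not necessarily how findOutliers decides.

```r
# Toy data: five clustered values plus one distant point (index 6),
# standing in for a mislabeled sequence.
x <- c(0, 1, 2, 3, 4, 50)
toy <- as.matrix(dist(x))

# The distant point has by far the largest total distance to the others,
# so a selection that maximizes pairwise distances is drawn to it.
total <- rowSums(toy)
most_distant <- unname(which.max(total))

# Stand-in outlier rule (an assumption, not necessarily findOutliers' method):
# flag objects farther than `cutoff` from the most central object, where
# "most central" means the smallest median distance.
center <- which.min(apply(toy, 1, median))
cutoff <- 10  # hypothetical threshold chosen for this toy data
outliers <- unname(toy[center, ] > cutoff)

most_distant
which(outliers)
```

A logical vector such as `outliers`, with one entry per row of the matrix, has the shape described for the exclude argument.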
Noah Hoffman
library(ape)
library(clstutils)
data(seqs)
data(seqdat)
efaecium <- seqdat$tax_name == 'Enterococcus faecium'
seqdat <- subset(seqdat, efaecium)
seqs <- seqs[efaecium,]
dmat <- ape::dist.dna(seqs, pairwise.deletion=TRUE, as.matrix=TRUE, model='raw')

## find a maximally diverse set without first identifying outliers
picked <- maxDists(dmat, N=10)
picked
prettyTree(nj(dmat), groups=ifelse(1:nrow(dmat) %in% picked,'picked','not picked'))

## restrict selected elements to non-outliers
outliers <- findOutliers(dmat, cutoff=0.015)
picked <- maxDists(dmat, N=10, exclude=outliers)
picked
prettyTree(nj(dmat), groups=ifelse(1:nrow(dmat) %in% picked,'picked','not picked'), X = outliers)