prabclus (version 2.2-2)

prabclust: Clustering for biotic elements or for species delimitation (mixture method)

Description

Clusters a presence-absence matrix object (for clustering ranges/finding biotic elements, Hennig and Hausdorf, 2004) or an object of genetic information (for species delimitation, Hausdorf and Hennig, 2010) by calculating an MDS from the distances, and applying maximum likelihood Gaussian mixtures clustering with "noise" (package mclust) to the MDS points. The solution is plotted.

A standard execution (using the default distance of prabinit) will be

prabmatrix <- prabinit(file="path/prabmatrixfile", neighborhood="path/neighborhoodfile")
clust <- prabclust(prabmatrix)
print(clust)

Examples for species delimitation are given below in the examples section.

Note: Data formats are described on the prabinit and alleleinit help pages. You may also consider the example datasets kykladspecreg.dat, nb.dat, Heterotrigona_indoFO.txt or MartinezOrtega04AFLP.dat.

Note: prabclust calls the function mclustBIC in package mclust. Its use is protected by a special license, see http://www.stat.washington.edu/mclust/license.txt, particularly point 6. An alternative is the use of hprabclust.

Usage

prabclust(prabobj, mdsmethod = "classical", mdsdim = 4, nnk =
ceiling(prabobj$n.species/40), nclus = 0:9, modelid = "all", permutations=0)

## S3 method for class 'prabclust'
print(x, bic=FALSE, ...)

Arguments

prabobj
object of class prab as generated by prabinit. Presence-absence data to be analyzed. (This can be geographical information for range clustering Can also be an object of class alleleobject as generated b
mdsmethod
"classical", "kruskal", or "sammon". The MDS method to transform the distances to data points. "classical" indicates metric MDS by function cmdscale, "kruskal" is
mdsdim
integer. Dimension of the MDS points. For mdsmethod=="kruskal", stressvals can be used to see how the stress depends on mdsdim in order to choose mdsdim
nnk
integer. Number of nearest neighbors to determine the initial noise estimation by NNclean. nnk=0 fits the model without a noise component.
nclus
vector of integers. Numbers of clusters to perform the mixture estimation.
modelid
string. Model name for mclustBIC (see the corresponding help page; all models or combinations of models mentioned there are possible). modelid="all" compares all possible models. Additionally, "noVVV" is
permutations
integer. It has been found occasionally that depending on the order of observations the algorithms isoMDS and mclustBIC converge to different solutions. This is because these methods require an ordering of the distanc
x
object of class prabclust. Output of prabclust.
bic
logical. If TRUE, information about the BIC criterion to choose the model is displayed.
...
necessary for summary method.
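
The mdsdim entry above mentions looking at how the stress of Kruskal's MDS depends on the dimension. The following is a minimal sketch of that idea using isoMDS from package MASS directly, not the package's own stressvals function. The built-in eurodist distances are only a stand-in for the distance matrix of a prab object, and the candidate dimensions 1:3 are an arbitrary choice for illustration.

```r
library(MASS)  # provides isoMDS (Kruskal's non-metric MDS)

# Stand-in distance object (assumption): road distances between
# 21 European cities, shipped with base R.
d <- eurodist

# Candidate MDS dimensions to compare (arbitrary for this sketch).
dims <- 1:3

# Fit a non-metric MDS for each dimension and keep the final stress.
stress <- sapply(dims, function(k) isoMDS(d, k = k, trace = FALSE)$stress)
names(stress) <- paste0("mdsdim=", dims)

print(round(stress, 2))
```

A dimension beyond which the stress no longer drops much is a common heuristic choice. The selected value could then be passed as mdsdim together with mdsmethod="kruskal".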

Value

print.prabclust does not produce output. prabclust generates an object of class prabclust. This is a list with components:

  • clustering: vector of integers indicating the cluster memberships of the species. Noise can be recognized by output component symbols.
  • clustsummary: output object of summary.mclustBIC. A list giving the optimal (according to BIC) parameters, conditional probabilities "z", and loglikelihood, together with the associated classification and its uncertainty. Note that the numbering of clusters may differ from clustering, see csreorder.
  • bicsummary: output object of mclustBIC. Bayesian Information Criterion for the specified mixture models and numbers of clusters.
  • points: numerical matrix. MDS configuration.
  • nnk: see above.
  • mdsdim: see above.
  • mdsmethod: see above.
  • symbols: vector of characters, similar to clustering, but indicating estimated noise and points belonging to one-point-components (which should be interpreted as some kind of noise as well) by "N".
  • permchange: logical. If TRUE, permutations>0 has been used and the best solution is different from the one obtained by the standard ordering. (This is just for information and has no further operational consequences.)
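
As a rough illustration of how the returned list might be inspected, the sketch below runs prabclust on the kykladspecreg example data and reads the components described above. The variable names noise_species and core_points are made up for this example. It assumes that packages prabclus and mclust are installed.

```r
library(prabclus)

# Example data shipped with prabclus (also used in the Examples section).
data(kykladspecreg)
data(nb)
set.seed(1234)
x <- prabinit(prabmatrix = kykladspecreg, neighborhood = nb)
clust <- prabclust(x)

# Cluster sizes; estimated noise and one-point components appear as "N".
print(table(clust$symbols))

# Indices of the species flagged as noise (hypothetical variable name).
noise_species <- which(clust$symbols == "N")

# MDS coordinates of the species not flagged as noise.
core_points <- clust$points[clust$symbols != "N", , drop = FALSE]
```

Using symbols rather than clustering to find noise follows the note in the clustering entry that noise is recognized through the symbols component.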

Details

Note that if mdsmethod!="classical", zero distances between non-identical objects are replaced by the smallest nonzero distance divided by 10 to prevent the MDS methods from producing an error.
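
The replacement rule described above can be sketched in a few lines of base R. This is not the package's internal code; replace_zero_distances is a hypothetical helper, and it assumes that "non-identical objects" means any pair of different row/column indices.

```r
# Hypothetical helper: replace zero off-diagonal distances by the smallest
# nonzero distance divided by 10, leaving the diagonal at zero.
replace_zero_distances <- function(dmat) {
  dmat <- as.matrix(dmat)                   # also accepts a "dist" object
  off_diag <- row(dmat) != col(dmat)        # pairs of different objects
  positive <- dmat[off_diag & dmat > 0]
  if (length(positive) == 0) return(dmat)   # nothing to scale against
  dmat[off_diag & dmat == 0] <- min(positive) / 10
  dmat
}

# Toy symmetric distance matrix in which objects 1 and 2 are at distance 0.
d <- matrix(c(0, 0, 2,
              0, 0, 4,
              2, 4, 0), nrow = 3, byrow = TRUE)
fixed <- replace_zero_distances(d)
print(fixed)
```

In the toy matrix the smallest nonzero distance is 2, so the zero between objects 1 and 2 becomes 0.2, while all other entries are unchanged.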

References

Fraley, C. and Raftery, A. E. (1998) How many clusters? Which clustering method? - Answers via Model-Based Cluster Analysis. Computer Journal 41, 578-588.

Hausdorf, B. and Hennig, C. (2010) Species Delimitation Using Dominant and Codominant Multilocus Markers. Systematic Biology, published online http://sysbio.oxfordjournals.org/cgi/content/full/syq039?ijkey=LTDbmfF3LhbOOQY&keytype=ref

Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896. http://stat.ethz.ch/Research-Reports/110.html.

See Also

mclustBIC, summary.mclustBIC, NNclean, cmdscale, isoMDS, sammon, prabinit, hprabclust, alleleinit, stressvals.

Examples

# Biotic element/range clustering:
data(kykladspecreg)
data(nb)
set.seed(1234)
x <- prabinit(prabmatrix=kykladspecreg, neighborhood=nb)
# If you want to use your own ASCII data files, use
# x <- prabinit(file="path/prabmatrixfile",
# neighborhood="path/neighborhoodfile")
print(prabclust(x))

# Here is an example for species delimitation with codominant markers;
# only 50 individuals were used in order to have a fast example. 
data(tetragonula)
ta <- alleleconvert(strmatrix=tetragonula[1:50,])
tai <- alleleinit(allelematrix=ta)
print(prabclust(tai))

# Here is an example for species delimitation with dominant markers;
# only 50 individuals were used in order to have a fast example.
# You may want to use stressvals to choose mdsdim.
data(veronica)
vei <- prabinit(prabmatrix=veronica[1:50,],distance="jaccard")
print(prabclust(vei,mdsmethod="kruskal",mdsdim=3))