Cluster the conserved waters.
ClusterWaters(data, cutoff.cluster, cluster.method = "complete")
data: The water oxygens' X, Y, and Z coordinates, B-values, and occupancy values.
cutoff.cluster: Numeric cutoff, supplied by the user, for the distance between water oxygen atoms that form a cluster; default: 2.4 Angstroms.
cluster.method: Method of clustering the waters; default is "complete". Any other method accepted by the hclust function is appropriate. The original method used by Sanschagrin and Kuhn is complete linkage clustering, which is the default. Other options include "ward.D" (equivalent to the only Ward option in R versions 3.0.3 and earlier), "ward.D2" (implements Ward's 1963 criterion; see Murtagh and Legendre 2014), "single" (related to the minimal spanning tree method; adopts a "friend of friends" clustering strategy), "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC), and "centroid" (= UPGMC).
Because of a size limitation in stats::hclust() (specifically the error "size cannot be NA nor exceed 65536"), fastcluster::hclust() is used instead: it is a complete replacement for stats::hclust(), is faster than stats::hclust(), and accommodates dissimilarity matrices with more than 2^16 (65,536) observations.
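The clustering step described above can be illustrated with a minimal sketch. It assumes the waters are grouped by cutting a hierarchical clustering tree at the cutoff.cluster distance; the coordinates are made up, and the sketch uses stats::hclust() and stats::cutree() so that it runs without extra packages. It is not the internal code of ClusterWaters().

```r
# Hypothetical water-oxygen coordinates in Angstroms (not from any PDB file).
waters <- data.frame(
  x = c(10.0, 10.5, 10.2, 20.0, 20.3, 35.0),
  y = c( 5.0,  5.2,  4.9, 15.0, 15.1, 40.0),
  z = c( 1.0,  1.1,  0.8,  7.0,  7.2, 12.0)
)

cutoff.cluster <- 2.4        # distance cutoff in Angstroms
cluster.method <- "complete" # any method accepted by hclust()

# Pairwise Euclidean distances between the water oxygens.
d <- stats::dist(waters[, c("x", "y", "z")], method = "euclidean")

# Build the tree. fastcluster::hclust(d, method = cluster.method) could be
# substituted here when the fastcluster package is installed.
tree <- stats::hclust(d, method = cluster.method)

# Cut the tree at the cutoff height. With complete linkage, no two waters
# in the same cluster are farther apart than cutoff.cluster.
cluster.ids <- stats::cutree(tree, h = cutoff.cluster)

print(table(cluster.ids))
```

In this toy data set, waters 1 to 3 and waters 4 and 5 lie well within 2.4 Angstroms of one another, while water 6 is isolated. Choosing "single" instead of "complete" would allow chains of waters to merge even when the two ends of the chain are farther apart than the cutoff.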
This function returns:
h2o.clusters.raw: Initial waters with assigned cluster ID
h2o.clusters.summary: Each cluster's:
  cluster ID
  number of waters
  percent conservation
  X, Y, and Z coordinates
  bound water environment measurements
  mean distance between waters comprising the cluster
  mean distance between waters comprising the cluster and the cluster's centroid
h2o.occurrence: A table indicating the structures (PDBs) contributing to each cluster. This summary table includes the PDB structure's:
  resolution
  R-free value
  occupancy (mean and standard deviation)
  mobility (mean and standard deviation)
  B-value (mean and standard deviation)
  number of waters in each cluster
  number of waters passing the mobility cutoff
  number of waters passing the normalized B-value cutoff
  number of waters passing both cutoff values
  percentage of waters passing both cutoffs
  number of clusters the structure contributes to
  True/False table indicating if the protein structure contributed to the water cluster
clustering.info: size and timing information
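As a rough illustration of the per-cluster quantities listed above, the following sketch computes a few of them from waters that already carry a cluster ID. The column names (PDBid, cluster, num.waters, pct.conserved, and so on) are hypothetical, and percent conservation is assumed here to be the share of structures that contribute at least one water to the cluster; the values returned by ClusterWaters() may be defined and named differently.

```r
# Hypothetical clustered waters: source structure, coordinates, cluster ID.
waters <- data.frame(
  PDBid   = c("A", "B", "C", "A", "B", "C"),
  x       = c(10.0, 10.5, 10.2, 20.0, 20.3, 35.0),
  y       = c( 5.0,  5.2,  4.9, 15.0, 15.1, 40.0),
  z       = c( 1.0,  1.1,  0.8,  7.0,  7.2, 12.0),
  cluster = c(1, 1, 1, 2, 2, 3)
)
n.structures <- length(unique(waters$PDBid))

# Summarize one cluster (w holds the rows of a single cluster).
summarize.cluster <- function(w) {
  xyz <- as.matrix(w[, c("x", "y", "z")])
  centroid <- colMeans(xyz)
  # Distance of each water to the cluster centroid.
  dist.to.centroid <- sqrt(rowSums(sweep(xyz, 2, centroid)^2))
  data.frame(
    cluster            = w$cluster[1],
    num.waters         = nrow(w),
    # Assumption: conservation = structures represented / all structures.
    pct.conserved      = 100 * length(unique(w$PDBid)) / n.structures,
    x                  = centroid[["x"]],
    y                  = centroid[["y"]],
    z                  = centroid[["z"]],
    # Undefined for a single-water cluster, so report NA there.
    mean.dist.waters   = if (nrow(w) > 1) mean(as.vector(stats::dist(xyz))) else NA_real_,
    mean.dist.centroid = mean(dist.to.centroid)
  )
}

cluster.summary <- do.call(
  rbind,
  lapply(split(waters, waters$cluster), summarize.cluster)
)
print(cluster.summary)
```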
Calculate the conserved waters using a collection of crystallographic protein structures.
Paul C Sanschagrin and Leslie A Kuhn. Cluster analysis of consensus water sites in thrombin and trypsin shows conservation between serine proteases and contributions to ligand specificity. Protein Science, 1998, 7 (10), pp 2054-2064. DOI: 10.1002/pro.5560071002 PMID: 9792092 WatCH webpage
Fionn Murtagh and Pierre Legendre. Ward's Hierarchical Agglomerative Clustering Method: Which Algorithms Implement Ward's Criterion? Journal of Classification, 2014, 31 (3), pp 274-295. DOI: 10.1007/s00357-014-9161-z
Daniel Müllner. fastcluster: Fast Hierarchical, Agglomerative Clustering Routines for R and Python. Journal of Statistical Software, 2013, 53 (9). DOI: 10.18637/jss.v053.i09 fastcluster webpage