Tetko et al. (2001) and Huuskonen (2000) investigated a set of
compounds with corresponding experimental solubility values using complex
sets of descriptors. They used linear regression and neural network models
to estimate the relationship between chemical structure and solubility. For
our analyses, we will use 1267 compounds and a set of more understandable
descriptors that fall into three groups: 208 binary "fingerprints"
that indicate the presence or absence of a particular chemical
substructure, 16 count descriptors (such as the number of bonds or the
number of bromine atoms), and 4 continuous descriptors (such as molecular
weight or surface area).
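
To make the layout of the data concrete, the three descriptor groups can be
sketched as columns of a single predictor matrix. The following is only an
illustrative sketch in Python: the column names (FP001, Count01, Cont1) and
the helper functions are hypothetical placeholders, not the names used in the
original data. Only the group sizes and the number of compounds come from the
text above.

```python
# Group sizes and compound count as stated in the text.
N_COMPOUNDS = 1267
N_FINGERPRINTS = 208   # binary: substructure present (1) or absent (0)
N_COUNTS = 16          # non-negative integers, e.g. number of bonds
N_CONTINUOUS = 4       # real-valued, e.g. molecular weight


def descriptor_columns():
    """Return hypothetical column names for each descriptor group."""
    return {
        "fingerprint": [f"FP{i:03d}" for i in range(1, N_FINGERPRINTS + 1)],
        "count": [f"Count{i:02d}" for i in range(1, N_COUNTS + 1)],
        "continuous": [f"Cont{i}" for i in range(1, N_CONTINUOUS + 1)],
    }


def is_valid_value(group, value):
    """Check that a value is plausible for its descriptor group."""
    if group == "fingerprint":
        return value in (0, 1)
    if group == "count":
        return isinstance(value, int) and value >= 0
    if group == "continuous":
        return isinstance(value, (int, float))
    raise ValueError(f"unknown descriptor group: {group}")


groups = descriptor_columns()
n_predictors = sum(len(cols) for cols in groups.values())

# One row per compound, one column per descriptor.
matrix_shape = (N_COMPOUNDS, n_predictors)
print(matrix_shape)  # (1267, 228)
```

The three groups together give 208 + 16 + 4 = 228 predictors, so the
predictor matrix has 1267 rows and 228 columns.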