A data set containing information about compounds used in drug discovery.
Specifically, it consists of 5631 compounds that were run through an in-house
solubility screen, which measures a compound's ability to dissolve in a water/solvent mixture.
Based on this screen, compounds were categorized as either insoluble (n=3493) or soluble (n=2138).
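The two class counts sum to the total of 5631 compounds, and the split is imbalanced. The minimal Python sketch below computes the class proportions and the accuracy of a trivial classifier that always predicts the larger class ("insoluble"). The counts come from the description above. The majority-class baseline is only an illustrative reference point and is not part of the original data set description.

```python
# Class counts as reported for the solubility screen.
counts = {"insoluble": 3493, "soluble": 2138}

total = sum(counts.values())
assert total == 5631  # the two classes account for every compound

# Fraction of compounds in each class.
proportions = {label: n / total for label, n in counts.items()}

# Illustrative baseline: accuracy of always predicting the larger class.
majority_label = max(counts, key=counts.get)
baseline_accuracy = proportions[majority_label]

for label, p in proportions.items():
    print(f"{label}: {p:.3f}")
print(f"majority-class baseline ({majority_label}): {baseline_accuracy:.3f}")
```

This prints proportions of about 0.620 (insoluble) and 0.380 (soluble). A classifier trained on this data would therefore need to beat roughly 62% accuracy to improve on always guessing "insoluble".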
Then, for each compound, 72 continuous, noisy structural
descriptors were computed.
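The description implies a simple layout: one row of 72 continuous descriptors per compound, plus one class label per compound. The Python sketch below checks that layout. It assumes the data has already been loaded into a NumPy feature matrix `X` and a label array `y` holding the strings "insoluble" and "soluble". The names `X`, `y`, and `check_solubility_arrays` are hypothetical and are not part of any published loader. The arrays in the `__main__` block are zero-filled placeholders, not real descriptor values.

```python
import numpy as np

# Sizes taken from the data set description.
N_COMPOUNDS = 5631
N_DESCRIPTORS = 72
EXPECTED_COUNTS = {"insoluble": 3493, "soluble": 2138}


def check_solubility_arrays(X: np.ndarray, y: np.ndarray) -> None:
    """Hypothetical sanity check: X has one descriptor row per compound, y one label per row."""
    if X.shape != (N_COMPOUNDS, N_DESCRIPTORS):
        raise ValueError(f"X has shape {X.shape}, expected {(N_COMPOUNDS, N_DESCRIPTORS)}")
    if y.shape != (N_COMPOUNDS,):
        raise ValueError(f"y has shape {y.shape}, expected {(N_COMPOUNDS,)}")
    # The descriptors are continuous, so they should be stored as floats.
    if not np.issubdtype(X.dtype, np.floating):
        raise TypeError(f"X has dtype {X.dtype}, expected a floating-point type")
    # Every compound should carry one of the two class labels, in the reported numbers.
    labels, counts = np.unique(y, return_counts=True)
    found = {str(label): int(n) for label, n in zip(labels, counts)}
    if found != EXPECTED_COUNTS:
        raise ValueError(f"label counts {found} differ from {EXPECTED_COUNTS}")


if __name__ == "__main__":
    # Placeholder arrays with the expected shapes (not real descriptor values).
    X_placeholder = np.zeros((N_COMPOUNDS, N_DESCRIPTORS))
    y_placeholder = np.array(["insoluble"] * 3493 + ["soluble"] * 2138)
    check_solubility_arrays(X_placeholder, y_placeholder)
    print("shapes and label counts match the description")
```

Failing loudly on a shape or count mismatch catches truncated or misaligned files before any model is fit on them.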