Property prediction
This codebase works well for me and I’m able to replicate the current results. Having worked a good bit with other latent spaces, I’m curious to find out what other operations the latent space of this model might support. Specifically, I suspect that the latent space could also support analogies and attribute vectors, but unfortunately I’m not familiar with chemistry datasets and SMILES strings.
Would anyone be interested in helping me build a labelled dataset of molecules that includes binary attributes and then investigating the results of applying attribute vectors? An example structure of the dataset would be:
SMILES string | Polar | Toxic | Flammable | Positive Oxidation State |
---|---|---|---|---|
CN1CCC[C@H]1c2cccnc2 | True | False | False | True |
O=C1Oc2ccccc2c3ccccc13 | False | True | True | False |
… |
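To make the proposed structure concrete, here is a minimal sketch of how such a table could be loaded, assuming it is stored as CSV text. The `LabelledMolecule` class, the `load_molecules` helper and the `smiles` column name are hypothetical names chosen for illustration and are not part of this codebase. The two rows are copied from the example table, so their labels are illustrative rather than verified chemistry.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class LabelledMolecule:
    """One dataset row: a SMILES string plus its binary attributes (hypothetical type)."""
    smiles: str
    attributes: dict  # attribute name -> bool


# Rows copied from the example table; the labels are illustrative only.
RAW = """smiles,Polar,Toxic,Flammable,Positive Oxidation State
CN1CCC[C@H]1c2cccnc2,True,False,False,True
O=C1Oc2ccccc2c3ccccc13,False,True,True,False
"""


def load_molecules(text):
    """Parse CSV text into a list of LabelledMolecule records."""
    reader = csv.DictReader(io.StringIO(text))
    molecules = []
    for row in reader:
        smiles = row.pop("smiles")
        # Every remaining column is treated as a binary attribute.
        attrs = {name: value == "True" for name, value in row.items()}
        molecules.append(LabelledMolecule(smiles, attrs))
    return molecules


molecules = load_molecules(RAW)
```

Keeping the attributes in a dictionary means new columns can be added to the file without changing the loader.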
Generally, these datasets can be useful even if they are much smaller than the training dataset - say dozens to hundreds of rows. Ideally, the chosen attributes would be those that could serve as unambiguous labels and operators. For example, pretend the following is true:
Carbon dioxide is a polar molecule.
The equivalent to carbon dioxide without polarity is carbon monoxide.
Then this would be a great attribute because it follows the formula:
Molecule X has (doesn't have) attribute Y
The equivalent of Molecule X with (without) attribute Y is Z
I don’t know enough chemistry to know if there are even such attributes for subsets of molecules. But if there are, then a small dataset of molecules with and without attribute Y would be sufficient to see if Z could be inferred from this model given X.
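The attribute-vector idea can be sketched as follows. This is one common way to build such a vector (the mean latent of molecules with the attribute minus the mean latent of molecules without it), not something this repository is known to implement. The latent vectors here are toy two-dimensional stand-ins; in practice they would presumably come from the model's encoder, and the shifted vector would be passed to the decoder to obtain a candidate Z. Neither of those steps is shown, and the function names are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def attribute_vector(latents, labels):
    """Mean latent of molecules with the attribute minus mean latent of those without."""
    with_attr = [z for z, has in zip(latents, labels) if has]
    without_attr = [z for z, has in zip(latents, labels) if not has]
    if not with_attr or not without_attr:
        raise ValueError("need at least one molecule with and one without the attribute")
    mean_with = mean_vector(with_attr)
    mean_without = mean_vector(without_attr)
    return [a - b for a, b in zip(mean_with, mean_without)]


def apply_attribute(z, vector, strength=1.0):
    """Shift a latent point along the attribute vector; a negative strength removes it."""
    return [zi + strength * vi for zi, vi in zip(z, vector)]


# Toy latents standing in for encoder outputs (assumed values, not real data).
latents = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
labels = [False, True, False, True]  # e.g. the "Polar" column

polar_vector = attribute_vector(latents, labels)

# Latent of a molecule X without the attribute, moved toward "with attribute".
z_x = [0.0, 1.0]
z_candidate = apply_attribute(z_x, polar_vector)
```

Whether decoding `z_candidate` yields a valid molecule, let alone the expected Z, is exactly the open question this issue raises.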
Top GitHub Comments
I think one of the decisions to be made is what the ambition of this repo is: are we just reproducing the one paper? Do we want to expand upon it to some other functional aim?
Personally I’m for going further as there are lots of interesting ideas to explore here and not many venues like this where you can just check out the code and easily get something running. I also like the little proto-community we have here and I don’t know how well it would work to move everyone to a different repo.
I agree with the points you raise, and I think going further is the better option, but the original model should be reproduced first. Currently, I think the GP part of the original paper is what you haven’t reproduced yet.