Property prediction

This codebase works well for me and I’m able to replicate the current results. Having worked a good bit with other latent spaces, I’m curious to find out what other operations the latent space of this model might support. Specifically, I suspect that the latent space could also support analogies and attribute vectors, but unfortunately I’m not familiar with chemistry datasets and SMILES strings.

Would anyone be interested in helping me build a labelled dataset of molecules that includes binary attributes and then investigating the results of applying attribute vectors? An example structure of the dataset would be:

SMILES string	Polar	Toxic	Flammable	Positive Oxidation State
CN1CCC[C@H]1c2cccnc2	True	False	False	True
O=C1Oc2ccccc2c3ccccc13	False	True	True	False
…
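
As a rough illustration of how such a table might be consumed, here is a minimal sketch that parses tab-separated rows into a SMILES string plus a dictionary of boolean attributes. The column name `smiles`, the function name `load_attribute_table`, and the inline `EXAMPLE_TSV` text are assumptions made for this example; nothing here comes from the repository.

```python
import csv
import io

# Two rows copied from the example table above; a real dataset would live in a file.
# The header name "smiles" is an assumption chosen for this sketch.
EXAMPLE_TSV = (
    "smiles\tPolar\tToxic\tFlammable\tPositive Oxidation State\n"
    "CN1CCC[C@H]1c2cccnc2\tTrue\tFalse\tFalse\tTrue\n"
    "O=C1Oc2ccccc2c3ccccc13\tFalse\tTrue\tTrue\tFalse\n"
)


def load_attribute_table(text):
    """Parse tab-separated rows into (smiles, {attribute: bool}) pairs."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for record in reader:
        smiles = record.pop("smiles")
        # Every remaining column is treated as a binary attribute.
        attributes = {name: value.strip() == "True" for name, value in record.items()}
        rows.append((smiles, attributes))
    return rows


rows = load_attribute_table(EXAMPLE_TSV)
```

Keeping the attributes in a dictionary keyed by column name means new binary attributes can be added to the table without changing the loader.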

Generally, these datasets can be useful even if they are much smaller than the training dataset - say dozens to hundreds of rows. Ideally, the chosen attributes would be those that could serve as unambiguous labels and operators. For example, pretend the following is true:

Carbon dioxide is a polar molecule.
The equivalent to carbon dioxide without polarity is carbon monoxide.

Then this would be a great attribute because it follows the formula:

Molecule X has (doesn't have) attribute Y
The equivalent of Molecule X with (without) attribute Y is Z

I don’t know enough chemistry to know if there are even such attributes for subsets of molecules. But if there are, then a small dataset of molecules with and without attribute Y would be sufficient to see if Z could be inferred from this model given X.
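
The experiment described above could be sketched roughly as follows, using the common difference-of-means construction for an attribute vector. This is only an illustration under stated assumptions: the 2-D latent points are made up, the helper names (`mean_vector`, `attribute_vector`, `apply_attribute`) are hypothetical, and the encode and decode steps of the actual model are left out.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def attribute_vector(latents, labels):
    """Mean latent of rows with the attribute minus mean latent of rows without it."""
    with_attr = [z for z, has in zip(latents, labels) if has]
    without_attr = [z for z, has in zip(latents, labels) if not has]
    if not with_attr or not without_attr:
        raise ValueError("need at least one example with and one without the attribute")
    mean_with = mean_vector(with_attr)
    mean_without = mean_vector(without_attr)
    return [a - b for a, b in zip(mean_with, mean_without)]


def apply_attribute(z, direction, strength=1.0):
    """Move latent point z along the attribute direction."""
    return [zi + strength * di for zi, di in zip(z, direction)]


# Made-up 2-D latents standing in for encoder output (assumption, not model data).
latents = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
labels = [True, True, False, False]  # whether each molecule has attribute Y

direction = attribute_vector(latents, labels)          # [2.0, -3.0]
z_candidate = apply_attribute([0.0, 3.0], direction)   # [2.0, 0.0]
```

In the setting of the issue, `z_candidate` would then be passed to the model's decoder to see whether the decoded molecule resembles Z. Negating `strength` gives the "without attribute Y" direction.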

Top GitHub Comments

maxhodak commented, Nov 13, 2016

I think one of the decisions to be made is what the ambition of this repo is: are we just reproducing the one paper? Do we want to expand upon it to some other functional aim?

Personally I’m for going further as there are lots of interesting ideas to explore here and not many venues like this where you can just check out the code and easily get something running. I also like the little proto-community we have here and I don’t know how well it would work to move everyone to a different repo.

hsiaoyi0504 commented, Nov 14, 2016

I agree with what you point out, and I think going further is better, but before that, the original model needs to be reproduced. Currently, I think the GP part of the original paper is what hasn’t been achieved yet.
