I think it’s time, for the sake of scaling this up to more systematic experiments, to restructure this project.

The primary challenge I noticed when designing experiments and structuring the current project is that there are too many parts in this pipeline that we want to model. Specifically:

  • molecule object (rdkit.Molecule, openeye.GraphMol, or openforcefield.Molecule)

  • graph object (dgl.Graph or dgl.HeteroGraph): the object that contains the identities / attributes of the atoms and how they are connected.

  • parametrized graph object (still dgl.Graph or dgl.HeteroGraph): the object that contains all the parameters necessary for MM-like energy evaluation, namely ks, eqs, and so forth.

  • energy (a scalar tensor or a list of scalar tensors): the end object in both fitting and inference.

There are problems with each of these objects, and arguably there are further downstream objects that we could add, namely those needed for OpenMM simulations.

But first, here’s how I picture the structure of the repo:

  • espaloma/ core code for graph nets-powered force field fitting.
    • graph.py graph abstraction of molecules.
    • parametrized_graph.py molecular graph with all the parameters needed for energy evaluation
    • net.py interface for neural networks that operate on espaloma.graph
    • mm/ helper functions to calculate MM energy
  • scripts/ experiment scripts
    • typing/ experiment scripts to reproduce atom-typing
    • mm_fitting/ experiment scripts to fit espaloma potentials to Molecular Mechanics
    • qm_fitting/ experiment scripts to fit espaloma potentials to Quantum Mechanics
      • class_i/
      • class_ii/ with the inclusion of coupling and higher-order terms
    • deployment/ experiment to provide interface to openmm.system objects and run actual MD.
  • devtools development tools

The following functions are needed to support the aforementioned pipeline in a cleaner manner:

  • espaloma.graph.from_oemol and friends: read molecules into graphs.
  • espaloma.parametrized_graph.from_forcefield(forcefield='gaff2'): parametrize a mol graph using a legacy force field. We will likely need to port these implementations from other repos, or import them.
  • espaloma.parametrized_graph.from_nn(): parametrize a mol graph from neural network models that could be trained.
  • espaloma.mm.energy(g, x): evaluate the energy from the parametrized molecule (however it’s parametrized) and its geometry. Lots of helper functions are needed, of course, and test coverage would ideally ensure its consistency with OpenMM, although that might be tricky, especially for nonbonded terms.
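To make the intended shape of an `energy(g, x)` call concrete, here is a minimal sketch restricted to harmonic bond terms. It uses only the standard library rather than dgl or torch tensors. The `ParametrizedGraph` dataclass, its fields, and `bond_energy` are hypothetical names for illustration, not the actual espaloma API, and the 0.5 * k * (r - eq)**2 form is assumed because it matches OpenMM's harmonic bond convention.

```python
import math
from dataclasses import dataclass


@dataclass
class ParametrizedGraph:
    """Hypothetical stand-in for a parametrized molecular graph (bond terms only)."""
    bonds: list  # (i, j) atom index pairs
    ks: list     # force constant per bond
    eqs: list    # equilibrium length per bond


def bond_energy(g, x):
    """Sum of harmonic bond energies 0.5 * k * (r - eq)**2 over all bonds."""
    total = 0.0
    for (i, j), k, eq in zip(g.bonds, g.ks, g.eqs):
        r = math.dist(x[i], x[j])  # Euclidean distance between atoms i and j
        total += 0.5 * k * (r - eq) ** 2
    return total


# Three atoms on a line; the first bond sits at equilibrium,
# the second is stretched from 1.0 to 1.5.
g = ParametrizedGraph(bonds=[(0, 1), (1, 2)], ks=[100.0, 100.0], eqs=[1.0, 1.0])
x = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.5, 0.0, 0.0)]
e = bond_energy(g, x)
```

Because the function only reads parameters from `g`, it does not matter whether they came from a legacy force field or from a neural network, which is the point of the intermediate parametrized object.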

The following are the trickiest choices, which I would like to discuss here:

  • ways to structure the graph: currently this is done by accepting graph, heterograph, and hierarchical_graph objects as input. I find this a bit ugly. I suggest allowing only one kind of graph as the input of either NNs or legacy force fields, and putting whatever is needed to express the relationships between graph entities into the model. Note, however, that we would need tricks to make sure that this part of the modeling is executed exactly once during training.

  • ways to output energies: without any sum functions, one molecule would have separate bond, angle, and nonbonded energies, all with different dimensions. This becomes even more complicated when you have a batch with multiple molecules. I think it’s critical that we find a simple way to output energies.

  • ways to have a general enough espaloma.net object to enable future expansion: on the representation level we already have a bunch of ideas: atom-level message-passing only, hierarchical message-passing, or a somewhat fancier version of it that I proposed, which uses an RNN to encode walks of different lengths (corresponding to different levels in the hierarchy). If we were to limit the input graph to be universal, how are we going to develop the net module so that expressing these ideas is not too much of a headache?
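On the question of outputting energies, one possible approach is to keep a per-term index that maps every bond, angle, or nonbonded entry back to its molecule, then reduce all terms into one total per molecule. The sketch below uses plain Python lists instead of tensors. `total_energy_per_molecule` and the dictionary layout are assumptions for illustration, not an existing espaloma function.

```python
def total_energy_per_molecule(term_energies, term_to_molecule, n_molecules):
    """Reduce per-term energies of differing lengths to one total per molecule.

    term_energies:    term name -> list of energies, one per bond/angle/pair
    term_to_molecule: term name -> list of molecule indices, aligned with the above
    """
    totals = [0.0] * n_molecules
    for name, values in term_energies.items():
        for value, mol in zip(values, term_to_molecule[name]):
            totals[mol] += value
    return totals


# A batch of two molecules whose terms have different lengths.
term_energies = {
    "bond": [1.0, 2.0, 0.5],
    "angle": [0.25],
    "nonbonded": [-0.5, -1.0],
}
term_to_molecule = {
    "bond": [0, 0, 1],
    "angle": [0],
    "nonbonded": [0, 1],
}
totals = total_energy_per_molecule(term_energies, term_to_molecule, n_molecules=2)
```

With tensors, the same reduction would be a segment sum or scatter-add keyed on the molecule index, so the output always has one entry per molecule regardless of how many terms each molecule contributes.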

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
maxentile commented, May 18, 2020

Rather than hard-coding Class I and Class II, how about having a ParametrizedGraph object that can host both, and supplying the terms in I and II as methods of the object? We can of course have presets as flags, something like:

if self.terms == 'class-i': self.terms = [getattr(self, x) for x in ['bond', 'angle', 'non_bonded']]

Both options are basically the same for a single binary comparison.

In one case, there’s a single class ParameterizedGraph, whose method definitions etc. each contain two branches, selected based on whether the object has a flag or not. In the other case there are two sub-classes of ParameterizedGraph, whose method definitions etc. contain one or the other branch.

I prefer to have two sub-classes, rather than having if-else branches that check a string flag to find out what kind of object the ParameterizedGraph really is. I think it doesn’t make too much difference if there are only two variants of ParameterizedGraph we’ll need to consider in the lifespan of the code.

However, if we’re entertaining the possibility of more than two variants of ParameterizedGraph (e.g. allowing different parameterization of each factor), I think it will become increasingly unwieldy to have if/else branches to tell which parts of which method definitions apply based on which flags are present.

In general, when we find ourselves branching on some string that says what class an object is, that’s often a hint that we’d be better off with sub-classes. (This page has a helpful comparison of these two options: https://refactoring.guru/replace-conditional-with-polymorphism )
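As a rough illustration of the sub-class option, the sketch below lets each variant declare its own energy terms, so no method has to branch on a string flag. `ClassIGraph`, `ClassIIGraph`, `term_names`, and the precomputed `energies` dictionary are hypothetical names chosen for this example and are not part of the existing code.

```python
class ParameterizedGraph:
    """Base class: each sub-class declares which energy terms it includes."""
    term_names = ()

    def __init__(self, energies):
        # Hypothetical: term name -> already-evaluated energy for that term.
        self.energies = energies

    def energy(self):
        # No if/else on a flag; the sub-class decides via term_names.
        return sum(self.energies[name] for name in self.term_names)


class ClassIGraph(ParameterizedGraph):
    term_names = ("bond", "angle", "non_bonded")


class ClassIIGraph(ParameterizedGraph):
    # Class II adds coupling terms on top of the Class I terms.
    term_names = ClassIGraph.term_names + ("bond_angle_coupling",)


energies = {"bond": 1.0, "angle": 0.5, "non_bonded": -0.25, "bond_angle_coupling": 0.125}
e_class_i = ClassIGraph(energies).energy()
e_class_ii = ClassIIGraph(energies).energy()
```

Adding a third variant then means adding one more sub-class, rather than threading another flag check through every method.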

1 reaction
yuanqing-wang commented, May 18, 2020

Thanks for fleshing this out! @maxentile I think it’s a lot clearer to further distinguish the graphs by these stages.

I would suggest that we look more closely at potential_energy_fn. We might benefit from yet another intermediate layer named something like parametrized_graph. I think it’s not super straightforward how to jump directly from readout to a function object that takes in coordinates as input and outputs energy.

Moreover, we could compare parametrized_graph objects between methods. Also it would make it a lot easier to batch etc.
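As a sketch of what comparing parametrized graphs between methods could look like, the example below represents each parametrized graph as a plain dictionary from a bond to its parameters and reports the largest absolute deviation per parameter. The dictionary layout and `max_parameter_deviation` are assumptions for illustration only.

```python
def max_parameter_deviation(g_a, g_b):
    """Largest absolute difference per parameter name, over terms present in both."""
    deviations = {}
    for term in g_a.keys() & g_b.keys():
        for name, value in g_a[term].items():
            diff = abs(value - g_b[term][name])
            deviations[name] = max(deviations.get(name, 0.0), diff)
    return deviations


# Hypothetical bond parameters from a legacy force field and from a trained model.
from_forcefield = {
    (0, 1): {"k": 100.0, "eq": 1.0},
    (1, 2): {"k": 200.0, "eq": 1.5},
}
from_nn = {
    (0, 1): {"k": 96.0, "eq": 1.25},
    (1, 2): {"k": 201.0, "eq": 1.5},
}
deviation = max_parameter_deviation(from_forcefield, from_nn)
```

A comparison like this works on parameters alone, without any coordinates, which is one reason an explicit intermediate object could be useful.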
