I propose that we add a new FEP (free energy perturbation) module, as dc.fep.

Introduction

Free energy perturbation has become an increasingly powerful technique in modern drug discovery. Starting with the publication of techniques like LOMAP and Schrödinger’s FEP+, and the release of open source tools like Yank, free energy methods have matured into powerful tools for estimating the binding free energy of molecules to proteins. The basic idea of FEP is that it’s possible to estimate the change in free energy caused by a small change to a system by using the Zwanzig FEP identity (GitHub doesn’t support LaTeX so this will be a little messy):

E_A[exp(-beta*Delta(U))] = exp(-beta Delta(F))

Here U is an energy function and beta = 1/(kT) is the inverse temperature. We assume we have two states A and B. Think of A as the initial state, with energy function U_A, and B as the ending state, with energy function U_B. The difference is Delta(U) = U_B - U_A. The expectation E_A is over the distribution specified by the density

p_A = C * exp(-beta U_A)

In practice, it’s possible to perform a molecular dynamics simulation to compute this expectation. This allows for the estimation of the change in free energy Delta(F) = F_B - F_A from a simulation in A. For this simulation to converge reasonably though, B and A should overlap considerably (since we are sampling from A to estimate the density of B). Achieving this overlap has traditionally been a sampling problem. There are techniques like MBAR (see https://github.com/choderalab/pymbar, https://github.com/alchemistry/alchemlyb) that help perform these calculations, but convergence can still be slow.
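
To make the identity concrete, here is a minimal sketch on a toy system rather than a real MD simulation. It assumes two 1D harmonic wells (the spring constants `k_a` and `k_b` and the function name `zwanzig_delta_f` are illustrative choices, not part of any library), for which the exact answer Delta(F) = ln(k_b / k_a) / (2 * beta) is known:

```python
import math
import random


def zwanzig_delta_f(samples, u_a, u_b, beta):
    """Estimate Delta(F) = F_B - F_A from samples drawn from p_A."""
    # Exponent -beta * Delta(U) for every sample from state A.
    w = [-beta * (u_b(x) - u_a(x)) for x in samples]
    # Log of the sample mean of exp(w), with the max subtracted for stability.
    w_max = max(w)
    log_mean = w_max + math.log(sum(math.exp(v - w_max) for v in w) / len(w))
    return -log_mean / beta


beta = 1.0
k_a, k_b = 1.0, 2.0  # toy spring constants (assumed for illustration)


def u_a(x):
    return 0.5 * k_a * x * x


def u_b(x):
    return 0.5 * k_b * x * x


# p_A = C * exp(-beta * U_A) is a Gaussian with variance 1 / (beta * k_a),
# so we can sample it directly instead of running MD.
rng = random.Random(0)
sigma_a = 1.0 / math.sqrt(beta * k_a)
samples = [rng.gauss(0.0, sigma_a) for _ in range(100_000)]

estimate = zwanzig_delta_f(samples, u_a, u_b, beta)
exact = 0.5 / beta * math.log(k_b / k_a)
print(f"estimate = {estimate:.4f}, exact = {exact:.4f}")
```

Here A and B overlap well, so the estimate converges quickly. Making `k_b` much smaller than `k_a` (so that B is far broader than A) makes the same estimator much noisier, which is the overlap problem described above.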

DeepMind recently came out with a fascinating paper, Targeted Free Energy Estimation Via Learned Mappings, that proposes a technique to help with this problem. The idea is to use a normalizing flow that transforms state A to a state A' that has higher overlap with B. A normalizing flow is a type of deep network that evolves probability distributions. The DeepMind paper trains a normalizing flow for a simple solute-solvent system. Training data is generated by an MD simulation and used to train the normalizing flow. Results on this simple system show that the use of the normalizing flow appears to considerably speed up the convergence of the system.
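
As a rough illustration of why a mapping helps, the sketch below applies the targeted FEP estimator to two 1D harmonic wells with poor overlap. Everything here is an assumption for illustration: the map is a hand-written scaling rather than a trained normalizing flow, and the names `targeted_delta_f`, `mapping` and `log_det_jac` are hypothetical. The estimator replaces Delta(U) with U_B(M(x)) - U_A(x) - (1/beta) * log|det J_M(x)|, where M is the invertible map and J_M its Jacobian:

```python
import math
import random


def targeted_delta_f(samples, u_a, u_b, mapping, log_det_jac, beta):
    """Targeted FEP estimate of F_B - F_A using an invertible map on samples from A."""
    w = []
    for x in samples:
        # Generalized energy difference, including the Jacobian correction.
        phi = u_b(mapping(x)) - u_a(x) - log_det_jac(x) / beta
        w.append(-beta * phi)
    w_max = max(w)
    log_mean = w_max + math.log(sum(math.exp(v - w_max) for v in w) / len(w))
    return -log_mean / beta


beta = 1.0
k_a, k_b = 1.0, 50.0  # B is much narrower than A, so plain FEP is noisy


def u_a(x):
    return 0.5 * k_a * x * x


def u_b(x):
    return 0.5 * k_b * x * x


# Hand-written "perfect" map for this toy case: rescale A onto B.
scale = math.sqrt(k_a / k_b)


def mapping(x):
    return scale * x


def log_det_jac(x):
    return math.log(scale)


rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0 / math.sqrt(beta * k_a)) for _ in range(200)]

# The identity map with zero log-Jacobian recovers the ordinary Zwanzig estimator.
plain = targeted_delta_f(samples, u_a, u_b, lambda x: x, lambda x: 0.0, beta)
mapped = targeted_delta_f(samples, u_a, u_b, mapping, log_det_jac, beta)
exact = 0.5 / beta * math.log(k_b / k_a)
print(f"plain = {plain:.4f}, mapped = {mapped:.4f}, exact = {exact:.4f}")
```

Because the scaling maps A exactly onto B in this toy case, every sample gives the same value and the mapped estimate matches the exact answer even with only 200 samples. A learned flow on a real system would only approximate such a map, so the gain would be partial.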

Proposed Changes

I propose we should add support for deep FEP models in DeepChem. Doing this would require the following steps:

  • Adding support for normalizing flows: Normalizing flows behave a little differently from our supervised/metalearning/RL models since they evolve distributions. This would require some new infrastructure. Luckily, there are a number of reference normalizing flow implementations out there already with permissive licenses (https://github.com/tonyduan/normalizing-flows), so we could likely leverage this code to build out some infrastructure.
    • This code should probably go in dc.models although if models are different enough, we might need to make a dc.normflow submodule like we have for dc.rl and dc.metalearning.
  • We need to add additional metrics suitable for normalizing flow training and evaluation to dc.metrics.
  • We need to add new loss functions, the LBAR (Learned Bennett Acceptance Ratio) and LFEP (Learned Free Energy Perturbations) as defined in the DeepMind paper.
  • To make deep FEP models more broadly useful, we need to generate larger and more interesting training datasets for them. These datasets should eventually find their way into dc.molnet. These datasets likely have to be generated by MD simulation of many protein/ligand systems. This will likely be a large effort and ideally we should find a way to tap already existing databases. Folding@Home has been running a lot of free energy calculations so they might have a suitable dataset already.
  • To apply FEP on systems of practical interest, we need a number of additional tools. When applying FEP on lead optimization problems, usually a series of related compounds is constructed. I propose that we add utilities to do this automatically, using CReM (https://github.com/DrrDom/crem), a library to construct chemically reasonable mutations of a starting compound. The general pattern here is of a PerturbationGenerator abstract class that generates perturbed versions of an initial state. The CReM class would probably be a MoleculePerturbationGenerator concrete subclass. This should likely live in the dc.fep module.
  • When applying FEP on regions of interest, it’s often crucial to select the region for simulation correctly so unneeded work isn’t performed. We have some utilities for automatic BindingPocket detection in dc.docking which are similar, but we might need to add more targeted tools for selecting the region of interest.
  • We need a way of running FEP on new systems of interest. One way to do this might be by adding a new FEPEngine class. Under the hood, this class should rely on Yank and OpenMM to the degree possible. This would live in the dc.fep module.
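
The PerturbationGenerator pattern from the list above could be sketched roughly as follows. This is only a design sketch under stated assumptions: the method name `generate`, the `SubstituentSwapGenerator` class and the `[*]` placeholder convention are hypothetical, and the toy subclass does plain string substitution on SMILES instead of calling CReM, which is what a real MoleculePerturbationGenerator would do.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List


class PerturbationGenerator(ABC):
    """Hypothetical base class: yields perturbed versions of an initial state."""

    @abstractmethod
    def generate(self, initial_state: str) -> Iterator[str]:
        """Yield perturbed states derived from initial_state."""


class SubstituentSwapGenerator(PerturbationGenerator):
    """Toy stand-in for a MoleculePerturbationGenerator.

    Replaces a placeholder in a SMILES string with each substituent in turn.
    A real implementation would delegate to CReM instead.
    """

    def __init__(self, placeholder: str, substituents: List[str]):
        self.placeholder = placeholder
        self.substituents = substituents

    def generate(self, initial_state: str) -> Iterator[str]:
        for substituent in self.substituents:
            yield initial_state.replace(self.placeholder, substituent)


# Build a small congeneric series around a benzene scaffold.
generator = SubstituentSwapGenerator("[*]", ["F", "Cl", "C"])
series = list(generator.generate("c1ccccc1[*]"))
print(series)
```

Keeping the base class independent of any chemistry toolkit would let non-molecular perturbations (for example, protein mutations) share the same interface, though whether that generality is worth it is an open design question.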

Scope

Would these changes be in scope for DeepChem? One reasonable question is whether this would be a better contribution to Yank or a separate library. Eventually, as normalizing flow techniques mature, they will likely find their way into libraries like Yank that focus on free energy perturbation. But for the moment, normalizing flow techniques are very new. The DeepMind paper focuses on a toy system. Considerable research and development will have to be done before these techniques are suitable for broader applications. This will involve a lot of model building, dataset gathering, benchmarking etc. As a scientific deep learning library, DeepChem is well suited to help facilitate these types of activities. I believe DeepMind has not open sourced their implementation, so creating a high quality reference implementation will help accelerate research in this field and get these techniques closer to practical applicability.

As a second question, it’s reasonable to ask whether this should be its own library instead of a part of DeepChem. The major advantage of building it within DeepChem is that it’s easy to leverage and extend the work we’ve put into build/documentation/tooling around DeepChem. Bootstrapping a new library would be considerable work for a very experimental technique.

Another thing to note is that I’m in the middle of overhauling DeepChem’s support for structure based drug discovery at the moment. We have the new dc.docking module and I’m working on extending atomic convolutions. There should be natural synergy between these efforts and dc.fep that should be mutually beneficial.

Implementation

I’m willing to take the lead on implementing this feature, but since this is a large set of new features, any help other folks are interested in providing would be much appreciated. Also, everything I’ve laid out in this issue is just a first design sketch. Feedback and comments are very welcome!

CC @peastman @ncfrey who I think would be interested.

Issue Analytics

  • State:open
  • Created 3 years ago
  • Reactions:6
  • Comments:9 (9 by maintainers)

Top GitHub Comments

1 reaction
ncfrey commented, Dec 21, 2020

I think the only thing we need would be a custom implementation of a circular spline bijector, which could either inherit from tfp.bijectors.RationalQuadraticSpline or use its code as a starting point and be modified to respect periodic boundary conditions (PBC).
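
To illustrate what "respecting PBC" requires of a bijector, here is a small sketch of a periodic bijection on a box of length L. It is not a rational-quadratic spline and does not use TensorFlow Probability; the class `CircularSineBijector` is a hypothetical stand-in, with method names chosen to mirror the `forward` and `forward_log_det_jacobian` methods of TFP bijectors. The key properties are that the map sends the box onto itself and that its derivative matches at the two box edges:

```python
import math


class CircularSineBijector:
    """Toy periodic bijection on [0, box_length); a stand-in for a circular spline."""

    def __init__(self, box_length: float, amplitude: float):
        # |amplitude| < 1 keeps the derivative strictly positive (invertible).
        if not abs(amplitude) < 1.0:
            raise ValueError("amplitude must satisfy |amplitude| < 1")
        self.box_length = box_length
        self.amplitude = amplitude
        self.k = 2.0 * math.pi / box_length

    def forward(self, x: float) -> float:
        x = x % self.box_length  # wrap the input into the box
        y = x + self.amplitude * math.sin(self.k * x) / self.k
        return y % self.box_length

    def forward_log_det_jacobian(self, x: float) -> float:
        # d(forward)/dx = 1 + amplitude * cos(k * x), which is periodic in x.
        return math.log(1.0 + self.amplitude * math.cos(self.k * x))


flow = CircularSineBijector(box_length=2.0, amplitude=0.5)
grid = [i * 0.01 for i in range(200)]
mapped_grid = [flow.forward(x) for x in grid]
print(flow.forward(0.0), flow.forward(2.0))
```

A circular spline would offer far more flexibility than this single sine term, but it would need to satisfy the same two constraints at the box boundary.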

0 reactions
rbharath commented, Dec 22, 2020

One thought here is that we might want to implement a dc.dock-like module for programmatic FEP (calling, say, Yank or a new normalizing flow engine under the hood). I don’t yet have a sense of how complicated this would be, though.
