Support for DGL Modeling

Hi DeepChem team,

This is Mufei from the DGL team. I’ve also spent some time developing DGL-LifeSci – a DGL-based package for working with graphs in chemistry and biology. It seems that DeepChem has started supporting DGL-based modeling with PyTorch (e.g. #2089 by @nd-02110114 ), which is rather exciting! I’ve had some chats with @rbharath before about contributing to this effort. Below are some observations & proposals and I’d like to know your thoughts.

Compared with pure PyTorch-based modeling, DGL-based modeling additionally requires:

  • Converting graph data into DGL’s data structure DGLGraph and storing node/edge features in DGLGraph.ndata and DGLGraph.edata
  • Using APIs like DGLGraph.update_all for invoking message passing over graphs in NN modules
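
As a rough illustration of the second point, the sketch below performs one round of message passing in plain Python over a COO edge list. It only approximates the idea behind DGLGraph.update_all with a "copy the source feature" message and a "sum" reducer; the function name sum_neighbor_features and the toy data are assumptions made for this example and are not part of DGL or DeepChem.

```python
def sum_neighbor_features(src, dst, node_feats):
    """One round of message passing: each node sums the features of its in-neighbors.

    src, dst: parallel lists of edge endpoints (COO format); edge i goes src[i] -> dst[i].
    node_feats: list of per-node feature vectors (lists of floats), all the same length.
    """
    dim = len(node_feats[0])
    out = [[0.0] * dim for _ in node_feats]
    for u, v in zip(src, dst):
        # Message: copy the source node's features along the edge.
        # Reduce: accumulate incoming messages at the destination with a sum.
        for k in range(dim):
            out[v][k] += node_feats[u][k]
    return out


# A 3-node path 0 - 1 - 2, stored as directed edges in both directions.
src = [0, 1, 1, 2]
dst = [1, 0, 2, 1]
feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
updated = sum_neighbor_features(src, dst, feats)
```

In a DGL-based model the same pattern would run on tensors stored in ndata/edata rather than on Python lists.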

For the first point, DeepChem employs GraphData as an intermediate graph representation across different frameworks (DGL, PyTorch Geometric, etc.). It allows graph creation from a COO format along with pre-processed features and exposes a to_dgl_graph API for converting a GraphData instance into a DGLGraph instance. For the second point, DeepChem implements DGL-based PyTorch models under deepchem/deepchem/models/torch_models.
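
To make the idea of an intermediate COO-format container concrete, here is a minimal sketch in plain Python. CooGraph is a hypothetical stand-in written for this example, not DeepChem's GraphData, and to_adjacency_list is only a framework-neutral placeholder for a converter such as to_dgl_graph.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CooGraph:
    """Hypothetical intermediate graph container (illustration only)."""
    node_features: List[List[float]]                   # one feature vector per node
    edge_index: List[List[int]]                        # [sources, destinations]
    edge_features: Optional[List[List[float]]] = None  # one vector per edge, if given

    def __post_init__(self):
        # Validate the COO layout before anything downstream relies on it.
        if len(self.edge_index) != 2 or len(self.edge_index[0]) != len(self.edge_index[1]):
            raise ValueError("edge_index must hold two equal-length lists")
        for idx in self.edge_index[0] + self.edge_index[1]:
            if not 0 <= idx < self.num_nodes:
                raise ValueError(f"edge endpoint {idx} is not a valid node id")
        if self.edge_features is not None and len(self.edge_features) != self.num_edges:
            raise ValueError("need exactly one feature vector per edge")

    @property
    def num_nodes(self) -> int:
        return len(self.node_features)

    @property
    def num_edges(self) -> int:
        return len(self.edge_index[0])

    def to_adjacency_list(self) -> Dict[int, List[int]]:
        """Convert to {node: [out-neighbors]}, standing in for a framework converter."""
        adj = {i: [] for i in range(self.num_nodes)}
        for u, v in zip(*self.edge_index):
            adj[u].append(v)
        return adj


# Three nodes in a chain, with each undirected link stored as two directed edges.
graph = CooGraph(
    node_features=[[1.0], [1.0], [2.0]],
    edge_index=[[0, 1, 1, 2], [1, 0, 2, 1]],
)
adjacency = graph.to_adjacency_list()
```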

To better support DGL-based modeling and, more generally, graph-based modeling, here are several suggestions:

  • Support for APIs like from_dgl_graph and from_pyg_graph, which can be helpful for users already familiar with DGL/PyG.
  • A simple interface for custom datasets. This can be something like a variant of CSVLoader, which directly constructs a graph dataset from files of a standard data format like CSV. It allows users to specify the type of graph to construct as well as the way to featurize their nodes/edges. DGL-LifeSci’s MoleculeCSVDataset can be an example for this.
  • Functions for constructing standard types of graphs (molecular graphs, complete graphs, KNN graphs, distance-based graphs) from raw data like SMILES strings. While graph creation from a COO format maximizes the flexibility, it can be convenient to have such functions for users who only want to try existing modeling approaches on their own datasets. I think the design of Protein Graph Library follows a similar idea.
  • Support for heterogeneous graphs, i.e. graphs with typed nodes and edges. I have the impression that DeepChem is mainly for molecular property prediction and, to a lesser extent, protein-ligand binding affinity prediction, so maybe this is less of an issue. However, this can still be helpful even for molecular property prediction when one wants to combine information from different graph structures. Molecule Attention Transformer is an example of this.
  • Examples for model training and evaluation on MoleculeNet
  • Additionally, DGL-LifeSci has implemented some graph neural networks here and I’d like to know if you are open to directly importing them in DeepChem.
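
The graph-construction helpers suggested above could look roughly like the following sketch, which builds complete, distance-based, and KNN graphs as COO edge lists from point coordinates. The function names and the toy coordinates are assumptions for illustration. Molecular graphs from SMILES strings would additionally need a cheminformatics toolkit and are not shown.

```python
import math
from itertools import permutations


def complete_graph_edges(num_nodes):
    """All directed edges between distinct nodes, as (src, dst) lists."""
    pairs = list(permutations(range(num_nodes), 2))
    return [u for u, _ in pairs], [v for _, v in pairs]


def radius_graph_edges(coords, cutoff):
    """Connect every ordered pair of distinct points within `cutoff` of each other."""
    src, dst = [], []
    for u, v in permutations(range(len(coords)), 2):
        if math.dist(coords[u], coords[v]) <= cutoff:
            src.append(u)
            dst.append(v)
    return src, dst


def knn_graph_edges(coords, k):
    """Add an edge into each node from each of its k nearest neighbors."""
    src, dst = [], []
    for v in range(len(coords)):
        others = sorted(
            (u for u in range(len(coords)) if u != v),
            key=lambda u: math.dist(coords[u], coords[v]),
        )
        for u in others[:k]:
            src.append(u)
            dst.append(v)
    return src, dst


# Four points on a line at x = 0, 1, 3, 7.
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
complete_edges = complete_graph_edges(3)
radius_edges = radius_graph_edges(coords, cutoff=2.0)
knn_edges = knn_graph_edges(coords, k=1)
```

Each helper returns the same (src, dst) pair of lists, so the output can be handed to any COO-based graph constructor.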

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 2
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

1 reaction
rbharath commented, Sep 1, 2020

This is Mufei from the DGL team. I’ve also spent some time developing DGL-LifeSci – a DGL-based package for working with graphs in chemistry and biology. It seems that DeepChem has started supporting DGL-based modeling with PyTorch (e.g. #2089 by @nd-02110114 ), which is rather exciting! I’ve had some chats with @rbharath before about contributing to this effort. Below are some observations & proposals and I’d like to know your thoughts.

Great to see you on here @mufeili! I’m excited to see us improve DGL support/integration moving forward 😃

To better support DGL-based modeling and, more generally, graph-based modeling, here are several suggestions:

  • Support for APIs like from_dgl_graph and from_pyg_graph, which can be helpful for users already familiar with DGL/PyG.

+1 to this! @nd-02110114 has kindly taken the lead on our refactoring to use GraphData as a common substrate. Once that’s merged in, we should be able to support more of DGL’s APIs.

  • A simple interface for custom datasets. This can be something like a variant of CSVLoader, which directly constructs a graph dataset from files of a standard data format like CSV. It allows users to specify the type of graph to construct as well as the way to featurize their nodes/edges. DGL-LifeSci’s MoleculeCSVDataset can be an example for this.

I think our current dataloader/featurizer pipeline would support these use cases, right? Taking a look at MoleculeCSVDataset, I think the analog would be our InMemoryLoader (https://deepchem.readthedocs.io/en/latest/dataloaders.html#inmemoryloader), which allows for loading of data from Pandas dataframes as well. Is there any useful functionality that we’re missing here?
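
For readers unfamiliar with the request, the kind of interface under discussion might look roughly like the sketch below: a loader that reads rows from a CSV file and applies a user-supplied featurizer to build (graph, label) pairs. Both load_graph_dataset and placeholder_featurizer are hypothetical names invented for this example; they do not reproduce CSVLoader, InMemoryLoader, or MoleculeCSVDataset, and the featurizer is a toy that does not parse SMILES.

```python
import csv
import io


def load_graph_dataset(csv_file, input_column, label_column, featurize):
    """Build (graph, label) pairs from CSV rows using a user-supplied featurizer."""
    dataset = []
    for row in csv.DictReader(csv_file):
        graph = featurize(row[input_column])
        dataset.append((graph, float(row[label_column])))
    return dataset


def placeholder_featurizer(text):
    """Toy featurizer: one node per character, linked in a chain. Not a SMILES parser."""
    n = len(text)
    return {"num_nodes": n, "src": list(range(n - 1)), "dst": list(range(1, n))}


# An in-memory CSV stands in for a file on disk.
csv_text = "smiles,label\nCCO,0.5\nCC,1.0\n"
dataset = load_graph_dataset(
    io.StringIO(csv_text), "smiles", "label", placeholder_featurizer
)
```

Passing the featurizer as an argument is what lets users choose the graph type and the node/edge featurization without changing the loader.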

  • Functions for constructing standard types of graphs (molecular graphs, complete graphs, KNN graphs, distance-based graphs) from raw data like SMILES strings. While graph creation from a COO format maximizes the flexibility, it can be convenient to have such functions for users who only want to try existing modeling approaches on their own datasets. I think the design of Protein Graph Library follows a similar idea.

+1 to this as well! Props to @nd-02110114 for taking the lead here as well 😃

  • Support for heterogeneous graphs, i.e. graphs with typed nodes and edges. I have the impression that DeepChem is mainly for molecular property prediction and, to a lesser extent, protein-ligand binding affinity prediction, so maybe this is less of an issue. However, this can still be helpful even for molecular property prediction when one wants to combine information from different graph structures. Molecule Attention Transformer is an example of this.

We’ve definitely talked about getting Molecule Attention Transformer support in DeepChem! I’d love to see MAT and similar models supported 😃

For more heterogeneous graphs, I’m not sure. DeepChem’s focus is on scientific deep learning applications. Is there a good scientific use case for heterogeneous graphs beyond MAT? As @nd-02110114 noted above, this may be out of scope for us if there isn’t a clear use case.
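
For context on what "typed nodes and edges" means in practice, here is a minimal sketch of a container that keeps one COO edge list per (source type, relation, destination type) triple. TypedEdgeGraph and the "bond"/"near" relation names are assumptions for illustration only; the example simply stores two different edge sets over the same atoms, in the spirit of combining information from different graph structures.

```python
from collections import defaultdict


class TypedEdgeGraph:
    """Hypothetical container: one edge list per (src type, relation, dst type)."""

    def __init__(self, num_nodes_per_type):
        self.num_nodes = dict(num_nodes_per_type)
        # Maps a (src_type, relation, dst_type) triple to parallel (src, dst) lists.
        self.edges = defaultdict(lambda: ([], []))

    def add_edges(self, src_type, relation, dst_type, src, dst):
        if len(src) != len(dst):
            raise ValueError("src and dst must have the same length")
        n_src, n_dst = self.num_nodes[src_type], self.num_nodes[dst_type]
        if any(not 0 <= u < n_src for u in src) or any(not 0 <= v < n_dst for v in dst):
            raise ValueError("edge endpoint out of range for its node type")
        stored_src, stored_dst = self.edges[(src_type, relation, dst_type)]
        stored_src.extend(src)
        stored_dst.extend(dst)

    def relations(self):
        return sorted(self.edges)

    def num_edges(self, key):
        return len(self.edges[key][0])


# Three atoms with two kinds of edges: chemical bonds and spatial proximity.
g = TypedEdgeGraph({"atom": 3})
g.add_edges("atom", "bond", "atom", [0, 1, 1, 2], [1, 0, 2, 1])
g.add_edges("atom", "near", "atom", [0, 2], [2, 0])
relation_list = g.relations()
```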

  • Examples for model training and evaluation on MoleculeNet

This is definitely one of our priorities! @nd-02110114 is actively working on this already, and I’d love to see more MoleculeNet benchmarks put up for all DGL models.

  • Additionally, DGL-LifeSci has implemented some graph neural networks here and I’d like to know if you are open to directly importing them in DeepChem.

We’d be very open to directly importing DGL-LifeSci’s models in DeepChem! It might also be useful to write small wrapper classes that wrap DGL models in TorchModel for ease of benchmarking on MoleculeNet or interoperating with the rest of the DeepChem API. In general though, I’d love to see closer integration with DGL-LifeSci’s models. We’re both working on similar problems, and it’s better to join forces and bring more value to our community.

1 reaction
nissy-dev commented, Sep 1, 2020

Thanks for your comments! These comments are really helpful for us.

The following comments are my personal opinion.

Support for APIs like from_dgl_graph and from_pyg_graph, which can be helpful for users already familiar with DGL/PyG.

I agree. Before doing this, we need to refactor DeepChem’s present graph models to use the GraphData class. I will work on this refactoring this month. I think the priority is high.

A simple interface for custom datasets. This can be something like a variant of CSVLoader, which directly constructs a graph dataset from files of a standard data format like CSV. It allows users to specify the type of graph to construct as well as the way to featurize their nodes/edges. DGL-LifeSci’s MoleculeCSVDataset can be an example for this.

Functions for constructing standard types of graphs (molecular graphs, complete graphs, KNN graphs, distance-based graphs) from raw data like SMILES strings. While graph creation from a COO format maximizes the flexibility, it can be convenient to have such functions for users who only want to try existing modeling approaches on their own datasets. I think the design of Protein Graph Library follows a similar idea.

I think these features can be achieved by refactoring GraphData and the featurizer that is implemented for molecules. This makes our graph model support more general, so I will definitely try to implement them. (I like the design of Protein Graph Library, so I will imitate its API design.) I think the priority is intermediate.

Support for heterogeneous graphs, i.e. graphs with typed nodes and edges. I have the impression that DeepChem is mainly for molecular property prediction and, to a lesser extent, protein-ligand binding affinity prediction, so maybe this is less of an issue. However, this can still be helpful even for molecular property prediction when one wants to combine information from different graph structures. Molecule Attention Transformer is an example of this.

To be honest, I also think that this feature is basically out of scope. But I’m interested in Molecule Attention Transformer. If we have time, we will try to support it. I think the priority is low.

Examples for model training and evaluation on MoleculeNet

This is a work in progress and the highest-priority task. Currently, I’m checking whether the model works well on a GPU or with a large dataset. After finishing, I will add more details to the DeepChem docs.

Additionally, DGL-LifeSci has implemented some graph neural networks here and I’d like to know if you are open to directly importing them in DeepChem.

I think we are open to this. But currently, it is impossible to use DGL-LifeSci models without modification. What do you think, @rbharath?
