How to use the MOSES train/test/testSF datasets in TorchDrug

TorchDrug implements the MOSES dataset, but doesn’t distinguish between the train / test / testSF splits that MOSES provides. To train GCPN on MOSES, I think the correct order is to first pretrain the model on the train set, then train it on the test / testSF sets, and finally generate molecules. But how do I do this in TorchDrug? There’s only one dataset named MOSES.

I ask because when I generate molecules with a model trained on MOSES, the statistics don’t look correct compared to other models on MOSES, especially the Scaf/Test value in the table, which measures whether the generated molecules share scaffolds with the test set. It is 0 for the GCPN model after training in TorchDrug following the tutorial. I think the problem is that TorchDrug only uses the train set and not the test set. How can I use the test set explicitly? Thanks in advance!
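For context on why a Scaf/Test of exactly 0 can happen: as far as I understand the MOSES benchmark, the Scaf metric is the cosine similarity between the scaffold-frequency vectors of the generated set and a reference set, so it is 0 whenever the two sets share no scaffold at all. The sketch below is a plain-Python illustration of that calculation, not the MOSES implementation. It assumes the scaffolds have already been extracted as strings (for example with RDKit); the function name and the sample strings are made up for illustration.

```python
from collections import Counter
from math import sqrt


def scaffold_cosine_similarity(generated_scaffolds, reference_scaffolds):
    """Cosine similarity between scaffold frequency vectors (hypothetical helper)."""
    gen_counts = Counter(generated_scaffolds)
    ref_counts = Counter(reference_scaffolds)
    # Only scaffolds that occur in both sets contribute to the dot product.
    dot = sum(count * ref_counts[s] for s, count in gen_counts.items() if s in ref_counts)
    gen_norm = sqrt(sum(c * c for c in gen_counts.values()))
    ref_norm = sqrt(sum(c * c for c in ref_counts.values()))
    if gen_norm == 0 or ref_norm == 0:
        return 0.0  # an empty set has no scaffolds to compare
    return dot / (gen_norm * ref_norm)


# Made-up scaffold strings, for illustration only.
generated = ["c1ccccc1", "c1ccccc1", "C1CCCCC1"]
disjoint_reference = ["c1ccncc1", "c1ccoc1"]
overlapping_reference = ["c1ccccc1", "c1ccncc1"]

no_overlap = scaffold_cosine_similarity(generated, disjoint_reference)
some_overlap = scaffold_cosine_similarity(generated, overlapping_reference)
```

Here `no_overlap` is zero because the two sets have no scaffold in common, while `some_overlap` is strictly between 0 and 1. A value of 0 against the test set therefore means the generated molecules share no scaffold with it under this definition.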

[Attached images: MOSES, MOSES2]

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
KiddoZhu commented, Aug 28, 2021

Hi! There is a predefined split for MOSES implemented in TorchDrug. I am not sure if this is what you want. You can get it with:

from torchdrug import datasets

dataset = datasets.MOSES("/path/to/dataset")
train_set, valid_set, test_set = dataset.split()
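One way to see which subsets a predefined split can offer is to look at the per-molecule split labels themselves. The sketch below is plain Python and does not call TorchDrug; it only illustrates the general idea of grouping a split column into index lists. The label names (`train`, `test`, `test_scaffolds`) and the helper name are assumptions for illustration, not values confirmed from TorchDrug's MOSES file.

```python
from collections import defaultdict


def group_indices_by_split(split_labels):
    """Map each split label to the list of sample indices carrying it (hypothetical helper)."""
    indices = defaultdict(list)
    for i, label in enumerate(split_labels):
        indices[label].append(i)
    return dict(indices)


# Illustrative labels only; check the real file for the actual values.
labels = ["train", "train", "test", "test_scaffolds", "train", "test"]
groups = group_indices_by_split(labels)
```

If a label you need shows up in the raw data but is not returned by `split()`, index lists like these could in principle be wrapped with `torch.utils.data.Subset(dataset, indices)` to build the missing subset.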

Sorry, I am not an expert in molecule generation. Maybe @shichence knows more about the dataset and evaluation setting on MOSES?

0 reactions
KiddoZhu commented, Aug 29, 2021
  1. I think you can just create another solver wrapping the original model with test_set, load the checkpoint and finetune the model on test_set.

Pretrain:

from torchdrug import core

# Pretrain on the training split only, then save a checkpoint.
solver = core.Engine(task, train_set, None, None, optimizer, ...)
solver.train(num_epoch=10)
solver.save("gcpn_10epoch.pkl")

Finetune:

# Wrap the same task in a new solver over test_set, then load the pretrained checkpoint.
solver = core.Engine(task, test_set, None, None, optimizer, ...)
solver.load("gcpn_10epoch.pkl")
solver.train(num_epoch=1)

The same procedure can be applied to resume training.
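The save / load / train pattern above can also be illustrated without TorchDrug. The sketch below uses a hypothetical `ToySolver` class (it is not `core.Engine`) whose whole state is a single number, just to show that loading a checkpoint restores the pretrained state before training continues on a different dataset. All names, the JSON checkpoint format, and the update rule are assumptions made for this illustration.

```python
import json
import os
import tempfile


class ToySolver:
    """Hypothetical stand-in for a solver; NOT TorchDrug's core.Engine."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.weight = 0.0  # toy "model parameter"
        self.epochs_done = 0

    def train(self, num_epoch=1):
        mean = sum(self.dataset) / len(self.dataset)
        for _ in range(num_epoch):
            # Toy update: move the weight halfway toward the dataset mean.
            self.weight += 0.5 * (mean - self.weight)
            self.epochs_done += 1

    def save(self, path):
        with open(path, "w") as f:
            json.dump({"weight": self.weight, "epochs_done": self.epochs_done}, f)

    def load(self, path):
        with open(path) as f:
            state = json.load(f)
        self.weight = state["weight"]
        self.epochs_done = state["epochs_done"]


def pretrain_then_finetune(train_set, finetune_set, checkpoint_path):
    # Pretrain on one dataset and write a checkpoint.
    pretrain_solver = ToySolver(train_set)
    pretrain_solver.train(num_epoch=10)
    pretrain_solver.save(checkpoint_path)

    # A fresh solver over a different dataset picks up the saved state.
    finetune_solver = ToySolver(finetune_set)
    finetune_solver.load(checkpoint_path)
    pretrained_weight = finetune_solver.weight
    finetune_solver.train(num_epoch=1)
    return pretrained_weight, finetune_solver


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_10epoch.json")
    pretrained_weight, solver = pretrain_then_finetune([1.0, 3.0], [10.0], path)
```

The finetuning solver starts from the pretrained weight rather than from zero, and its epoch counter continues from 10, which is the same idea as resuming training from a saved checkpoint.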

  1. That’s great! I will follow your code and check the dataset.