How to use the MOSES train/test/testSF datasets in TorchDrug

TorchDrug implements the MOSES dataset, but it doesn’t distinguish between the train / test / testSF splits that MOSES provides. To train GCPN on MOSES, I think the correct order is to first pretrain the model on the train set, then train it on the test / testSF set, and finally generate the molecules. But how do I do this in TorchDrug? There’s only one dataset named MOSES.

I ask because when I generate molecules with a model trained on MOSES, the statistics don’t look correct compared to other models on MOSES. The Scaf/Test metric in the table stands out in particular: it measures whether the generated molecules share scaffolds with the test set, and it is 0 for the GCPN model after training in TorchDrug by following the tutorial. I think the problem is that TorchDrug only uses the train set and not the test set. How can I use the test set explicitly? Thanks in advance!
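For context on why the value can be exactly 0: as far as I understand the MOSES benchmark, Scaf is a cosine similarity between the scaffold frequency vectors of the generated set and the reference set. The sketch below assumes that definition and assumes scaffolds have already been extracted as strings by some cheminformatics toolkit; the function name `scaffold_cosine_similarity` and the placeholder scaffold strings are made up for illustration and are not part of MOSES or TorchDrug.

```python
import math
from collections import Counter


def scaffold_cosine_similarity(generated_scaffolds, reference_scaffolds):
    """Cosine similarity between two scaffold frequency vectors.

    Both arguments are iterables of scaffold strings (assumed to be
    precomputed elsewhere). Returns 0.0 if either side is empty.
    """
    gen_counts = Counter(generated_scaffolds)
    ref_counts = Counter(reference_scaffolds)

    # Only scaffolds present on both sides contribute to the dot product.
    shared = set(gen_counts) & set(ref_counts)
    dot = sum(gen_counts[s] * ref_counts[s] for s in shared)

    gen_norm = math.sqrt(sum(c * c for c in gen_counts.values()))
    ref_norm = math.sqrt(sum(c * c for c in ref_counts.values()))
    if gen_norm == 0 or ref_norm == 0:
        return 0.0
    return dot / (gen_norm * ref_norm)


# Placeholder scaffold strings, not real benchmark data.
reference = ["c1ccccc1", "c1ccncc1", "c1ccccc1"]
print(scaffold_cosine_similarity(["C1CCCCC1"], reference))  # no shared scaffold
print(scaffold_cosine_similarity(reference, reference))     # identical sets
```

Under this definition the score is 0 only when not a single generated scaffold appears in the reference set, so a value of exactly 0 points to a complete lack of overlap rather than a small mismatch.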

Issue Analytics

Created 2 years ago
Comments: 5 (3 by maintainers)

Top GitHub Comments

KiddoZhu commented, Aug 28, 2021

Hi! There is a predefined split for MOSES implemented in TorchDrug. I am not sure if this is what you want. You can get it by

from torchdrug import datasets
dataset = datasets.MOSES("/path/to/dataset")
# split() returns the predefined train / valid / test subsets
train_set, valid_set, test_set = dataset.split()
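To make the idea of a predefined split concrete, here is a minimal sketch in plain Python that groups sample indices by a per-sample split label. It is an illustration only and not TorchDrug's implementation; the helper `split_by_label` is hypothetical, and the label names (`"train"`, `"test"`, `"test_scaffolds"`) are assumptions about how a MOSES-style file might tag its rows.

```python
from collections import defaultdict


def split_by_label(labels, order=("train", "valid", "test")):
    """Group sample indices by split label.

    `labels` holds one split label per sample. The result is a tuple of
    index lists, one per name in `order`; a name with no samples yields
    an empty list.
    """
    indexes = defaultdict(list)
    for i, label in enumerate(labels):
        indexes[label].append(i)
    return tuple(indexes[name] for name in order)


# Hypothetical labels; a real dataset would read these from its split column.
labels = ["train", "train", "test", "test_scaffolds", "train"]
train_idx, test_idx, scaffold_idx = split_by_label(
    labels, order=("train", "test", "test_scaffolds")
)
print(train_idx, test_idx, scaffold_idx)
```

Index lists like these could then be wrapped in something like `torch.utils.data.Subset` to obtain separate datasets, which is one way to reach a scaffold test split if the library's own `split()` does not expose it.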

Sorry I am not an expert in molecule generation. Maybe @shichence knows more about the dataset and evaluation setting on MOSES?

KiddoZhu commented, Aug 29, 2021

I think you can create another solver that wraps the original model with test_set, load the checkpoint, and fine-tune the model on test_set.

Pretrain:

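from torchdrug import core

# Train on train_set only; the validation and test slots are left as None
solver = core.Engine(task, train_set, None, None, optimizer, ...)
solver.train(num_epoch=10)
solver.save("gcpn_10epoch.pkl")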

Finetune:

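# New solver around the same task, now fed test_set as its training data
solver = core.Engine(task, test_set, None, None, optimizer, ...)
# Restore the pretrained checkpoint before fine-tuning
solver.load("gcpn_10epoch.pkl")
solver.train(num_epoch=1)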

The same procedure can be applied to resume training.
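The call order of this pretrain, save, load, fine-tune flow can be tried out without TorchDrug using a toy stand-in. In the sketch below, `ToySolver` and `pretrain_then_finetune` are hypothetical names; the class only counts epochs and samples and stores that state as JSON, so it shows how state carries over through a checkpoint and nothing about how `core.Engine` works internally.

```python
import json
import os
import tempfile


class ToySolver:
    """Hypothetical stand-in for a solver; it only tracks training progress."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.state = {"epochs_seen": 0, "samples_seen": 0}

    def train(self, num_epoch=1):
        # Each epoch "visits" every sample in the current dataset once.
        for _ in range(num_epoch):
            self.state["epochs_seen"] += 1
            self.state["samples_seen"] += len(self.dataset)

    def save(self, path):
        with open(path, "w") as f:
            json.dump(self.state, f)

    def load(self, path):
        with open(path) as f:
            self.state = json.load(f)


def pretrain_then_finetune(train_set, test_set, checkpoint_path):
    # Stage 1: train on the training data and write a checkpoint.
    pretrain = ToySolver(train_set)
    pretrain.train(num_epoch=10)
    pretrain.save(checkpoint_path)

    # Stage 2: a fresh solver on different data picks up the saved state.
    finetune = ToySolver(test_set)
    finetune.load(checkpoint_path)
    finetune.train(num_epoch=1)
    return finetune.state


checkpoint = os.path.join(tempfile.mkdtemp(), "toy_checkpoint.json")
print(pretrain_then_finetune(["a", "b", "c"], ["d"], checkpoint))
```

Resuming training follows the same pattern: build a solver on the original data, load the checkpoint, and call `train` again.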

That’s great! I will follow your code and check the dataset.
