How to use MOSES train/test/testSF dataset in TorchDrug
TorchDrug provides the MOSES dataset, but doesn’t distinguish between the train, test and testSF subsets which MOSES has. To train GCPN on MOSES, I think the correct order is to pretrain the model on the train dataset first, then train it on the testSF dataset, and finally generate the molecules. But how do I do this in TorchDrug? There’s only one dataset named MOSES.
I have this question because when I generate molecules and evaluate them with MOSES, the statistics don’t look correct compared to other models on MOSES, especially the Scaf/Test property in the table, which checks whether the generated molecules share scaffolds with the test dataset. It’s 0 for the GCPN model after training with TorchDrug, following the tutorial. I think the problem is that TorchDrug only uses the train dataset but not the test dataset. How can I explicitly use it? Thanks in advance!
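For context on the Scaf/Test number: as far as the MOSES benchmark describes it, the scaffold similarity is a cosine similarity between the scaffold frequency vectors of the generated set and a reference set, so a value of 0 means that no scaffold occurs in both. Below is a minimal pure-Python sketch of that idea, assuming the scaffolds have already been extracted as SMILES strings (MOSES does that step with RDKit). The function name `scaffold_cosine` and the toy scaffold lists are made up for illustration and are not part of MOSES or TorchDrug.

```python
import math
from collections import Counter


def scaffold_cosine(gen_scaffolds, ref_scaffolds):
    """Cosine similarity between two scaffold frequency vectors."""
    gen_counts = Counter(gen_scaffolds)
    ref_counts = Counter(ref_scaffolds)
    if not gen_counts or not ref_counts:
        return float("nan")
    # Only scaffolds present in both sets contribute to the dot product.
    shared = gen_counts.keys() & ref_counts.keys()
    dot = sum(gen_counts[s] * ref_counts[s] for s in shared)
    gen_norm = math.sqrt(sum(c * c for c in gen_counts.values()))
    ref_norm = math.sqrt(sum(c * c for c in ref_counts.values()))
    return dot / (gen_norm * ref_norm)


# Toy scaffold SMILES (hypothetical data, not taken from MOSES).
reference = ["c1ccccc1", "c1ccccc1", "C1CCNCC1"]
disjoint = ["c1ccncc1", "C1CCOC1"]
overlapping = ["c1ccccc1", "C1CCOC1"]

print(scaffold_cosine(disjoint, reference))     # 0.0, no shared scaffold
print(scaffold_cosine(overlapping, reference))  # positive, benzene is shared
```

A score of exactly 0 therefore points to a generated set whose scaffolds are completely disjoint from the reference set, which is worth checking before concluding that the split is the cause.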
- Created 2 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
Hi! There is a predefined split for MOSES implemented in TorchDrug. I am not sure if this is what you want. You can get it by
    from torchdrug import datasets

    dataset = datasets.MOSES("/path/to/dataset")
    train_set, valid_set, test_set = dataset.split()
Sorry I am not an expert in molecule generation. Maybe @shichence knows more about the dataset and evaluation setting on MOSES?
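To check by hand which molecules carry which split label, the MOSES CSV can be grouped directly. The sketch below assumes the file has `SMILES` and `SPLIT` columns with the labels `train`, `test` and `test_scaffolds`, as in the `dataset_v1.csv` distributed with the MOSES repository; that layout is an assumption worth verifying against the downloaded file. The helper name `group_by_split` and the inline sample rows are illustrative only.

```python
import csv
import io
from collections import defaultdict


def group_by_split(csv_file):
    """Map each SPLIT label to the list of SMILES strings carrying it."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["SPLIT"]].append(row["SMILES"])
    return dict(groups)


# Inline sample standing in for the real file (made-up rows).
sample = io.StringIO(
    "SMILES,SPLIT\n"
    "CCO,train\n"
    "CCN,train\n"
    "c1ccccc1O,test\n"
    "c1ccncc1C,test_scaffolds\n"
)
splits = group_by_split(sample)
for name, smiles in splits.items():
    print(name, len(smiles))
```

With a real file, pass `open(path, newline="")` instead of the `StringIO` object. Comparing these counts with the sizes of `train_set`, `valid_set` and `test_set` should indicate which labels ended up in which TorchDrug subset.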
- I think you can just create another solver wrapping the original model with test_set, load the checkpoint and finetune the model on test_set.
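    from torchdrug import core

    # Pretrain on the training split and save a checkpoint.
    solver = core.Engine(task, train_set, None, None, optimizer, ...)
    solver.train(num_epoch=10)
    solver.save("gcpn_10epoch.pkl")

    # Wrap the same task in a new solver on the test split, restore the checkpoint and finetune.
    solver = core.Engine(task, test_set, None, None, optimizer, ...)
    solver.load("gcpn_10epoch.pkl")
    solver.train(num_epoch=1)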
The same procedure can be applied to resume training.
- That’s great! I will follow your code and check the dataset.