Unable to reproduce benchmark results for (dtnn, qm9, random)

Here is my system information:

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS release 6.7 (Final) x86_64
CUDA/cuDNN version: 7.5/5.1.3
GPU model and memory: Quadro K600 1GB DDR3

Using a clean checkout from Aug 16th (commit 614d8a3cc43a6f139a1a8cbff1c9eb581d92b46d), I ran:

python examples/benchmark.py -s random -m dtnn -d qm9 -t
python examples/benchmark.py -s index -m dtnn -d qm9 -t

The resulting results.csv contains:

qm9,random,regression,dtnn,mean-pearson_r2_score,train,0.58606980772779427,valid,0.53837581774384768,test,0.55762063055664446,time_for_running,14634.339544057846
qm9,index,regression,dtnn,mean-pearson_r2_score,train,0.6549500533371011,valid,0.2431349582030827,test,0.41939512301155074,time_for_running,13377.745332956314
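For reading these rows, here is a minimal Python sketch. It assumes, based only on the two rows above, that each line of results.csv holds five descriptive fields (dataset, splitter, task type, model, metric) followed by alternating label/value pairs. `parse_result_row` is a hypothetical helper, not part of DeepChem.

```python
# Sketch: parse one row of benchmark.py's results.csv output.
# Assumption (from the rows quoted above): five descriptive fields,
# then alternating label,value pairs such as "train,0.58,valid,0.53,...".

def parse_result_row(line: str) -> dict:
    fields = line.strip().split(",")
    dataset, split, task_type, model, metric = fields[:5]
    rest = fields[5:]
    # Pair up the remaining fields into {"train": 0.58, "valid": 0.53, ...}
    values = {rest[i]: float(rest[i + 1]) for i in range(0, len(rest), 2)}
    return {
        "dataset": dataset,
        "split": split,
        "task_type": task_type,
        "model": model,
        "metric": metric,
        **values,
    }


rows = [
    "qm9,random,regression,dtnn,mean-pearson_r2_score,train,0.58606980772779427,"
    "valid,0.53837581774384768,test,0.55762063055664446,time_for_running,14634.339544057846",
    "qm9,index,regression,dtnn,mean-pearson_r2_score,train,0.6549500533371011,"
    "valid,0.2431349582030827,test,0.41939512301155074,time_for_running,13377.745332956314",
]

for row in rows:
    r = parse_result_row(row)
    # Print the three split scores rounded to three decimals
    print(f"{r['split']:>6}: train={r['train']:.3f} valid={r['valid']:.3f} test={r['test']:.3f}")
```

Keeping the label/value pairs as a dict means extra columns (such as `time_for_running`) are picked up without changing the parser.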

The DeepChem front page lists:

Dataset	Model	Splitting	Train score/R2	Valid score/R2
qm9	MT-NN regression	Index	0.733	0.766
qm9	DTNN	Index	0.918	0.831
qm9	MT-NN regression	Random	0.852	0.833
qm9	DTNN	Random	0.942	0.948

If I'm reading this right, with random splitting I'm getting a validation Pearson R² of 0.538 (0.558 on the test set), whereas the front page reports 0.948.

(Edit: added results for index splitting.)
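The metric in these results is `mean-pearson_r2_score`. Our reading, which should be checked against `deepchem.metrics`, is that it is the squared Pearson correlation between true and predicted values, averaged over the dataset's tasks. The NumPy sketch below implements that assumption; the helper names and the three-task toy data are hypothetical.

```python
# Sketch of a task-averaged squared Pearson correlation.
# Assumption: "mean-pearson_r2_score" = mean over tasks of r**2, where r is the
# Pearson correlation between true and predicted values for that task.
import numpy as np


def pearson_r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # np.corrcoef on two 1-D arrays returns a 2x2 matrix; [0, 1] is r
    r = np.corrcoef(y_true, y_pred)[0, 1]
    return float(r ** 2)


def mean_pearson_r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """y_true, y_pred: arrays of shape (n_samples, n_tasks)."""
    scores = [pearson_r2(y_true[:, t], y_pred[:, t]) for t in range(y_true.shape[1])]
    return float(np.mean(scores))


# Hypothetical toy data: 4 samples, 3 tasks
y_true = np.array([[1.0, 0.5, 1.0],
                   [2.0, 1.5, 2.0],
                   [3.0, 2.5, 3.0],
                   [4.0, 3.5, 4.0]])
y_pred = np.column_stack([
    2 * y_true[:, 0] + 1,               # task 0: perfectly correlated, but scaled and shifted
    y_true[:, 1],                       # task 1: exact predictions
    np.array([1.0, -1.0, -1.0, 1.0]),   # task 2: uncorrelated with the truth (r = 0)
])
print(mean_pearson_r2(y_true, y_pred))
```

Two consequences of this definition, if it holds: one poorly predicted task pulls the average down noticeably (here to about 2/3), and because r² ignores scale and offset errors (task 0 still scores 1), it is not the same quantity as the coefficient of determination.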


Top GitHub Comments


momeara commented, Aug 18, 2017

Good news: I upgraded RDKit to version 2017.03.3 and am now getting significantly better results:

qm9,random,regression,dtnn,mean-pearson_r2_score,train,0.90034446761651732,valid,0.79499177162278145,test,0.82222053202103251,time_for_running,13429.928102970123

I will submit a pull request to update the minimum RDKit requirement.
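Since the fix here was upgrading RDKit, a quick environment check may help anyone else trying to reproduce these numbers. The sketch below assumes RDKit version strings of the form `YYYY.MM.patch` and uses 2017.03.3, the version reported above, as the minimum; `version_tuple` and `meets_minimum` are hypothetical helpers. It only reads `rdBase.rdkitVersion` and says nothing about why older versions scored lower.

```python
# Sketch: check that the installed RDKit meets a minimum version.
# Assumption: RDKit versions look like "YYYY.MM.patch" (e.g. "2017.03.3").
import re

MIN_RDKIT = "2017.03.3"  # version reported to fix the benchmark above


def version_tuple(v: str) -> tuple:
    """'2017.03.3' -> (2017, 3, 3); keeps only the leading digits of each part."""
    parts = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        parts.append(int(m.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_RDKIT) -> bool:
    # Tuple comparison orders (year, month, patch) correctly
    return version_tuple(installed) >= version_tuple(minimum)


try:
    from rdkit import rdBase
    installed = rdBase.rdkitVersion
    print("RDKit", installed, "OK" if meets_minimum(installed) else "older than " + MIN_RDKIT)
except ImportError:
    print("RDKit is not installed")
```

Comparing integer tuples rather than raw strings avoids surprises when a component has a different number of digits.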


Dgelemi commented, Aug 18, 2017

This is what I got with featurizer = deepchem.feat.CoulombMatrix(26):

qm9 | random | regression | dtnn | mean-pearson_r2_score | train | 0.929191 | valid | 0.82281 | test | 0.819825 | time_for_running | 11448.62
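For context on the featurizer: a Coulomb matrix (Rupp et al., 2012) encodes a molecule with atomic numbers Z_i and positions R_i as M_ii = 0.5 * Z_i^2.4 on the diagonal and M_ij = Z_i * Z_j / |R_i - R_j| off the diagonal. We assume the 26 in CoulombMatrix(26) is the maximum number of atoms the matrix is zero-padded to. The NumPy sketch below illustrates only that formula and is not DeepChem's implementation: it omits options such as hydrogen removal or randomized row ordering and makes no claim about DeepChem's units. `coulomb_matrix` is a hypothetical helper.

```python
# Sketch of a zero-padded Coulomb matrix (plain formula, not DeepChem's code).
import numpy as np


def coulomb_matrix(Z, R, max_atoms: int) -> np.ndarray:
    """Z: (n,) atomic numbers; R: (n, 3) coordinates.
    Returns a (max_atoms, max_atoms) matrix, zero-padded beyond n atoms."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    n = len(Z)
    if n > max_atoms:
        raise ValueError(f"molecule has {n} atoms, more than max_atoms={max_atoms}")
    # Pairwise interatomic distances, shape (n, n)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    # Off-diagonal Coulomb repulsion terms; the diagonal divides by zero
    # and is overwritten below, so that warning is suppressed.
    with np.errstate(divide="ignore"):
        block = np.outer(Z, Z) / dist
    np.fill_diagonal(block, 0.5 * Z ** 2.4)
    M = np.zeros((max_atoms, max_atoms))
    M[:n, :n] = block
    return M


# Toy H2 molecule with an illustrative bond length of 0.74
M = coulomb_matrix([1, 1], [[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]], max_atoms=26)
print(M.shape)  # (26, 26)
print(M[:2, :2])
```

Padding every molecule to the same size gives fixed-shape inputs, which is why the featurizer needs a maximum atom count up front.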
