QM9EdgeDataset labels are wrong

🐛 Bug

With QM9EdgeDataset, it seems that the prediction labels are broken. This may be because of the preprocessing, or because of a bad source for QM9.

@hengruizhang98 @mufeili

To Reproduce

from dgl.data import QM9EdgeDataset as DGLQM9Edge
from dgl.data import QM9Dataset as DGLQM9
import matplotlib.pyplot as plt

keys = ['mu', 'alpha', 'homo', 'lumo', 'gap', 'r2', 'zpve']
# One column of plots per task: top row for QM9Dataset, bottom row for QM9EdgeDataset
f, axs = plt.subplots(2, len(keys), figsize=(20, 5))
for i, task in enumerate(keys):
    ds_dgl = DGLQM9Edge([task])
    ds_dgl2 = DGLQM9([task])

    targets_dgl = ds_dgl.targets[:, i]
    targets_dgl2 = ds_dgl2.label[:, 0]

    axs[0][i].hist(targets_dgl2, bins=50)
    axs[1][i].hist(targets_dgl, bins=50)

f.tight_layout()
plt.show()

The first row shows histograms of the labels from QM9Dataset, and the second row shows those from QM9EdgeDataset.

Expected behavior

Labels should be the same for all QM9 datasets.
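
Histograms only compare distributions, so a per-molecule check can make the mismatch more concrete. The following is a minimal sketch, not part of the original report: `label_mismatch` is a hypothetical helper, and it assumes both datasets list molecules in the same order, which the issue does not confirm.

```python
import math

def label_mismatch(labels_a, labels_b, rel_tol=1e-5, abs_tol=1e-8):
    """Return (number of mismatching entries, largest absolute difference).

    Hypothetical helper: assumes both sequences are aligned entry by entry.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label arrays differ in length")
    n_bad = 0
    max_diff = 0.0
    for a, b in zip(labels_a, labels_b):
        a, b = float(a), float(b)
        diff = abs(a - b)
        max_diff = max(max_diff, diff)
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            n_bad += 1
    return n_bad, max_diff

# Toy values standing in for one label column from each dataset
print(label_mismatch([0.0, 2.5, -1.0], [0.0, 2.5, -1.5]))  # (1, 0.5)
```

With the reproduction script, the two one-dimensional arrays `targets_dgl2` and `targets_dgl` could be passed in place of the toy lists, provided the ordering assumption holds.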

Environment

DGL Version (e.g., 1.0): commit 1f4c0b7
Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): PyTorch 1.9.0
OS (e.g., Linux): Linux
How you installed DGL (conda, pip, source): source
Build command you used (if compiling from source): cmake -DUSE_CUDA=ON -DUSE_FP16=ON … && make -j8
Python version: 3.8.8

Additional context

The docs say that the preprocessing is done in this gist: https://gist.github.com/hengruizhang98/a2da30213b2356fff18b25385c9d3cd2 so the problem is presumably there.

Top GitHub Comments

hengruizhang98 commented, Apr 2, 2021

Yes, it takes about 1 min to load graphs from QM9v2, while QM9Edge is much faster because the graphs are constructed when called. You can choose whichever you prefer.

milesial commented, Apr 2, 2021

I see that QM9V2 is directly loading DGL graphs with load_graphs, and QM9Edge is creating them on the fly. Maybe one is faster than the other.
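
One way to check which loader is faster on a given machine is to time each constructor. This is a minimal sketch under assumptions: `timed` is a hypothetical helper, and a stand-in workload is used so the snippet runs without DGL or the dataset download.

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload so the sketch runs without DGL installed
result, seconds = timed(sum, range(1_000_000))
print(f"sum took {seconds:.4f} s")

# With DGL available, the same helper could wrap the constructors from the
# reproduction script, e.g. timed(DGLQM9, ['mu']) and timed(DGLQM9Edge, ['mu']).
```

Any cached files on disk will affect the timings, so a first run and a repeat run may differ considerably.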
