
Trying to train on gen2 dataset


Hi,

I am trying to overfit espaloma on a small batch from the gen2 dataset. I noticed that the reference energy u_ref is on a large negative scale:

import torch
import espaloma as esp

ds = esp.data.dataset.GraphDataset.load("gen2")
ds = ds[:10]
ds.shuffle(seed=2666)
ds_tr, ds_vl, ds_te = ds.split([5, 3, 2])

ds_tr_loader = ds_tr.view(batch_size=1, shuffle=True)
ds_vl_loader = ds_vl.view(batch_size=1, shuffle=True)

g_tr = next(iter(ds_tr.view(batch_size=1)))
torch.mean(g_tr.nodes["g"].data["u_ref"], dim=1)
# tensor([-1988.4373])

(Side note: when I try to increase the batch size, I get the following error.)

Expect all graphs to have the same schema on nodes["g"].data, but graph 1 got
	{'u_openff-1.2.0': Scheme(shape=(29,), dtype=torch.float32), 'u_gaff-2.11': Scheme(shape=(29,), dtype=torch.float32), 'u_qm': Scheme(shape=(29,), dtype=torch.float32), 'u_ref': Scheme(shape=(29,), dtype=torch.float32), 'u_gaff-1.81': Scheme(shape=(29,), dtype=torch.float32)}
which is different from
	{'u_openff-1.2.0': Scheme(shape=(77,), dtype=torch.float32), 'u_gaff-2.11': Scheme(shape=(77,), dtype=torch.float32), 'u_qm': Scheme(shape=(77,), dtype=torch.float32), 'u_ref': Scheme(shape=(77,), dtype=torch.float32), 'u_gaff-1.81': Scheme(shape=(77,), dtype=torch.float32)}.
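
This looks like dgl.batch refusing to merge graphs whose energy tensors have different numbers of conformations (29 vs. 77 here). A rough workaround sketch, not an espaloma API: truncate every graph to a common conformation count before batching. The layout assumptions here (energies on nodes["g"] with conformations on the last axis, coordinates "xyz" on nodes["n1"] with shape (atoms, conformations, 3)) are inferred from the error message, not from documentation:

n_confs = 20  # hypothetical common conformation count

def truncate_conformations(g, n_confs):
    # Keep only the first n_confs snapshots of every per-conformation field.
    for key in list(g.nodes["g"].data.keys()):
        if key.startswith("u"):
            g.nodes["g"].data[key] = g.nodes["g"].data[key][:, :n_confs]
    if "xyz" in g.nodes["n1"].data:
        g.nodes["n1"].data["xyz"] = g.nodes["n1"].data["xyz"][:, :n_confs, :]
    return g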

The model I am using is initialized as follows:

representation = esp.nn.Sequential(
    layer=esp.nn.layers.dgl_legacy.gn("SAGEConv"), # use SAGEConv implementation in DGL
    config=[128, "relu", 128, "relu", 128, "relu"], # 3 layers, 128 units, ReLU activation
)

readout = esp.nn.readout.janossy.JanossyPooling(
    in_features=128,
    config=[128, "relu", 128, "relu", 128, "relu"],
    out_features={  # modular MM parameters Espaloma will assign
        1: {"e": 1, "s": 1},         # atom hardness and electronegativity
        2: {"log_coefficients": 2},  # bond linear combination, enforce positive
        3: {"log_coefficients": 2},  # angle linear combination, enforce positive
        4: {"k": 6},                 # torsion barrier heights (can be positive or negative)
    },
)

espaloma_model = torch.nn.Sequential(
    representation,
    readout,
    esp.nn.readout.janossy.ExpCoefficients(),
    esp.mm.geometry.GeometryInGraph(),
    esp.mm.energy.EnergyInGraph(),
    # esp.mm.energy.EnergyInGraph(suffix="_ref"),
    esp.nn.readout.charge_equilibrium.ChargeEquilibrium(),
)

Now I am trying to overfit with the following training loop:

from tqdm import tqdm

normalize = esp.data.normalize.ESOL100LogNormalNormalize()
# `optimizer` is defined earlier,
# e.g. torch.optim.Adam(espaloma_model.parameters(), lr=1e-3)

for idx_epoch in tqdm(range(2000)):
    intr_loss = 0.0
    for g in ds_tr_loader:
        optimizer.zero_grad()
        if torch.cuda.is_available():
            g = g.to("cuda:0")
        g = espaloma_model(g)
        g = normalize.unnorm(g)

        loss = loss_fn(g)
        loss.backward()
        optimizer.step()
        intr_loss += loss.item()

I am using the following loss function:

loss_fn = esp.metrics.GraphMetric(
    base_metric=torch.nn.MSELoss(),
    between=["u", "u_ref"],
    level="g",
)

After training, the training loss curve looks like this (epochs on the x-axis):

[Figure: training loss curve, plateauing around 1.4M]

The loss gets stuck at ~1.4M, when you would expect it to be close to 0 (since I am training on only 5 examples). The predicted energy for individual examples converges to a small positive value:

g_tr = espaloma_model(g_tr)
g_tr.nodes["g"].data["u"]
# 1.2619

If I do the same on the pepconf dataset (peptides), I get similar results: the output of espaloma is on a different scale than the reference.

My question is: what am I doing wrong? Is it the model architecture? The normalizer? Or something else? I would appreciate any help.

Thanks!

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
yuanqing-wang commented, May 21, 2022

The updated Colab notebook should work! http://data.wangyq.net/esp_notesbooks/qm_fitting.ipynb

Essentially the energies need to be centered before calculating the error.
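
For concreteness, here is a minimal sketch of that centering as a hypothetical helper (not the notebook's exact code): subtract each molecule's mean energy across conformations before comparing, since absolute QM energies carry an arbitrary offset (hence the large negative u_ref values above).

def centered_mse(u, u_ref):
    # Compare relative energies only: remove each molecule's mean over conformations.
    u = u - u.mean(dim=-1, keepdim=True)
    u_ref = u_ref - u_ref.mean(dim=-1, keepdim=True)
    return torch.nn.functional.mse_loss(u, u_ref)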

0 reactions
jchodera commented, Jun 20, 2022

Reopening this issue to make sure we address the most recent comment!

  1. @yuanqing-wang : Can you be sure to document the units for the saved gen2 data?
  2. @yuanqing-wang : Would be good to document the units here too, as well as which part of the potential function is represented here—is it just the valence and coulomb components?
