Trying to train on gen2 dataset
Hi,
I am trying to overfit espaloma to a small batch from the gen2 dataset. I noticed that the reference energy `u_ref` is on a large negative scale:
```python
import torch
import espaloma as esp

ds = esp.data.dataset.GraphDataset.load("gen2")
ds = ds[:10]
ds.shuffle(seed=2666)
ds_tr, ds_vl, ds_te = ds.split([5, 3, 2])
ds_tr_loader = ds_tr.view(batch_size=1, shuffle=True)
ds_vl_loader = ds_vl.view(batch_size=1, shuffle=True)

g_tr = next(iter(ds_tr.view(batch_size=1)))
torch.mean(g_tr.nodes["g"].data["u_ref"], dim=1)
# tensor([-1988.4373])
```
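The absolute offset may matter less than it looks; what the fit has to capture is the spread of energies across conformations of the same molecule, not the constant shift. A quick diagnostic of my own (not part of the original issue) would compare the two:

```python
# Diagnostic sketch (mine, not from the issue): compare the absolute offset of
# u_ref with its spread across the conformations of one molecule.
u_ref = g_tr.nodes["g"].data["u_ref"]                            # shape (1, n_confs)
offset = u_ref.mean(dim=-1)                                      # large negative constant
spread = (u_ref - u_ref.mean(dim=-1, keepdim=True)).std(dim=-1)  # per-molecule spread
print(offset, spread)
```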
(Side note: when I try to increase the batch size, I get the following error.)
```
Expect all graphs to have the same schema on nodes["g"].data, but graph 1 got
{'u_openff-1.2.0': Scheme(shape=(29,), dtype=torch.float32), 'u_gaff-2.11': Scheme(shape=(29,), dtype=torch.float32), 'u_qm': Scheme(shape=(29,), dtype=torch.float32), 'u_ref': Scheme(shape=(29,), dtype=torch.float32), 'u_gaff-1.81': Scheme(shape=(29,), dtype=torch.float32)}
which is different from
{'u_openff-1.2.0': Scheme(shape=(77,), dtype=torch.float32), 'u_gaff-2.11': Scheme(shape=(77,), dtype=torch.float32), 'u_qm': Scheme(shape=(77,), dtype=torch.float32), 'u_ref': Scheme(shape=(77,), dtype=torch.float32), 'u_gaff-1.81': Scheme(shape=(77,), dtype=torch.float32)}.
```
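That error comes from DGL's batching: all graphs in a batch must share the same data schema, and here the two molecules carry different numbers of conformations (29 vs 77), so their per-conformation energy tensors have different shapes. One rough workaround, sketched here by me and assuming energies sit in `nodes["g"].data` with shape `(1, n_confs)` and coordinates in `nodes["n1"].data["xyz"]` with shape `(n_atoms, n_confs, 3)`, is to truncate every molecule to a common number of snapshots before building the batched view:

```python
# Sketch only, not from the original issue or the espaloma docs: trim every
# per-conformation tensor to a common number of snapshots so DGL can batch.
def trim_conformations(heterograph, n_confs):
    for key, value in list(heterograph.nodes["g"].data.items()):
        heterograph.nodes["g"].data[key] = value[:, :n_confs]
    xyz = heterograph.nodes["n1"].data["xyz"]
    heterograph.nodes["n1"].data["xyz"] = xyz[:, :n_confs, :]
    return heterograph
```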
The model that I am using is initialized as follows:
```python
representation = esp.nn.Sequential(
    layer=esp.nn.layers.dgl_legacy.gn("SAGEConv"),   # use the SAGEConv implementation in DGL
    config=[128, "relu", 128, "relu", 128, "relu"],  # 3 layers, 128 units, ReLU activation
)

readout = esp.nn.readout.janossy.JanossyPooling(
    in_features=128,
    config=[128, "relu", 128, "relu", 128, "relu"],
    out_features={                   # define modular MM parameters Espaloma will assign
        1: {"e": 1, "s": 1},         # atom hardness and electronegativity
        2: {"log_coefficients": 2},  # bond linear combination, enforce positive
        3: {"log_coefficients": 2},  # angle linear combination, enforce positive
        4: {"k": 6},                 # torsion barrier heights (can be positive or negative)
    },
)

espaloma_model = torch.nn.Sequential(
    representation,
    readout,
    esp.nn.readout.janossy.ExpCoefficients(),
    esp.mm.geometry.GeometryInGraph(),
    esp.mm.energy.EnergyInGraph(),
    # esp.mm.energy.EnergyInGraph(suffix="_ref"),
    esp.nn.readout.charge_equilibrium.ChargeEquilibrium(),
)
```
Now I am trying to overfit with the following training loop:
```python
from tqdm import tqdm

normalize = esp.data.normalize.ESOL100LogNormalNormalize()

# loss_fn is defined below; the optimizer is created beforehand
# (see the sketch after this snippet).
for idx_epoch in tqdm(range(2000)):
    intr_loss = 0
    k = 0
    for g in ds_tr_loader:
        optimizer.zero_grad()
        if torch.cuda.is_available():
            g = g.to("cuda:0")
        g = espaloma_model(g)
        g = normalize.unnorm(g)
        loss = loss_fn(g)
        loss.backward()
        optimizer.step()
        intr_loss += loss.item()
```
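The issue does not show how the optimizer is created; presumably it is something along these lines (the choice of optimizer and learning rate here are placeholders of mine, not taken from the original post):

```python
# Assumed setup, not shown in the original issue.
optimizer = torch.optim.Adam(espaloma_model.parameters(), lr=1e-4)
```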
I am using the following loss function:
```python
loss_fn = esp.metrics.GraphMetric(
    base_metric=torch.nn.MSELoss(),
    between=["u", "u_ref"],
    level="g",
)
```
After training, the training-loss curve (epochs on the x-axis) gets stuck at ~1.4M, whereas you would expect it to be close to 0, since I am training on only 5 examples. The energy predicted for an individual example converges to a small positive value:
```python
g_tr = espaloma_model(g_tr)
g_tr.nodes["g"].data["u"]
# 1.2619
```
If I do the same on the pepconf dataset (peptides), I get similar results; the output of espaloma is on a different scale.
My question is: what am I doing wrong? Is it the model architecture? The normalizer? Or something else? I would appreciate any help.
Thanks!
Top GitHub Comments
The updated Colab notebook should work! http://data.wangyq.net/esp_notesbooks/qm_fitting.ipynb
Essentially the energies need to be centered before calculating the error.
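That explains the plateau above: with reference energies around -2000 and predictions near zero, a plain MSE on absolute energies is dominated by the constant offset and cannot approach zero, no matter how well the relative conformational energies are fit. A minimal sketch of what centering could look like (my own illustration, not the exact code from the notebook) is to subtract each molecule's mean energy across its conformations from both the prediction and the reference before computing the error:

```python
# Illustrative centering sketch (not the notebook's exact implementation).
u = g.nodes["g"].data["u"]          # predicted MM energies, shape (1, n_confs)
u_ref = g.nodes["g"].data["u_ref"]  # reference energies, shape (1, n_confs)

u_centered = u - u.mean(dim=-1, keepdim=True)
u_ref_centered = u_ref - u_ref.mean(dim=-1, keepdim=True)

loss = torch.nn.functional.mse_loss(u_centered, u_ref_centered)
```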
Reopening this issue to make sure we address the most recent comment!