watch() sometimes does not log the gradients
See original GitHub issue.
Describe the bug
I am training a generative adversarial network. Without adversarial training, the gradients of the generator are logged. With adversarial training, the gradients of the generator are not logged.
To Reproduce Steps to reproduce the behavior:
- Go to https://github.com/AliaksandrSiarohin/first-order-model
- Add the following at line 71 of run.py:
  wandb.watch(generator, log='all')
  wandb.watch(discriminator, log='all')
  wandb.watch(kp_detector, log='all')
- Run python run.py --config config/vox-256.yaml and observe that the generator and kp_detector gradients are logged.
- Run python run.py --config config/vox-adv-256.yaml, which is adversarial training, and observe that only the discriminator gradients are logged, not those of generator and kp_detector.
Expected behavior
The gradients of all three networks should be logged during adversarial training.
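Until the watch() behavior is fixed, one common workaround is to log gradient norms manually with backward hooks, independently of wandb.watch(). The sketch below uses a small stand-in nn.Linear in place of the real generator, and collects norms into a plain dict; in practice you would pass that dict to wandb.log each training step. All names here are illustrative, not part of the original repro.

```python
# Workaround sketch: log gradient norms via tensor backward hooks,
# bypassing wandb.watch(). The model here is a tiny stand-in; swap in
# the real generator and call wandb.log(grad_norms) per step.
import torch
import torch.nn as nn

generator = nn.Linear(4, 4)  # stand-in for the real generator

grad_norms = {}  # in real training, passed to wandb.log(...)

def make_hook(name):
    def hook(grad):
        # record the L2 norm of this parameter's gradient
        grad_norms[f"gradients/{name}"] = grad.norm().item()
    return hook

for name, param in generator.named_parameters():
    param.register_hook(make_hook(name))

out = generator(torch.randn(2, 4))
out.sum().backward()
# grad_norms now holds one entry per parameter (weight and bias),
# regardless of which wandb.watch() call missed it.
```

Because the hooks fire on every backward pass, this captures gradients even when the adversarial training loop calls backward() in a way wandb.watch() does not instrument.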
Operating System
- OS: Linux
Additional context
conda list output: https://pastebin.com/sV920TQP
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 1
- Comments: 16 (6 by maintainers)

It’s likely regular RAM that’s the issue.
The process is probably getting killed due to memory pressure. We have to copy the gradients from the GPU to host memory, and if your model is really large your notebook may not have enough RAM. Are you able to get a larger instance for your notebook?
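For a sense of scale, a rough back-of-the-envelope sketch of the host RAM one gradient snapshot needs. The 60M parameter count below is a hypothetical figure chosen for illustration, not measured from this model:

```python
# Rough estimate of host RAM needed to copy one set of gradients off the
# GPU for logging. Parameter count is hypothetical, not from this repo.
param_count = 60_000_000   # assumed: a 60M-parameter generator
bytes_per_grad = 4         # one float32 gradient per parameter
grad_bytes = param_count * bytes_per_grad
print(f"~{grad_bytes / 1024**2:.0f} MiB per gradient snapshot")
```

With log='all' wandb also copies parameter values, roughly doubling that figure, so a constrained notebook instance can run out of memory quickly.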