
weird behavior when trying to use amp

See original GitHub issue

Not sure if you’ve seen anything like this before? I’ve used amp on some other models and it just worked, so I’m not sure how to debug. It keeps doing this over and over:

root@C.612345:/workspace/stylegan2-pytorch$ bin/stylegan2_pytorch --data ../imgs/ --image-size 128 --batch_size 32 --gradient_accumulate_every 1 --learning_rate 0.002 --fp16                                                               
Selected optimization level O2:  FP16 training with FP32 batchnorm and FP32 master weights.                                                                                                                                                 
                                                                                                                                                                                                                                            
Defaults for this optimization level are:                                                                                                                                                                                                   
enabled                : True                                                                                                                                                                                                               
opt_level              : O2                                                                                                                                                                                                                 
cast_model_type        : torch.float16                                                                                                                                                                                                      
patch_torch_functions  : False                                                                                                                                                                                                              
keep_batchnorm_fp32    : True                                                                                                                                                                                                               
master_weights         : True                                                                                                                                                                                                               
loss_scale             : dynamic                                                                                                                                                                                                            
Processing user overrides (additional kwargs that are not None)...                                                                                                                                                                          
After processing overrides, optimization options are:                                                                                                                                                                                       
enabled                : True                                                                                                                                                                                                               
opt_level              : O2                                                                                                                                                                                                                 
cast_model_type        : torch.float16                                                                                                                                                                                                      
patch_torch_functions  : False                                                                                                                                                                                                              
keep_batchnorm_fp32    : True                                                                                                                                                                                                               
master_weights         : True                                                                                                                                                                                                               
loss_scale             : dynamic                                                                                                                                                                                                            
default<../imgs/>:   0%|                                                                                                                                                                                         | 0/150000 [00:00<?, ?it/s]
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0                                                                                                                                                             
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0                                                                                                                                                             
G: -9.80 | D: 10.73 | GP: 404.56 | PL: 0.76 | CR: 0.00 | Q: 0.00                                                                                                                                                                            
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8192.0                                                                                                                                                              
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4096.0                                                                                                                                                              
/opt/conda/conda-bld/pytorch_1591914880026/work/torch/csrc/utils/python_arg_parser.cpp:756: UserWarning: This overload of add_ is deprecated:                                                                                               
        add_(Number alpha, Tensor other)                                                                                                                                                                                                    
Consider using one of the following signatures instead:                                                                                                                                                                                     
        add_(Tensor other, *, Number alpha)                                                                                                                                                                                                 
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2048.0                                                                                                                                                              
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1024.0                                                                                                                                                              
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 512.0                                                                                                                                                               
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 256.0                                                                                                                                                               
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 128.0                                                                                                                                                               
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 64.0                                                                                                                                                                
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32.0                                                                                                                                                                
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16.0                                                                                                                                                                
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8.0                                                                                                                                                                 
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.0                                                                                                                                                                 
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.0                                                                                                                                                                 
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.0                                                                                                                                                                 
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.5                                                                                                                                                                 
NaN detected for generator or discriminator. Loading from checkpoint #0                                                                                                                                                                     
Selected optimization level O2:  FP16 training with FP32 batchnorm and FP32 master weights.                                                                                                                                                 
                                                                                                                                                                                                                                            
Defaults for this optimization level are:                                                                 
enabled                : True                                                                                                                                                                                                               
opt_level              : O2                                                                                                                                                                                                                 
cast_model_type        : torch.float16                                                                                                                                                                                                      
patch_torch_functions  : False                                                                                                                                                                                                              
keep_batchnorm_fp32    : True                                                                                                                                                                                                               
master_weights         : True                                                                                                                                                                                                               
loss_scale             : dynamic                                                                                                                                                                                                            
Processing user overrides (additional kwargs that are not None)...                                                                                                                                                                          
After processing overrides, optimization options are:                                                                                                                                                                                       
enabled                : True                                                                                                                                                                                                               
opt_level              : O2                                                                                                                                                                                                                 
cast_model_type        : torch.float16                                                                                                                                                                                                      
patch_torch_functions  : False                                                                                                                                                                                                              
keep_batchnorm_fp32    : True                                                                                                                                                                                                               
master_weights         : True                                                                                                                                                                                                               
loss_scale             : dynamic
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
default<../imgs/>:   0%|                                                                                                                                                                             | 11/150000 [00:15<60:25:16,  1.45s/it]
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8192.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4096.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2048.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1024.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 512.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 256.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 128.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 64.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.5
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.25
NaN detected for generator or discriminator. Loading from checkpoint #0
Selected optimization level O2:  FP16 training with FP32 batchnorm and FP32 master weights.

Defaults for this optimization level are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : dynamic
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8192.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4096.0
default<../imgs/>:   0%|                                                                                                                                                                             | 22/150000 [00:26<54:04:05,  1.30s/it]
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2048.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1024.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 512.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 256.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 128.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 64.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.5
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.25
NaN detected for generator or discriminator. Loading from checkpoint #0                                                                
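
For context: the --fp16 flag here goes through NVIDIA Apex AMP (the "Selected optimization level O2" banner above is Apex output), and the repeated "Gradient overflow" lines are Apex's dynamic loss scaler halving the scale and skipping steps until the networks go NaN. The snippet below is only a rough sketch of the standard Apex O2 pattern with dynamic loss scaling, using a placeholder model and optimizer rather than the repository's actual trainer.

from apex import amp
import torch
import torch.nn as nn

# Placeholder model/optimizer; stylegan2_pytorch's real trainer is more involved.
model = nn.Linear(128, 1).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-3)

# O2 with dynamic loss scaling, matching the options printed in the log above.
model, optimizer = amp.initialize(model, optimizer, opt_level="O2", loss_scale="dynamic")

for step in range(100):
    x = torch.randn(32, 128, device="cuda")
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    # Apex scales the loss before backward; if it then finds inf/NaN gradients,
    # it skips the optimizer step and halves the loss scale ("Gradient overflow").
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()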

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 37 (26 by maintainers)

Top GitHub Comments

2 reactions
GetsEclectic commented, Jul 30, 2020

Thinking about trying to get it working with PyTorch 1.6 native mixed precision.
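
For reference, a minimal sketch of the torch.cuda.amp API that shipped with PyTorch 1.6; the model, optimizer, and data below are placeholders, not the stylegan2_pytorch training loop.

import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(128, 1).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-3)
scaler = torch.cuda.amp.GradScaler()  # native dynamic loss scaling

for step in range(100):
    x = torch.randn(32, 128, device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # runs eligible ops in fp16
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()    # backward on the scaled loss
    scaler.step(optimizer)           # skips the step if gradients contain inf/NaN
    scaler.update()                  # adjusts the loss scale for the next iteration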

1 reaction
GetsEclectic commented, Aug 12, 2020

Yeah I’ve read that page a bunch of times but didn’t realize I should implement gradient clipping to fix this problem, thanks!
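
For anyone landing here later, here is a sketch of how gradient clipping is usually combined with the native GradScaler, along the lines of the PyTorch AMP examples; the max_norm of 1.0 is an illustrative value, not something taken from this repo.

import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(128, 1).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-3)
scaler = torch.cuda.amp.GradScaler()

for step in range(100):
    x = torch.randn(32, 128, device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)  # unscale first so the clip threshold applies to real gradient magnitudes
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # illustrative threshold
    scaler.step(optimizer)      # still skips the step if gradients are inf/NaN
    scaler.update()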

Read more comments on GitHub >
