Models not saved during training

Question

I ran asteroid/egs/wham/DPRNN/run.sh, but an error occurred at the end of the training process. The messages are below:

~~~
sep_clean_8kmin_7101f1a8/checkpoints/_ckpt_epoch_4.ckpt as top 5
Epoch 5: 100%|██████████| 4022/4022 [28:05<00:00,  2.39it/s, loss=-11.728, v_num=0, val_loss=-11.5]
Traceback (most recent call last):
  File "train.py", line 121, in <module>
    main(arg_dic)
  File "train.py", line 92, in main
    best_path = [b for b, v in best_k.items() if v == min(best_k.values())][0]
IndexError: list index out of range
~~~

I added some code to train.py and confirmed that checkpoint.best_k_models.items() is empty. best_k_models.json also contains only {}.
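
The failure itself can be reproduced without training. When the dictionary is empty, the list comprehension at line 92 yields an empty list, so indexing it with `[0]` raises the IndexError. Below is a minimal sketch of a guard, assuming `best_k` maps checkpoint paths to validation losses as the traceback suggests. `pick_best_checkpoint` is a hypothetical helper, not part of asteroid or pytorch-lightning, and not necessarily the fix that was merged. It only avoids the crash; it does not explain why no checkpoint was recorded.

```python
from typing import Dict, Optional


def pick_best_checkpoint(best_k: Dict[str, float]) -> Optional[str]:
    """Return the path with the lowest loss, or None if nothing was tracked.

    Hypothetical helper: assumes keys are checkpoint paths and values are
    validation losses (lower is better).
    """
    if not best_k:
        # Nothing was recorded, so there is no best checkpoint to load.
        return None
    # min over the keys, ordered by their associated loss value.
    return min(best_k, key=best_k.get)


# Empty dict: the original expression would raise IndexError here.
print(pick_best_checkpoint({}))
# Two made-up entries: the one with the lower loss is selected.
print(pick_best_checkpoint({"a.ckpt": -11.5, "b.ckpt": -11.7}))
```

A caller would still need to decide what to do when the result is None, for example skipping the step that reloads the best model.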

Does anyone have an idea how to fix it? Any comments are welcome.

Environment

Python 3.7.7
torch 1.5.1 (I’ve tried 1.3.0 but same result)
pytorch-lightning 0.7.6
Ubuntu 18.04 on GCP

Top GitHub Comments

mpariente commented, Jul 27, 2020

Let’s keep this open until it’s merged, thanks!

mpariente commented, Aug 23, 2020

This should be fixed in master
