Models not saved during training
Question
I ran asteroid/egs/wham/DPRNN/run.sh, but an error occurred at the end of the training process. The messages are below:
~~~
sep_clean_8kmin_7101f1a8/checkpoints/_ckpt_epoch_4.ckpt as top 5
Epoch 5: 100%|██████████| 4022/4022 [28:05<00:00, 2.39it/s, loss=-11.728, v_num=0, val_loss=-11.5]
Traceback (most recent call last):
  File "train.py", line 121, in <module>
    main(arg_dic)
  File "train.py", line 92, in main
    best_path = [b for b, v in best_k.items() if v == min(best_k.values())][0]
IndexError: list index out of range
~~~
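I added some debugging code to train.py and confirmed that checkpoint.best_k_models.items() is empty (its length is zero), and that best_k_models.json contains only {}. In other words, no checkpoint was recorded as a best model, so the list comprehension on line 92 of train.py produces an empty list and indexing it with [0] raises the IndexError.
Does anyone have an idea how to fix it? Let me know if you have any comments.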
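For illustration, here is a minimal sketch of a defensive way to select the best checkpoint from a dictionary shaped like best_k_models (checkpoint path mapped to monitored value). The helper name pick_best_checkpoint and the sample paths are hypothetical, and this is not the fix that was merged upstream. It only avoids the crash when the dictionary is empty; it does not explain why no checkpoints were recorded.

```python
from typing import Dict, Optional


def pick_best_checkpoint(best_k: Dict[str, float]) -> Optional[str]:
    """Return the path with the lowest monitored value, or None if empty.

    `best_k` is assumed to map checkpoint paths to validation losses,
    mirroring the shape of `best_k_models` described in the question.
    """
    if not best_k:
        # Nothing was recorded: report that instead of indexing an empty list.
        return None
    # min over the keys, ordered by their associated value (lower is better).
    return min(best_k, key=best_k.get)


# Hypothetical sample data, not taken from the original run.
empty_result = pick_best_checkpoint({})
best_result = pick_best_checkpoint({"a.ckpt": -11.2, "b.ckpt": -11.5})
print(empty_result)
print(best_result)
```

A caller could then check for None and fall back to, for example, skipping the "load best model" step or printing a warning, rather than failing after several hours of training.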
Environment
- Python 3.7.7
- torch 1.5.1 (I’ve tried 1.3.0 but same result)
- pytorch-lightning 0.7.6
- Ubuntu 18.04 on GCP
Issue Analytics
- Created 3 years ago
- Comments: 5
Top GitHub Comments
Let’s keep this open until it’s merged, thanks!
This should be fixed in master