LayoutLMv2: NaN training and eval loss
See original GitHub issue
Describe the bug
The model I am using is LayoutLMv2 with a custom dataset.
The problem arises when using:
- the official example scripts: I am using the same run_funsd.py, but with a modified dataset.
To Reproduce
Steps to reproduce the behavior:
run_funsd.py --do_eval=True --do_predict=True --do_train=True --early_stop_patience=4 --evaluation_strategy=epoch --fp16=True --load_best_model_at_end=True --max_train_samples=1000 --model_name_or_path=microsoft/layoutlmv2-base-uncased --num_train_epochs=30 --output_dir=/tmp/test-ner --overwrite_output_dir=True --report_to=wandb --save_strategy=epoch --save_total_limit=1 --warmup_ratio=0.1
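Roughly, those flags map onto TrainingArguments as in the sketch below (my own mapping, not taken from the issue; early_stop_patience and max_train_samples are options of the modified script rather than fields of TrainingArguments, so they only appear as comments):

```python
from transformers import TrainingArguments, EarlyStoppingCallback

# Sketch of the CLI flags above expressed as TrainingArguments.
training_args = TrainingArguments(
    output_dir="/tmp/test-ner",
    overwrite_output_dir=True,
    do_train=True,
    do_eval=True,
    do_predict=True,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=1,
    load_best_model_at_end=True,
    num_train_epochs=30,
    warmup_ratio=0.1,
    fp16=True,            # the flag the NaN discussion below revolves around
    report_to="wandb",
)

# --early_stop_patience=4 presumably ends up in something like:
# callbacks = [EarlyStoppingCallback(early_stopping_patience=4)]
# --max_train_samples=1000 is a data argument of the (modified) run_funsd.py.
```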
Fortunately, I recorded everything with wandb.
After 8 epochs, both the training and eval loss went to NaN and the F1 score dropped suddenly. The samples-per-second throughput also increased significantly.
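One way to catch this earlier (a minimal sketch of my own, using the standard TrainerCallback API; not part of the original report) is to stop training as soon as a non-finite loss is logged:

```python
import math
from transformers import TrainerCallback

class StopOnNanCallback(TrainerCallback):
    """Stop training as soon as a non-finite training loss is logged."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        loss = (logs or {}).get("loss")
        if loss is not None and not math.isfinite(loss):
            print(f"Non-finite loss {loss} at step {state.global_step}; stopping.")
            control.should_training_stop = True
        return control

# Hypothetical usage: Trainer(..., callbacks=[StopOnNanCallback()])
```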
- Platform:
- Python version: 3.7.1
- PyTorch version (GPU?): Tesla T4
Issue Analytics
- State:
- Created 2 years ago
- Comments: 11
Top Results From Across the Web
- `nan` training loss but eval loss does improve over time: "I've been playing around with the XLSR-53 fine-tuning functionality but I keep getting nan training loss. Audio files I'm using are: Down-sampled to..."
- LayoutLMv2: multi-modal pre-training for visually-rich document understanding: "...paper, we present LayoutLMv2 by pre-training text, layout and image in a multi-modal framework, where new model architectures and pre-training tasks..."
- LayoutLMv2: Multi-modal Pre-training for Visually-rich...: "loss in the optimization process. 3 Experiments. 3.1 Data. In order to pre-train and evaluate LayoutLMv2 models, we select datasets in a..."
- arXiv:2012.14740v4 [cs.CL] 10 Jan 2022: "LayoutLMv2: Multi-modal Pre-training for Visually-rich Document Understanding... datasets as the downstream tasks to evaluate the per-..."
- (PDF) LayoutLMv2: Multi-modal Pre-training for Visually-Rich...: "Pre-training of text and layout has proved effective in a variety of... In order to pre-train and evaluate LayoutLMv2 models,..."
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Perhaps there is no problem with the loss-calculation code. In my case, I got NaN values only when computing the loss under autocast(); when I stopped using AMP, I no longer got NaN values. I hope this is helpful to you.
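For context, a minimal PyTorch sketch of the pattern the commenter describes (my own illustration, not code from the issue): the forward pass and loss run under autocast() with a GradScaler for the fp16 gradients, and disabling both falls back to the plain fp32 path, which is roughly what dropping --fp16 does.

```python
import torch
from torch.cuda.amp import GradScaler, autocast

def training_step(model, batch, optimizer, scaler, use_amp=True):
    optimizer.zero_grad()
    # With use_amp=False both autocast and the scaler become no-ops,
    # i.e. the fp32 path the commenter fell back to.
    with autocast(enabled=use_amp):
        loss = model(**batch).loss
    if not torch.isfinite(loss):
        raise RuntimeError(f"Non-finite loss: {loss.item()}")
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.detach()

# scaler = GradScaler(enabled=True)   # fp16/AMP path where the NaNs appeared
# scaler = GradScaler(enabled=False)  # fp32 path, reported to avoid the NaNs
```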
NaN with AMP is a known issue. https://github.com/pytorch/pytorch/issues/40497