
LayoutLMv2 NaN training loss and eval

See original GitHub issue

Describe the bug

The model I am using is LayoutLMv2 with a custom dataset.

The problem arises when using:

  • the official example scripts: I am using the stock run_funsd.py, but with a modified dataset.

To Reproduce

Steps to reproduce the behavior:

run_funsd.py --do_eval=True --do_predict=True --do_train=True --early_stop_patience=4 --evaluation_strategy=epoch --fp16=True --load_best_model_at_end=True --max_train_samples=1000 --model_name_or_path=microsoft/layoutlmv2-base-uncased --num_train_epochs=30 --output_dir=/tmp/test-ner --overwrite_output_dir=True --report_to=wandb --save_strategy=epoch --save_total_limit=1 --warmup_ratio=0.1

Fortunately, I recorded everything with wandb.

[wandb screenshot: training loss, eval loss, F1, and samples-per-second curves over epochs]

After 8 epochs, the training and eval loss went to NaN, while the F1 score dropped suddenly. The samples per second increased significantly as well.
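A quick way to surface this failure mode sooner is to stop the run as soon as the logged loss turns NaN, rather than letting it continue toward the full 30 epochs on NaN values. Below is a minimal, hypothetical sketch of such a guard as a transformers TrainerCallback; NanLossCallback is an illustrative name, not something run_funsd.py ships with.

import math

from transformers import TrainerCallback

class NanLossCallback(TrainerCallback):
    # on_log fires whenever the Trainer logs metrics (including the running loss).
    def on_log(self, args, state, control, logs=None, **kwargs):
        loss = (logs or {}).get("loss")
        if loss is not None and math.isnan(loss):
            print(f"NaN loss at step {state.global_step}; stopping the run.")
            control.should_training_stop = True

# Usage (hypothetical): Trainer(..., callbacks=[NanLossCallback()])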

  • Platform:
  • Python version: 3.7.1
  • PyTorch version (GPU?): Tesla T4

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 11

Top GitHub Comments

1 reaction
magataro commented, May 20, 2022

Perhaps there is no problem with the loss calculation code itself. In my case, I got NaN values only when computing the loss under autocast(); once I stopped using AMP, the NaN values went away. I hope this is helpful to you.

NaN with AMP is a known issue. https://github.com/pytorch/pytorch/issues/40497
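For reference, here is a minimal sketch of the pattern magataro describes: run the forward pass under autocast(), but compute the loss in fp32. The nn.Linear model below is only a stand-in for LayoutLMv2; the autocast/float() pattern is the point, not the model.

import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(10, 5).to(device)            # placeholder for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scaler = GradScaler(enabled=(device == "cuda"))

inputs = torch.randn(8, 10, device=device)
labels = torch.randint(0, 5, (8,), device=device)

with autocast(enabled=(device == "cuda")):
    logits = model(inputs)                     # forward pass may run in fp16

# Cast logits back to fp32 before the loss: log-softmax can overflow in
# fp16, which is one common source of NaN when training with AMP.
loss = nn.functional.cross_entropy(logits.float(), labels)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()

The blunter version of the same workaround, given the Trainer flags in the reproduction command above, is simply dropping --fp16=True.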

0 reactions
XueAdas commented, May 20, 2022

Your email has been received, thank you!
Xue Xu    Tel: @.***

Read more comments on GitHub >

Top Results From Across the Web

`nan` training loss but eval loss does improve over time
I've been playing around with the XLSR-53 fine-tuning functionality but I keep getting nan training loss. Audio files I'm using are: Down-sampled to...
Read more >
LayoutLMv2: Multi-modal Pre-training for Visually-rich Document ...
paper, we present LayoutLMv2 by pre-training text, layout and image in a multi-modal framework, where new model architectures and pre-training tasks.
Read more >
LayoutLMv2: Multi-modal Pre-training for Visually-rich ...
loss in the optimization process. 3 Experiments. 3.1 Data. In order to pre-train and evaluate LayoutLMv2 models, we select datasets in a ...
Read more >
arXiv:2012.14740v4 [cs.CL] 10 Jan 2022
LayoutLMv2 : Multi-modal Pre-training for Visually-rich. Document Understanding ... datasets as the downstream tasks to evaluate the per-.
Read more >
(PDF) LayoutLMv2: Multi-modal Pre-training for Visually-Rich ...
PDF | Pre-training of text and layout has proved effective in a variety of ... In order to pre-train and evaluate LayoutLMv2 models, ......
Read more >
