Low int8 accuracy after fine-tuning BERT-XNLI
Hi, I followed the instructions in https://github.com/openvinotoolkit/nncf/tree/develop/third_party_integration/huggingface_transformers#bert-xnli to run QAT and evaluate the int8 BERT-XNLI model. The README says the int8 accuracy should be 77.22%, but my result is far worse than that: it is only 33%, as shown in the output below.
Int8 accuracy:

```
***** eval metrics *****
epoch                   = 4.0
eval_accuracy           = 0.3333
eval_loss               = 1.0988
eval_runtime            = 0:06:18.44
eval_samples            = 2490
eval_samples_per_second = 6.58
eval_steps_per_second   = 6.58
```
Note that 0.3333 is exactly chance level for the 3-class XNLI task, and the eval loss of 1.0988 is essentially ln(3) ≈ 1.0986, so the int8 model appears to be producing near-uniform predictions. In contrast, the float model, trained with fp16 precision, gives good accuracy:
FP16 accuracy:

```
***** eval metrics *****
epoch                   = 4.0
eval_accuracy           = 0.7614
eval_loss               = 0.7525
eval_runtime            = 0:00:15.91
eval_samples            = 2490
eval_samples_per_second = 156.443
eval_steps_per_second   = 156.443
```
I'm wondering what is wrong with my setup. The only change I made from the instructions is reducing the batch size from 48 to 24.
My PyTorch version is 1.9.1, my local nncf install is at commit https://github.com/openvinotoolkit/nncf/commit/1bd52822dec121a7c7ffaab7364e21232dd07ef4, and transformers is at commit bff1c71e84e392af9625c345f9ea71f7b6d75fb3, as specified in the README.
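For reference, the invocation I am running looks roughly like the sketch below. This is not the verbatim README command: the script path, the model name (bert-base-chinese), the flag spelling (--per_device_train_batch_size vs. the older --per_gpu_train_batch_size), the sequence length, and the --nncf_config file name are assumptions based on the standard HuggingFace XNLI example and the NNCF integration; only the batch size (24) and the epoch count (4) come from this report.

```bash
# Hypothetical reproduction sketch, NOT the verbatim README command.
# Script path, model name, flag spellings, and the NNCF config file name
# are assumptions; batch size 24 and 4 epochs match the run reported above.
python examples/pytorch/text-classification/run_xnli.py \
  --model_name_or_path bert-base-chinese \
  --language zh \
  --train_language zh \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 24 \
  --num_train_epochs 4.0 \
  --max_seq_length 128 \
  --output_dir bert_xnli_int8 \
  --nncf_config nncf_bert_config_xnli.json
```

One related thought: if the README's learning rate was tuned for batch size 48, halving the batch size may also call for proportionally lowering the learning rate, which could be worth trying if accuracy stays low.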
Top GitHub Comments
Hope you were able to reach your goals, @masahi! Note that since training is somewhat non-deterministic, the number of epochs and the learning rate needed to reach the target 77.22% accuracy may vary: sometimes the target is reached by the 1st or 2nd epoch checkpoint, and other times it takes all 4. If you are unable to reach 77.22% or a similar value after multiple hyperparameter search attempts, though, feel free to open another issue and we will look into it.
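Since the best checkpoint can come from any epoch, one way to act on this advice is to evaluate every saved checkpoint and keep the best one. A minimal sketch, reusing the hypothetical command and output directory from the earlier sketch (the checkpoint-* naming is the HuggingFace Trainer default; whether eval of an int8 checkpoint needs the --nncf_config flag is an assumption about the integration):

```bash
# Minimal sketch: evaluate each saved checkpoint, since the epoch that hits
# the target accuracy varies run to run. Paths and flags carry over the
# assumptions from the reproduction sketch above.
for ckpt in bert_xnli_int8/checkpoint-*; do
  python examples/pytorch/text-classification/run_xnli.py \
    --model_name_or_path "$ckpt" \
    --language zh \
    --do_eval \
    --max_seq_length 128 \
    --output_dir "${ckpt}/eval" \
    --nncf_config nncf_bert_config_xnli.json
done
```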
Hi @vshampor, I tried again with PyTorch 1.9.1 and this time the accuracy looks good. I don't know what I was doing when I hit the 33% accuracy, but all is well now. Thank you, and sorry for the trouble.