Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Different GPT-2 outputs with mixed precision vs single precision

See original GitHub issue

When using GPT-2 with mixed precision, the generated text differs from the text produced in single precision. This is true for both conditional and unconditional generation, and for both top_k=1 (deterministic) and top_k=40. Typically the mixed precision and single precision outputs agree for a number of tokens and then begin to disagree (sometimes early, sometimes late).

Using GPT-2 with mixed precision would be useful to take advantage of the tensor cores on V100 and T4 GPUs.

In my tests, calling model.half() on GPT2LMHeadModel tends to make the outputs diverge early, while using Apex’s AMP instead usually matches the single precision output a little longer but still generally deviates. My tests were on the 117M model, with Apex installed.
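
For reference, here is a minimal sketch of that comparison using the current Hugging Face transformers generate() API (which differs from the sampling scripts in use when this issue was filed). The "gpt2" checkpoint name, the prompt, and the 50-token length are arbitrary choices for illustration, and a CUDA GPU is assumed for .half():

```python
# Minimal sketch: greedy (top_k=1-style) GPT-2 generation in fp32 vs fp16,
# then find the first position where the two continuations disagree.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")            # 117M/124M checkpoint
prompt = "The meaning of life is"                            # arbitrary prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()

model_fp32 = GPT2LMHeadModel.from_pretrained("gpt2").cuda().eval()
model_fp16 = GPT2LMHeadModel.from_pretrained("gpt2").half().cuda().eval()

with torch.no_grad():
    out_fp32 = model_fp32.generate(input_ids, do_sample=False, max_new_tokens=50)
    out_fp16 = model_fp16.generate(input_ids, do_sample=False, max_new_tokens=50)

tokens_fp32, tokens_fp16 = out_fp32[0].tolist(), out_fp16[0].tolist()
first_diff = next((i for i, (a, b) in enumerate(zip(tokens_fp32, tokens_fp16)) if a != b), None)
print("first differing position:", first_diff)
print("fp32:", tokenizer.decode(tokens_fp32))
print("fp16:", tokenizer.decode(tokens_fp16))
```

Because both runs decode greedily, any difference in the chosen tokens reflects a difference in the logits themselves rather than sampling noise.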

It surprises me that the top_k=1 results often differ, sometimes very early in the sequence. Since top_k=1 simply takes the single largest logit at each step, this means the ranking of the logits themselves must differ between precisions.
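
A toy numeric sketch of why that can happen (hand-picked values, not GPT-2 itself): rounding the final logits to fp16 can never reverse their order on its own, but rounding the intermediate arithmetic that produces them can, when the two values are nearly tied:

```python
# Two candidate "logits" built from partial sums. Token A is genuinely larger in fp32,
# but the fp16 rounding of the intermediate values flips the ordering.
import torch

a_terms = torch.tensor([1000.7, -0.4])   # token A's logit: 1000.30 in fp32
b_terms = torch.tensor([1000.26])        # token B's logit: 1000.26 in fp32

a_fp32, b_fp32 = a_terms.sum().item(), b_terms.sum().item()
a_fp16, b_fp16 = a_terms.half().sum().item(), b_terms.half().sum().item()

print(a_fp32, b_fp32, "-> fp32 picks", "A" if a_fp32 > b_fp32 else "B")  # A wins (1000.30 > 1000.26)
print(a_fp16, b_fp16, "-> fp16 picks", "A" if a_fp16 > b_fp16 else "B")  # B wins (1000.0 < 1000.5)
```

Near 1000, fp16 can only represent multiples of 0.5, so 1000.7 rounds down to 1000.5 and the sum lands on 1000.0, while 1000.26 rounds up to 1000.5; a gap of 0.04 in fp32 becomes a reversed gap of 0.5 in fp16.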

I think the cause is compounding errors in the “past” tensor used by the attention function. Each time a new token is generated, its past has some error in it. When subsequent token generations then use those values (in higher attention layers), their own pasts accumulate more error, and so on, up through the 12 layers of the 117M model or the 24 layers of the 345M model. For cases where the top 2 logit values are almost the same, those compounded errors might be enough to change which one is larger and thereby change even the top_k=1 output. I haven’t verified this idea yet.
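
One rough way to probe this hypothesis is to feed both precisions the same fp32-chosen tokens and watch how far the fp16 attention cache drifts from the fp32 one at each step. The sketch below assumes the current transformers API, where the cache is returned as past_key_values and can be indexed per layer as (key, value) pairs; the prompt and the 20-step horizon are arbitrary:

```python
# Sketch: step-by-step greedy decoding in fp32 and fp16 on identical token inputs,
# printing the max absolute difference of the last layer's cached keys each step.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
ids = tokenizer("The meaning of life is", return_tensors="pt").input_ids.cuda()

m32 = GPT2LMHeadModel.from_pretrained("gpt2").cuda().eval()
m16 = GPT2LMHeadModel.from_pretrained("gpt2").half().cuda().eval()

past32 = past16 = None
with torch.no_grad():
    for step in range(20):
        out32 = m32(ids if past32 is None else ids[:, -1:], past_key_values=past32, use_cache=True)
        out16 = m16(ids if past16 is None else ids[:, -1:], past_key_values=past16, use_cache=True)
        past32, past16 = out32.past_key_values, out16.past_key_values

        # Drift of the cached keys in the final attention layer at this step.
        k32 = past32[-1][0].float()
        k16 = past16[-1][0].float()
        print(step, (k32 - k16).abs().max().item())

        # Always append the fp32 greedy token so both models see identical inputs.
        next_id = out32.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
```

If the hypothesis is right, the printed drift should grow as generation proceeds rather than staying at the level of a single fp16 rounding step.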

I’m not sure if this necessarily means the outputs will be qualitatively worse, but that’s a hard thing to measure.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9 (1 by maintainers)

Top GitHub Comments

1 reaction
AdamDanielKing commented, Mar 9, 2020

@Damiox While sampling with mixed precision gives different results, they seem to still be of high quality. I’ve been using mixed precision on talktotransformer.com for at least 6-7 months now and the quality has been excellent.

0 reactions
patrickvonplaten commented, Jun 4, 2020

Currently generation only allows batch_size=1

Read more comments on GitHub >

Top Results From Across the Web

Difference Between Single-, Double-, Multi-, Mixed-Precision
In double-precision format, each number takes up 64 bits. Single-precision format uses 32 bits, while half-precision is just 16 bits.
Read more >
OpenAI GPT2 - Hugging Face
GPT-2 is one of them and is available in five different sizes: small, medium, large, ... from PretrainedConfig and can be used to...
Read more >
Natural Language Generation Part 2: GPT2 and Huggingface
One thing to call out in the above script call is that I am using mixed precision in the model training with the...
Read more >
Mixed precision - Habana Developers
Mixed precision is the use of both 16-bit and 32-bit floating-point types in a model during training to make it run faster and...
Read more >
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
combination of vector-wise quantization and mixed precision ... a single outlier can reduce the quantization precision of all other values.
Read more >
