
textcat training is not deterministic with gpu enabled


How to reproduce the behaviour

This is related to #6177. I can verify that when using the CPU, the training losses/weights for textcat are deterministic with fix_random_seed. However, if I enable the GPU via spacy.require_gpu(), the training losses/weights differ on every run.

import spacy
spacy.require_gpu()

# Run the same seeded setup twice; the printed scores should match if training is deterministic.
for _ in range(2):
    spacy.util.fix_random_seed(0)

    model = spacy.load('en_core_web_sm')

    # Keep only the textcat pipe for this check.
    model.add_pipe(model.create_pipe('textcat'))
    model.remove_pipe('parser')
    model.remove_pipe('tagger')

    cat = model.get_pipe('textcat')
    cat.add_label("dog")
    cat.add_label("donut")

    model.begin_training()
    print(model("What even is?").cats)

Output:

{'dog': 0.2501096725463867, 'donut': 0.3427947163581848}
{'dog': 0.9567031860351562, 'donut': 0.9506585001945496}
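
One thing worth ruling out is the GPU random state itself. Depending on the spaCy version, fix_random_seed may or may not seed CuPy's generator, so a helper along these lines seeds every source explicitly before each run (this is a sketch: seed_everything is an illustrative name, and the cupy import is an assumption that follows from require_gpu() succeeding):

import random

import numpy
import spacy

def seed_everything(seed=0):
    # Seed every RNG source the textcat model might draw from.
    # spacy.util.fix_random_seed may already cover some or all of these,
    # depending on the spaCy version.
    random.seed(seed)
    numpy.random.seed(seed)
    spacy.util.fix_random_seed(seed)
    try:
        import cupy  # assumption: cupy is installed, since require_gpu() succeeded
        cupy.random.seed(seed)
    except ImportError:
        pass

Even with every seed fixed, some GPU kernels (for example, floating-point reductions built on atomic adds) are not deterministic, which would match the run-to-run differences above.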

Your Environment

  • Operating System: Linux
  • Python Version Used: 3.6.9
  • spaCy Version Used: latest on master (git sha: 320a8b14814c7e0c6dce705ad7bf0f13bf64b61c)
  • Environment Information: Google Colab

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

1 reaction
adrianeboyd commented, Nov 19, 2020

Here’s my test script (just adapted a bit from the one in the colab example):

import spacy
from spacy.util import minibatch, compounding

def train():
    spacy.util.fix_random_seed(0)
    model = spacy.blank("en")

    model.add_pipe(model.create_pipe("textcat"))

    cat = model.get_pipe("textcat")
    cat.add_label("dog")
    cat.add_label("donut")

    x_train = [f"example {i}" for i in range(1000)]
    y_train = [{"cats": {"dog": i/1000, "donut": 1 - i/1000}} for i in range(1000)]
    train_data = list(zip(x_train, y_train))

    optimizer = model.begin_training()
    for i in range(10):
        batches = minibatch(train_data, size=compounding(16, 64, 1.001))
        losses = {}
        for batch in batches:
            x_batch, y_batch = zip(*batch)
            model.update(x_batch, y_batch, sgd=optimizer, drop=0, losses=losses)
        print(i, "loss:", losses["textcat"])
    print("example 10:", model("example 10").cats)
    print()

if __name__ == "__main__":
    print("1st time CPU:")
    train()
    print("2nd time CPU:")
    train()
    print("\nEnabling GPU\n")
    spacy.require_gpu()
    print("1st time GPU:")
    train()
    print("2nd time GPU:")
    train()

Output:

1st time CPU:
0 loss: 0.020526510332956605
1 loss: 0.2192715626588324
2 loss: 0.1541586974939264
3 loss: 0.21435572720838536
4 loss: 0.1982542650088135
5 loss: 0.19825033005452042
6 loss: 0.19787737677813766
7 loss: 0.016827800470196053
8 loss: 0.02887996903154999
9 loss: 0.02469563187116819
example 10: {'dog': 0.001906172838062048, 'donut': 0.6181842684745789}

2nd time CPU:
0 loss: 0.020526510332956605
1 loss: 0.2192715626588324
2 loss: 0.1541586974939264
3 loss: 0.21435572720838536
4 loss: 0.1982542650088135
5 loss: 0.19825033005452042
6 loss: 0.19787737677813766
7 loss: 0.016827800470196053
8 loss: 0.02887996903154999
9 loss: 0.02469563187116819
example 10: {'dog': 0.001906172838062048, 'donut': 0.6181842684745789}


Enabling GPU

1st time GPU:
0 loss: 0.022869700213050237
1 loss: 0.06781688092814875
2 loss: 0.15603950362856267
3 loss: 0.029185388615587726
4 loss: 0.04577569641696755
5 loss: 0.03271988184133079
6 loss: 0.030841199260066787
7 loss: 0.016764739026257303
8 loss: 0.023379557263069728
9 loss: 0.020565684088069247
example 10: {'dog': 0.15584374964237213, 'donut': 0.9999545812606812}

2nd time GPU:
0 loss: 0.022846033180030645
1 loss: 0.07457155887192357
2 loss: 0.1533858735638205
3 loss: 0.03846120528942265
4 loss: 0.030317590604681754
5 loss: 0.022946339027839713
6 loss: 0.040068494405659294
7 loss: 0.023592466532136314
8 loss: 0.02665060829349386
9 loss: 0.021907005400862545
example 10: {'dog': 0.15843163430690765, 'donut': 0.9288136959075928}

I tested in a new venv with everything from wheels except spacy (from master as of now). The example 10 line is the model's cats output for the text "example 10".

example 10 for a few more GPU runs:

{'dog': 0.2435295134782791, 'donut': 0.9999375343322754}
{'dog': 0.4791581332683563, 'donut': 0.9981231093406677}
{'dog': 0.6463608145713806, 'donut': 0.016409972682595253}
{'dog': 0.14756248891353607, 'donut': 0.9230985045433044}

pip freeze: freeze.txt

I redid the test with v3 and the results are a bit more variable than I thought between CPU and GPU, but they’re not that different across GPU runs.

CPU: {'dog': 0.0654868334531784, 'donut': 0.9892733693122864}
GPU 1: {'dog': 0.022449197247624397, 'donut': 0.9723042249679565}
GPU 2: {'dog': 0.02237524650990963, 'donut': 0.9726961255073547}
GPU 3: {'dog': 0.022426428273320198, 'donut': 0.9722701907157898}
GPU 4: {'dog': 0.02197781391441822, 'donut': 0.9722147583961487}
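
For reference, a v3 version of the script above would look roughly like the following. This is a sketch, not the exact script used for the numbers above: train_v3 is an illustrative name, textcat_multilabel is chosen because v2's default textcat was not mutually exclusive and the category scores here are fractional, and a fixed batch size of 32 stands in for the compounding schedule.

import spacy
from spacy.training import Example
from spacy.util import minibatch

def train_v3():
    spacy.util.fix_random_seed(0)
    nlp = spacy.blank("en")

    # Closest v3 equivalent of the non-exclusive v2 textcat.
    textcat = nlp.add_pipe("textcat_multilabel")
    textcat.add_label("dog")
    textcat.add_label("donut")

    # Same synthetic data as the v2 script, wrapped in Example objects.
    train_examples = [
        Example.from_dict(
            nlp.make_doc(f"example {i}"),
            {"cats": {"dog": i / 1000, "donut": 1 - i / 1000}},
        )
        for i in range(1000)
    ]

    optimizer = nlp.initialize(lambda: train_examples)
    for i in range(10):
        losses = {}
        # Fixed batch size for simplicity; the v2 script used compounding(16, 64, 1.001).
        for batch in minibatch(train_examples, size=32):
            nlp.update(batch, sgd=optimizer, drop=0, losses=losses)
        print(i, "loss:", losses["textcat_multilabel"])
    print("example 10:", nlp("example 10").cats)

As in the v2 script, calling train_v3() once before and once after spacy.require_gpu() compares the CPU and GPU behaviour.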

1 reaction
adrianeboyd commented, Nov 16, 2020

Hmm, I do think there may be a bug of some sort here in spacy v2. Locally and with the colab example above I get consistent results within multiple CPU and GPU runs (also with our quick internal test cases related to this), but the CPU and GPU results are not similar to each other, and if I extend the training a bit I do get different results for multiple GPU runs. We will look into it!

In better news, with spacy v3 I get the same results on both (minus some float rounding differences, of course).
