
Training baseline scores vary despite random seeds fixed


I first raised this issue on the Prodigy support forum, but it's actually a spaCy issue.

I have been using Prodigy to train a 'textcat' model like so:

python -m prodigy train textcat my_annotations en_vectors_web_lg --output ./my_model

and I noticed that the baseline score varies hugely between runs (0.2-0.55). This is even more puzzling given that fix_random_seed(0) is called at the beginning of training.

I tracked these variations down to the model output. Here is a minimal example to reproduce this behaviour.

How to reproduce the behaviour

import spacy

component = 'textcat'
pipe_cfg = {"exclusive_classes": False}

for i in range(5):
    spacy.util.fix_random_seed(0)

    nlp = spacy.load('en_vectors_web_lg')

    example = ("Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.",
               {'cats': {'Label1': 1.0, 'Label2': 0.0, 'Label3': 0.0}})

    # Set up the component pipe
    nlp.add_pipe(nlp.create_pipe(component, config=pipe_cfg), last=True)
    pipe = nlp.get_pipe(component)
    for label in set(example[1]['cats']):
        pipe.add_label(label)

    # Set up training and the optimiser
    optimizer = nlp.begin_training(component_cfg={component: pipe_cfg})

    # Run one document through the textcat NN for scoring
    print(f"Scoring '{example[0]}'")
    print(f"Result: {pipe.model([nlp.make_doc(example[0])])}")

As far as I understand, calling fix_random_seed should produce the same output given a fixed seed and no weight updates. It does for the linear model, but not for the CNN model, if I read the model architecture correctly here: https://github.com/explosion/spaCy/blob/908dea39399bbc0c966c131796f339af5de54140/spacy/_ml.py#L708 So the output from the first half of the first layer stays the same for each iteration, but the second half does not.
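For reference, this is the contract I'd expect from a seed-fixing helper. A minimal sketch using only Python's stdlib random and NumPy, not spaCy's actual implementation (which, as I understand it, also seeds other backend RNGs), with a hypothetical fix_seeds helper:

```python
# Minimal sketch (not spaCy's implementation) of what a helper like
# spacy.util.fix_random_seed is expected to guarantee: once every RNG
# in play is reseeded, repeated draws are bit-identical.
import random
import numpy as np

def fix_seeds(seed=0):
    random.seed(seed)     # Python's stdlib RNG
    np.random.seed(seed)  # NumPy's global RNG

fix_seeds(0)
first = np.random.rand(3)
fix_seeds(0)
second = np.random.rand(3)
assert np.array_equal(first, second)  # identical draws given the same seed
```

The bug report above is exactly a violation of this contract: one part of the model's computation behaves like `first == second`, while another part still draws from an unseeded source.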

Your Environment

  • spaCy version: 2.2.4
  • Platform: Darwin-18.7.0-x86_64-i386-64bit
  • Python version: 3.7.7
  • thinc version: 7.4.0

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

2 reactions
svlandeg commented, Jul 9, 2020

Hi @michel-ds, we found the problem and resolved it in PR #5735. I added your specific test to the test suite and it now runs without error: https://github.com/explosion/spaCy/blob/develop/spacy/tests/regression/test_issue5551.py This will be fixed from spaCy 3.0 onwards.

1 reaction
michel-ds commented, Jul 10, 2020

Hi @svlandeg, I can confirm that I am getting identical numbers with the develop branch version of spaCy.

Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: (array([[0.37729517, 0.7529206 , 0.46667254]], dtype=float32), <function forward.<locals>.backprop at 0x1149c64d0>)
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: (array([[0.37729517, 0.7529206 , 0.46667254]], dtype=float32), <function forward.<locals>.backprop at 0x1127b1c20>)
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: (array([[0.37729517, 0.7529206 , 0.46667254]], dtype=float32), <function forward.<locals>.backprop at 0x1149ddf80>)
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: (array([[0.37729517, 0.7529206 , 0.46667254]], dtype=float32), <function forward.<locals>.backprop at 0x113bf3560>)
Scoring 'Once hot, form ping-pong-ball-sized balls of the mixture, each weighing roughly 25 g.'
Result: (array([[0.37729517, 0.7529206 , 0.46667254]], dtype=float32), <function forward.<locals>.backprop at 0x1127b8a70>)

I had to use a blank model in the code snippet above (nlp = spacy.blank("en")), but I hope that didn't invalidate the results of my test.

Thanks for fixing! Looking forward to version 3.0.
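The verification above follows a generic determinism-check pattern: rerun the same seeded computation several times and assert every output matches the first. A spaCy-free sketch of that pattern, where a hypothetical seeded_run stands in for "reseed, rebuild the pipeline, score one document":

```python
import random

def seeded_run(seed=0):
    # Stand-in for: fix_random_seed(seed), rebuild the pipeline,
    # and score a single document.
    random.seed(seed)
    return [round(random.random(), 6) for _ in range(3)]

# Repeat the whole run several times, as the repro script does.
results = [seeded_run(0) for _ in range(5)]

# Every run must reproduce the first one exactly; any mismatch
# means some source of randomness escaped the seed.
assert all(r == results[0] for r in results)
```

This is essentially what the regression test linked above checks against the real textcat pipeline.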
