Incoherent results of supposedly reproducible piece of code

I was giving a tutorial recently when someone stopped me: although random_state=0 was passed to train_test_split(), their score did not match mine. I asked all the participants for their scores. Some had the same score as mine, down to the last decimal, and others had slightly different values.

Here is the piece of code executed:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, 
    stratify=data.target, random_state=0)
lr = LogisticRegression().fit(X_train, y_train)
score = lr.score(X_test, y_test)

I have no idea if this behavior is to be expected since I did not set the random_state in the LogisticRegression(). The doc says it is not needed for the ‘lbfgs’ solver, which is the default value in all the versions tested here.
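
As a quick sanity check of the documentation's claim, here is a minimal sketch that fits the model with several seeds on one machine and compares the coefficients. The helper name fit_coef and the seed values are illustrative assumptions, and solver="lbfgs" is passed explicitly so that the check does not depend on the installed version's default.

```python
import warnings

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, stratify=data.target, random_state=0)


def fit_coef(seed):
    """Fit an lbfgs logistic regression with the given seed; return its coefficients."""
    with warnings.catch_warnings():
        # The unscaled data may trigger a ConvergenceWarning; silence it here.
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        model = LogisticRegression(solver="lbfgs", random_state=seed)
        model.fit(X_train, y_train)
    return model.coef_


# Hypothetical seeds, chosen only for illustration.
coefs = [fit_coef(seed) for seed in (0, 1, 42)]
same_across_seeds = all(np.allclose(coefs[0], c) for c in coefs[1:])
print("Same coefficients for every seed:", same_across_seeds)
```

If the seeds make no difference on a single machine, the variation between participants has to come from somewhere other than random_state.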

Since this kind of reproducibility error cannot easily be tested with CI, I asked them for their system information. The reports, with the score each participant obtained, can be found below.

Python version: 3.8.5 | packaged by conda-forge | (default, Sep 24 2020, 16:20:24) [MSC v.1916 64 bit (AMD64)]
Numpy version: 1.18.5
Scikit-learn version: 0.23.2
System: Windows
Release: 10
Version: 10.0.17763
Machine: AMD64
Processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel

Score of sklearn code: 0.9370629370629371

Python version: 3.8.5 | packaged by conda-forge | (default, Sep 24 2020, 16:20:24) [MSC v.1916 64 bit (AMD64)]
Numpy version: 1.18.5
Scikit-learn version: 0.23.2
System: Windows
Release: 10
Version: 10.0.17763
Machine: AMD64
Processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel

Score of sklearn code: 0.9370629370629371

Python version: 3.8.5 | packaged by conda-forge | (default, Sep 16 2020, 17:19:16) [MSC v.1916 64 bit (AMD64)]
Numpy version: 1.18.5
Scikit-learn version: 0.23.2
System: Windows
Release: 10
Version: 10.0.17763
Machine: AMD64
Processor: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel

Score of sklearn code: 0.9440559440559441

Python version: 3.8.5 | packaged by conda-forge | (default, Sep 24 2020, 16:20:24) [MSC v.1916 64 bit (AMD64)]
Numpy version: 1.18.5
Scikit-learn version: 0.23.2
System: Windows
Release: 10
Version: 10.0.17763
Machine: AMD64
Processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel

Score of sklearn code: 0.9300699300699301

Python version: 3.8.5 | packaged by conda-forge | (default, Aug 29 2020, 00:43:28) [MSC v.1916 64 bit (AMD64)]
Numpy version: 1.19.1
Scikit-learn version: 0.23.2
System: Windows
Release: 10
Version: 10.0.17763
Machine: AMD64
Processor: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel

Score of sklearn code: 0.9230769230769231

Python version: 3.8.3 (default, Jul 2 2020, 17:30:36) [MSC v.1916 64 bit (AMD64)]
Numpy version: 1.18.5
Scikit-learn version: 0.23.1
System: Windows
Release: 10
Version: 10.0.17763
Machine: AMD64
Processor: Intel64 Family 6 Model 158 Stepping 13, GenuineIntel

Score of sklearn code: 0.9300699300699301

Python version: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 18:22:52) [MSC v.1916 64 bit (AMD64)]
Numpy version: 1.19.1
Scikit-learn version: 0.23.2
System: Windows
Release: 10
Version: 10.0.17763
Machine: AMD64
Processor: Intel64 Family 6 Model 158 Stepping 13, GenuineIntel

Score of sklearn code: 0.9300699300699301

Python version: 3.8.5 | packaged by conda-forge | (default, Aug 29 2020, 00:43:28) [MSC v.1916 64 bit (AMD64)]
Numpy version: 1.18.5
Scikit-learn version: 0.23.2
System: Windows
Release: 10
Version: 10.0.17763
Machine: AMD64
Processor: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel

Score of sklearn code: 0.9370629370629371

Python version: 3.7.4 (default, Sep 18 2019, 19:37:15) [Clang 10.0.1 (clang-1001.0.46.4)]
Numpy version: 1.18.4
Scikit-learn version: 0.23.2
System: Darwin
Release: 18.7.0
Version: Darwin Kernel Version 18.7.0: Mon Aug 31 20:53:32 PDT 2020; root:xnu-4903.278.44~1/RELEASE_X86_64
Machine: x86_64
Processor: i386

Score of sklearn code: 0.9300699300699301
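
For anyone who wants to contribute a comparable report, the fields above can be gathered with the standard library. This is a sketch of one way to do it, not necessarily the script used during the tutorial; the function name system_report is an assumption.

```python
import platform
import sys

import numpy
import sklearn


def system_report():
    """Return a text report with the same fields as the reports above."""
    uname = platform.uname()
    lines = [
        # sys.version can span two lines on some builds; flatten it.
        "Python version: " + sys.version.replace("\n", " "),
        "Numpy version: " + numpy.__version__,
        "Scikit-learn version: " + sklearn.__version__,
        "System: " + uname.system,
        "Release: " + uname.release,
        "Version: " + uname.version,
        "Machine: " + uname.machine,
        "Processor: " + uname.processor,
    ]
    return "\n".join(lines)


print(system_report())
```

scikit-learn also provides sklearn.show_versions(), which prints the versions of its dependencies and may be a useful complement to this report.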

Originally posted by @aboucaud in https://github.com/scikit-learn/scikit-learn/issues/7139#issuecomment-705788880

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

aboucaud commented, Nov 17, 2020 (2 reactions)

The results I got back from 5 of the participants so far tend to confirm @glemaitre's hunch. The 6 scores, including mine, are all the same: 0.958041…, while they were different on the previous test without scaling.

Thanks @glemaitre and @NicolasHug !
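
For reference, a minimal sketch of the scaled version of the test. The exact scaling code is not quoted in this thread, so the use of StandardScaler inside a pipeline is an assumption about how the data was standardized.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, stratify=data.target, random_state=0)

# The scaler is fitted on the training split only, then applied to both splits.
scaled_lr = make_pipeline(StandardScaler(), LogisticRegression())
scaled_lr.fit(X_train, y_train)
scaled_score = scaled_lr.score(X_test, y_test)
print("Score with scaling:", scaled_score)
```

Putting the scaler in a pipeline keeps the test data out of the scaling statistics, so the score stays comparable to the unscaled run.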

NicolasHug commented, Nov 14, 2020 (1 reaction)

I get a convergence warning too, and no warning when standardizing the data.

I also get consistent results across executions in both cases. There would be a problem if we didn’t get reproducible results across executions on the same machine, but slight variations across machines are to be expected especially when the model didn’t converge.

I’ll close the issue; feel free to re-open it if you observe a strong signal indicating a problem here.
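
A sketch of how both checks can be run on a given machine: whether a ConvergenceWarning is raised with and without standardization, and whether two executions give the same score. The helper name fit_and_check is hypothetical, and StandardScaler is assumed as the standardization step.

```python
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, stratify=data.target, random_state=0)


def fit_and_check(model):
    """Fit the model; return (test score, True if a ConvergenceWarning was raised)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        model.fit(X_train, y_train)
    warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return model.score(X_test, y_test), warned


raw_score, raw_warned = fit_and_check(LogisticRegression())
std_score, std_warned = fit_and_check(
    make_pipeline(StandardScaler(), LogisticRegression()))

# Second run on the same machine, to compare scores across executions.
raw_score_again, _ = fit_and_check(LogisticRegression())

print("Unscaled: score", raw_score, "convergence warning:", raw_warned)
print("Standardized: score", std_score, "convergence warning:", std_warned)
print("Same unscaled score on a second run:", raw_score == raw_score_again)
```

A warning on the unscaled fit means the solver stopped at the iteration limit, so the returned coefficients depend on exactly where it stopped, which is where small numerical differences between machines can show up in the score.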
