Incoherent results of supposedly reproducible piece of code
I was giving a tutorial recently when someone stopped me: random_state=0 was passed to train_test_split(), but their score did not match mine. I asked all the participants for their scores. Some had exactly the same score as mine, down to the last decimal, and the others had slightly different values.
Here is the piece of code executed:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target,
    stratify=data.target, random_state=0)
lr = LogisticRegression().fit(X_train, y_train)
score = lr.score(X_test, y_test)
I have no idea whether this behavior is to be expected, since I did not set random_state in LogisticRegression(). The documentation says it is not needed for the ‘lbfgs’ solver, which is the default in all the versions tested here.
Since this kind of reproducibility error cannot easily be tested with CI, I asked them for their system information. Their reports, with the scores obtained, are listed below.
Python version: 3.8.5 | packaged by conda-forge | (default, Sep 24 2020, 16:20:24) [MSC v.1916 64 bit (AMD64)]
Numpy version: 1.18.5
Scikit-learn version: 0.23.2
System: Windows
Release: 10
Version: 10.0.17763
Machine: AMD64
Processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
Score of sklearn code: 0.9370629370629371
Python version: 3.8.5 | packaged by conda-forge | (default, Sep 24 2020, 16:20:24) [MSC v.1916 64 bit (AMD64)]
Numpy version: 1.18.5
Scikit-learn version: 0.23.2
System: Windows
Release: 10
Version: 10.0.17763
Machine: AMD64
Processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
Score of sklearn code: 0.9370629370629371
Python version: 3.8.5 | packaged by conda-forge | (default, Sep 16 2020, 17:19:16) [MSC v.1916 64 bit (AMD64)]
Numpy version: 1.18.5
Scikit-learn version: 0.23.2
System: Windows
Release: 10
Version: 10.0.17763
Machine: AMD64
Processor: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
Score of sklearn code: 0.9440559440559441
Python version: 3.8.5 | packaged by conda-forge | (default, Sep 24 2020, 16:20:24) [MSC v.1916 64 bit (AMD64)]
Numpy version: 1.18.5
Scikit-learn version: 0.23.2
System: Windows
Release: 10
Version: 10.0.17763
Machine: AMD64
Processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
Score of sklearn code: 0.9300699300699301
Python version: 3.8.5 | packaged by conda-forge | (default, Aug 29 2020, 00:43:28) [MSC v.1916 64 bit (AMD64)]
Numpy version: 1.19.1
Scikit-learn version: 0.23.2
System: Windows
Release: 10
Version: 10.0.17763
Machine: AMD64
Processor: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
Score of sklearn code: 0.9230769230769231
Python version: 3.8.3 (default, Jul 2 2020, 17:30:36) [MSC v.1916 64 bit (AMD64)]
Numpy version: 1.18.5
Scikit-learn version: 0.23.1
System: Windows
Release: 10
Version: 10.0.17763
Machine: AMD64
Processor: Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
Score of sklearn code: 0.9300699300699301
Python version: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 18:22:52) [MSC v.1916 64 bit (AMD64)]
Numpy version: 1.19.1
Scikit-learn version: 0.23.2
System: Windows
Release: 10
Version: 10.0.17763
Machine: AMD64
Processor: Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
Score of sklearn code: 0.9300699300699301
Python version: 3.8.5 | packaged by conda-forge | (default, Aug 29 2020, 00:43:28) [MSC v.1916 64 bit (AMD64)]
Numpy version: 1.18.5
Scikit-learn version: 0.23.2
System: Windows
Release: 10
Version: 10.0.17763
Machine: AMD64
Processor: Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
Score of sklearn code: 0.9370629370629371
Python version: 3.7.4 (default, Sep 18 2019, 19:37:15)
[Clang 10.0.1 (clang-1001.0.46.4)]
Numpy version: 1.18.4
Scikit-learn version: 0.23.2
System: Darwin
Release: 18.7.0
Version: Darwin Kernel Version 18.7.0: Mon Aug 31 20:53:32 PDT 2020; root:xnu-4903.278.44~1/RELEASE_X86_64
Machine: x86_64
Processor: i386
Score of sklearn code: 0.9300699300699301
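For anyone who wants to send a comparable report, here is a minimal sketch of how the fields above could be collected. The helper names `_version` and `system_report` are hypothetical, and this is an assumption about how such a report can be built with the standard `sys` and `platform` modules, not necessarily the script used during the tutorial.

```python
import platform
import sys


def _version(module_name):
    """Return a module's __version__, or a placeholder if it is not installed."""
    try:
        module = __import__(module_name)
    except ImportError:
        return "not installed"
    return getattr(module, "__version__", "unknown")


def system_report(score=None):
    """Build a text report with the same fields as the reports above."""
    lines = [
        # sys.version may itself contain a line break on some platforms
        f"Python version: {sys.version}",
        f"Numpy version: {_version('numpy')}",
        f"Scikit-learn version: {_version('sklearn')}",
        f"System: {platform.system()}",
        f"Release: {platform.release()}",
        f"Version: {platform.version()}",
        f"Machine: {platform.machine()}",
        f"Processor: {platform.processor()}",
    ]
    if score is not None:
        lines.append(f"Score of sklearn code: {score}")
    return "\n".join(lines)


print(system_report())
```

Passing the value of `score` from the snippet at the top, as in `system_report(score)`, appends the last line of the report. scikit-learn also ships `sklearn.show_versions()`, which prints related environment details.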
_Originally posted by @aboucaud in https://github.com/scikit-learn/scikit-learn/issues/7139#issuecomment-705788880_
The results I got back from 5 of the participants so far tend to confirm @glemaitre's hunch. All 6 scores, including mine, are the same (0.958041…), whereas they differed on the previous test without scaling.
Thanks @glemaitre and @NicolasHug !
I get a convergence warning too, and no warning when standardizing the data.
I also get consistent results across executions in both cases. There would be a problem if we didn’t get reproducible results across executions on the same machine, but slight variations across machines are to be expected especially when the model didn’t converge.
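The comparison discussed in this thread can be sketched as follows: fit the same model with and without standardization and record whether a ConvergenceWarning is raised. The helper `fit_and_report` is a hypothetical name introduced for illustration, and wrapping StandardScaler and LogisticRegression in a pipeline is an assumption about how the scaling was done. Exact scores are not shown because the unscaled one may differ from machine to machine.

```python
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, stratify=data.target, random_state=0
)


def fit_and_report(model):
    """Fit the model; return (test score, whether a ConvergenceWarning was raised)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is filtered out
        model.fit(X_train, y_train)
    warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return model.score(X_test, y_test), warned


# Same model as in the original snippet, on the raw features.
raw_score, raw_warned = fit_and_report(LogisticRegression())

# Same model, with the features standardized first.
scaled_score, scaled_warned = fit_and_report(
    make_pipeline(StandardScaler(), LogisticRegression())
)

print(f"unscaled: score={raw_score}, convergence warning={raw_warned}")
print(f"scaled:   score={scaled_score}, convergence warning={scaled_warned}")
```

If the unscaled fit reports a convergence warning, the optimizer stopped at its iteration limit before reaching its tolerance, so small floating-point differences between machines can leave it at slightly different coefficients and flip a few test predictions.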
I’ll close the issue; feel free to re-open it if you observe a strong signal indicating a problem here.