Differences in scores between two different versions
I was working on a private dataset using Scikit-Learn v0.24.2, the stable version. Then I had to install pycaret, which uses Scikit-Learn version 0.23.2.
So I decided to run my notebook again, the one where my models were trained (RandomForestClassifier, GradientBoostingClassifier, and other classifiers), but I found a difference in the scores. The difference isn't large, but it made me wonder whether it was a bug between versions or just a consequence of changes in the algorithms used. Logically, if the parameters and the random_state do not change, this should not happen (at least, when I run this test on the same version, the scores do not change for some models).
First, I thought I might have changed the parameters of my models, or the random_state, in the meantime. Unfortunately, nothing had changed. Second, I checked whether the default parameters had been changed between versions, but found nothing.
I can accept that logistic regression or stochastic gradient descent results might change, but I have trouble seeing why a random forest or k-nearest neighbors would. So I am asking whether this is a bug, whether a function or class underlying the estimators was changed in a way that affects the models, or whether I have simply misunderstood how these models work.
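Even an estimator with no randomness, such as k-nearest neighbors, can change its predictions if equidistant neighbors come back in a different order, for example because a lower-level sort or partition routine changed. This is offered only as one possible mechanism, not a confirmed cause of the differences above. The sketch is a hypothetical pure-Python 1-D k-NN (the function `knn_predict` and its `tie_order` parameter are invented for illustration, not scikit-learn's API) in which only the tie-breaking order differs.

```python
from collections import Counter


def knn_predict(train, query, k, tie_order="first"):
    """Hypothetical k-NN classifier on 1-D points (illustration only).

    train: list of (x, label) pairs.
    tie_order: "first" lets earlier training points win distance ties,
    "last" lets later ones win; this stands in for a change in the
    underlying sort/partition routine.
    """
    indexed = list(enumerate(train))
    if tie_order == "last":
        # Reverse first so that, after a stable sort, later points win ties.
        indexed.reverse()
    # sorted() is stable: equal distances keep the order chosen above.
    neighbors = sorted(indexed, key=lambda item: abs(item[1][0] - query))[:k]
    # Majority vote among the k nearest neighbors.
    votes = Counter(label for _, (_, label) in neighbors)
    return votes.most_common(1)[0][0]


# The query 1.0 is exactly as far from 0.0 as from 2.0.
train = [(0.0, "a"), (2.0, "b")]
print(knn_predict(train, 1.0, k=1, tie_order="first"))  # a
print(knn_predict(train, 1.0, k=1, tie_order="last"))   # b
```

Nothing random is involved, yet the two calls disagree, because the answer depends on how ties are ordered.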
The code is long, so I put it in a repo created for this purpose. To reproduce this, you need to run the notebooks with two different versions of Scikit-Learn:
- 0.23.2 for this notebook
- 0.24.2 for this notebook
So you need a separate virtual environment for each notebook (try to use venv instead of pipenv).
Python version: 3.8
OS: Windows 10 64-bit, builds 18363.1440 and 19041 (my work PC and my personal PC).
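Rather than noting the environment by hand, each notebook could save its scores together with the versions that produced them, which makes the two runs easy to compare. This is a minimal sketch, assuming scores are kept in a plain dict mapping model name to score; the helpers `package_version`, `run_record` and `diff_scores` are hypothetical. It uses only the standard library (`importlib.metadata`, available from Python 3.8), and any package that is not installed is recorded as `None`.

```python
import platform
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def run_record(scores):
    """Bundle model scores with the environment that produced them."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": {p: package_version(p) for p in ("scikit-learn", "numpy", "scipy")},
        "scores": dict(scores),
    }


def diff_scores(old, new, tol=0.0):
    """Return {model: (old_score, new_score)} for scores differing by more than tol.

    A model missing from one record is reported with None on that side.
    """
    changed = {}
    for model in sorted(set(old["scores"]) | set(new["scores"])):
        a = old["scores"].get(model)
        b = new["scores"].get(model)
        if a is None or b is None or abs(a - b) > tol:
            changed[model] = (a, b)
    return changed


# Placeholder scores for illustration only, not results from this issue.
rec_a = run_record({"RandomForestClassifier": 0.91, "KNeighborsClassifier": 0.88})
rec_b = run_record({"RandomForestClassifier": 0.90, "KNeighborsClassifier": 0.88})
print(diff_scores(rec_a, rec_b))  # only RandomForestClassifier is reported
```

Each record is a plain dict, so it can be written with `json.dump` in one environment and loaded in the other before calling `diff_scores`.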
Issue Analytics
- Created 2 years ago
- Comments: 6 (2 by maintainers)
Top GitHub Comments
Okay fine. It’s just that it seemed strange to me that the scores could change between versions with the same parameters. Thanks for answering my questions 👍
I am not sure that we will be able to track down the changes.
We sometimes fix bugs in ways that alter the random number generation. It could also be linked to a change in the NumPy version used.
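To see how a bug fix can change results even with an identical random_state, consider a seeded generator whose draws are consumed differently. The sketch below uses Python's `random.Random` as a stand-in for the NumPy generator that scikit-learn relies on; `bootstrap_indices` and its `extra_draw` flag are hypothetical, modeling a fix that consumes one additional random number before sampling.

```python
import random


def bootstrap_indices(n, seed, extra_draw=False):
    """Draw n bootstrap indices from a generator seeded with `seed`.

    extra_draw=True simulates a (hypothetical) bug fix that consumes one
    extra random number before sampling, shifting every later draw.
    """
    rng = random.Random(seed)
    if extra_draw:
        rng.random()  # the additional draw introduced by the "fix"
    return [rng.randrange(n) for _ in range(n)]


old = bootstrap_indices(100, seed=42)
new = bootstrap_indices(100, seed=42, extra_draw=True)
print(old == bootstrap_indices(100, seed=42))  # True: same code, same seed
print(old == new)  # almost certainly False: same seed, shifted draw sequence
```

A random forest's bootstrap samples and feature subsets are drawn from such a sequence, so one extra or reordered draw can change every tree while the seed stays the same.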