Differences in scores between two different versions

I was working on a private dataset with Scikit-Learn v0.24.2, the stable version. Then I had to install pycaret, which uses Scikit-Learn 0.23.2.

So I decided to rerun the notebook where my models are trained (RandomForestClassifier, GradientBoostingClassifier, and other classifiers), and I found a difference in scores. The difference is small, but it made me wonder whether it comes from a bug in one of the versions or from changes to the algorithms between versions. Logically, if the parameters and the random_state do not change, this should not happen (at least, when I run this test twice on the same version, the scores do not change for some models).

At first, I thought I had changed my models' parameters or the random_state in the meantime, but I had not. I then checked whether the default parameters had changed between the two versions, but my search turned up nothing.

I can accept that logistic regression or stochastic gradient descent might change, but I have trouble seeing why random forest or k-nearest neighbors would. So I am asking you whether this is a bug, whether a function or base class underlying the estimators was changed in a way that affects the models, or whether I have simply misunderstood the models.
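A minimal sketch of the same-version check described above, assuming synthetic data from `make_classification` in place of the private dataset. The helper name `fit_and_score` and all sizes and seeds are hypothetical choices made for illustration, not taken from the notebooks. Within a single environment the two scores should match exactly; running the same script under each Scikit-Learn version and comparing the printed lines shows whether a given model is affected.

```python
import sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def fit_and_score(seed=0):
    """Train a random forest on synthetic data with every seed fixed."""
    # Synthetic stand-in for the private dataset (assumption).
    X, y = make_classification(n_samples=500, n_features=20, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    clf = RandomForestClassifier(n_estimators=50, random_state=seed)
    clf.fit(X_train, y_train)
    # Mean accuracy on the held-out split.
    return clf.score(X_test, y_test)


first = fit_and_score()
second = fit_and_score()
# Record the library version next to the scores so runs can be compared.
print(f"scikit-learn {sklearn.__version__}: {first} vs {second}")
```

If the two numbers agree within each environment but differ between environments, the seeds are being applied correctly and the difference comes from the library versions rather than from the notebook.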

The code is long, so I put it in a repo created for this purpose. To reproduce this, you need to run the notebooks through 2 different versions of Scikit-Learn:

0.23.2 for this notebook
0.24.2 for this notebook

So you need a virtual environment for each notebook (try to use venv instead of pipenv).

Global Python version: 3.8
OS: Windows 10 64-bit, builds 18363.1440 and 19041 (my work PC and my personal PC).

Top GitHub Comments

Guigui14460 commented, May 5, 2021

Okay fine. It’s just that it seemed strange to me that the scores could change between versions with the same parameters. Thanks for answering my questions 👍

glemaitre commented, Dec 17, 2021

I am not sure that we will be able to track down the changes.

A bug fix can sometimes alter how the random generator is used. The difference could also be linked to a change in the NumPy version used.
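Since the NumPy version is named as a possible cause, it may help to record the installed versions in each virtual environment before comparing scores. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8). The helper name `installed_versions` and the default package list are assumptions made for illustration.

```python
from importlib import metadata


def installed_versions(packages=("scikit-learn", "numpy", "scipy")):
    """Return a dict mapping each package name to its version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


report = installed_versions()
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Running this in both environments and comparing the output shows whether NumPy or SciPy changed along with Scikit-Learn. Scikit-Learn also provides `sklearn.show_versions()`, which prints similar information.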