Differences in scores between two different versions

I was working on a private dataset with Scikit-Learn v0.24.2, the current stable version. Then I had to install PyCaret, which requires Scikit-Learn 0.23.2.

So I reran the notebook in which my models were trained (RandomForestClassifier, GradientBoostingClassifier, and other classifiers) and found that the scores differed. The difference is small, but it made me wonder whether this was a bug between versions or simply a consequence of changes in the algorithms. Logically, if neither the parameters nor the random_state change, this should not happen (at least, when I run the same test twice on the same version, the scores of some models do not change).
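One way to separate a version effect from run-to-run variation is to confirm that a fit is repeatable within a single environment and to record the library versions next to each score. The sketch below assumes a synthetic dataset from `make_classification` as a stand-in for the private data; the function name `score_with_versions` is hypothetical.

```python
import numpy as np
import sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def score_with_versions(seed=0):
    """Fit the same model twice with a fixed random_state and report the
    test accuracy together with the library versions that produced it.
    (Hypothetical helper for illustration.)"""
    # Synthetic stand-in for the private dataset from the issue.
    X, y = make_classification(n_samples=500, n_features=20, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )

    # Two independent fits with identical parameters and random_state.
    scores = []
    for _ in range(2):
        model = RandomForestClassifier(n_estimators=100, random_state=seed)
        model.fit(X_train, y_train)
        scores.append(float(model.score(X_test, y_test)))

    return {
        "accuracy": scores[0],
        "repeatable": bool(scores[0] == scores[1]),
        "sklearn": sklearn.__version__,
        "numpy": np.__version__,
    }


print(score_with_versions())
```

Running this in each virtual environment and comparing the printed dictionaries shows whether a score change coincides with a Scikit-Learn or NumPy version change; `repeatable` is expected to be `True` in both environments.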

First, I suspected that I had changed my models' parameters or the random_state in the meantime, but nothing had changed. Second, I checked whether the default parameters had changed between the two versions, but found nothing.
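The check for changed default parameters can be automated by snapshotting each estimator's defaults with `get_params()` in both environments and diffing the results. This is a minimal sketch: the helpers `default_params` and `diff_params` are hypothetical, and the estimator list simply mirrors the models named in the issue.

```python
import json

from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.neighbors import KNeighborsClassifier

# Estimators mentioned in the issue; each is instantiated with its defaults.
ESTIMATORS = {
    "RandomForestClassifier": RandomForestClassifier,
    "GradientBoostingClassifier": GradientBoostingClassifier,
    "LogisticRegression": LogisticRegression,
    "SGDClassifier": SGDClassifier,
    "KNeighborsClassifier": KNeighborsClassifier,
}


def default_params():
    """Return each estimator's default hyperparameters as JSON-friendly text."""
    snapshot = {}
    for name, cls in ESTIMATORS.items():
        params = cls().get_params()
        # repr() turns None, floats, strings, etc. into comparable text.
        snapshot[name] = {key: repr(value) for key, value in sorted(params.items())}
    return snapshot


def diff_params(old, new):
    """List (estimator, parameter, old value, new value) for every difference."""
    changes = []
    for name in sorted(set(old) | set(new)):
        old_p, new_p = old.get(name, {}), new.get(name, {})
        for key in sorted(set(old_p) | set(new_p)):
            if old_p.get(key) != new_p.get(key):
                changes.append((name, key, old_p.get(key), new_p.get(key)))
    return changes


if __name__ == "__main__":
    # Print the snapshot so it can be redirected to a file per environment.
    print(json.dumps(default_params(), indent=2))
```

For example, redirect the output of this script (saved under a hypothetical name such as `snapshot_defaults.py`) to one JSON file per environment, load both files with `json.load`, and pass them to `diff_params`; an empty list means no default changed for these estimators.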

I can accept that logistic regression or stochastic gradient descent might change, but I have trouble seeing why random forest or k-nearest neighbors would. So I am asking whether this is a bug, whether a function or base class underlying the estimators changed in a way that affects the models, or whether I have simply misunderstood how these models work.

The code is long, so I put it in a repository created for this purpose. To reproduce the issue, run the notebooks under the two Scikit-Learn versions (0.24.2 and 0.23.2).

You therefore need a separate virtual environment for each notebook (preferably created with venv rather than pipenv).

Global Python version: 3.8. OS: Windows 10 64-bit, builds 18363.1440 and 19041 (my work PC and my personal PC).

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

Guigui14460 commented, May 5, 2021

Okay, fine. It just seemed strange to me that the scores could change between versions with the same parameters. Thanks for answering my questions 👍

glemaitre commented, Dec 17, 2021

I am not sure that we will be able to track down the changes.

Sometimes I fix a bug in a way that can alter the random number generation. It could also be linked to a change in the NumPy version used.
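The point about bug fixes altering random number generation can be illustrated with NumPy alone: a seeded generator produces a fixed stream, so any code change that draws one more (or one fewer) number before a step shifts every draw after it, even with the same seed. The sketch below is a hypothetical bootstrap sampler, not Scikit-Learn's implementation; the `extra_draw` flag stands in for such a code change.

```python
import numpy as np


def bootstrap_indices(n_samples, seed, extra_draw=False):
    """Draw bootstrap indices from a seeded generator (hypothetical helper).

    `extra_draw` simulates a code change that consumes one additional random
    number before sampling.
    """
    rng = np.random.RandomState(seed)
    if extra_draw:
        rng.random_sample()  # one extra draw shifts the rest of the stream
    return rng.randint(0, n_samples, size=n_samples)


before = bootstrap_indices(10, seed=42)
after = bootstrap_indices(10, seed=42, extra_draw=True)
print(before)
print(after)
print("identical:", np.array_equal(before, after))
```

With the same seed, repeated calls without `extra_draw` return the same indices, while the single extra draw yields a different bootstrap sample; in a random forest this would change the trees and therefore the scores.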
