Random_state produces different results on different operating systems

Issue

The random_state parameter produces deterministic results on a given OS, but it does not produce the same results across different OSes. Below are some examples for umap-learn, produced with the following code. I used the example from the README, along with scikit-learn's check_random_state as a control (all scikit-learn results are the same on every machine).

The results also seem to depend on the version of Numba that is installed.

# UMAP example with random state
import umap
from sklearn.datasets import load_digits

digits = load_digits()

embedding = umap.UMAP(
  n_neighbors=5,
  min_dist=0.3,
  metric='correlation',
  random_state=2018,
).fit_transform(digits.data)
embedding

# Scikit check random state
from sklearn.utils import check_random_state
random_state = check_random_state(2018)
random_state.rand(4)

Example Results

| Machine | Architecture | Python Version | umap-learn Version | numba Version | UMAP Result |
| --- | --- | --- | --- | --- | --- |
| Macbook Pro #1 | Darwin C02P141DG3QD 16.7.0 Darwin Kernel Version 16.7.0: Thu Jun 21 20:07:39 PDT 2018; root:xnu-3789.73.14~1/RELEASE_X86_64 x86_64 | Python 3.7.0 | 0.3.2 | 0.39.0 | array([[16.42446, -2.1266642], [7.231049, -1.5276358], [-1.5864906, -5.1226635], …, [6.094945, 1.2291753], [1.3193432, 5.4169164], [5.5729628, 2.2857437]], dtype=float32) |
| Macbook Pro #1 | Darwin C02P141DG3QD 16.7.0 Darwin Kernel Version 16.7.0: Thu Jun 21 20:07:39 PDT 2018; root:xnu-3789.73.14~1/RELEASE_X86_64 x86_64 | Python 3.7.0 | 0.3.5 | 0.40.1 | array([[32.471622, 8.842674], [16.400652, 13.036578], [9.181449, 3.948576], …, [19.216055, 12.42009], [6.522507, 14.285691], [19.517092, 11.733169]], dtype=float32) |
| Macbook Pro #2 | Darwin C02VN4T7HV2L 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64 | Python 3.7.0 | 0.3.2 | 0.40.0 | array([[16.42446, -2.1266642], [7.231049, -1.5276358], [-1.5864906, -5.1226635], …, [6.094945, 1.2291753], [1.3193432, 5.4169164], [5.5729628, 2.2857437]], dtype=float32) |
| Macbook Pro #2 | Darwin C02VN4T7HV2L 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64 | Python 3.7.0 | 0.3.5 | 0.40.1 | array([[32.471622, 8.842674], [16.400652, 13.036578], [9.181449, 3.948576], …, [19.216055, 12.42009], [6.522507, 14.285691], [19.517092, 11.733169]], dtype=float32) |
| Debian Docker | Linux 389088ec7b25 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 GNU/Linux | Python 3.5.3 | 0.3.5 | 0.40.1 | array([[25.864304, 7.870304], [16.924606, 7.9489594], [7.4818945, 9.081071], …, [15.565144, 10.721824], [7.7764506, 14.354664], [14.85415, 11.515898]], dtype=float32) |
| Ubuntu Docker | Linux 6a9a07ef70b7 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | Python 3.6.6 | 0.3.5 | 0.40.1 | array([[25.864304, 7.870304], [16.924606, 7.9489594], [7.4818945, 9.081071], …, [15.565144, 10.721824], [7.7764506, 14.354664], [14.85415, 11.515898]], dtype=float32) |
| Ubuntu Docker | Linux 6a9a07ef70b7 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | Python 3.7.0 | 0.3.5 | 0.40.1 | array([[25.864304, 7.870304], [16.924606, 7.9489594], [7.4818945, 9.081071], …, [15.565144, 10.721824], [7.7764506, 14.354664], [14.85415, 11.515898]], dtype=float32) |
| Ubuntu Desktop | Linux brick 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | Python 3.6.6 | 0.3.5 | 0.40.1 | array([[25.864225, 7.8703256], [16.92632, 7.943247], [7.4819674, 9.081023], …, [15.570685, 10.72381], [7.776701, 14.354493], [14.864248, 11.530873]], dtype=float32) |
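
To put numbers on how far apart these results are, here is a small sketch that compares only the six rows visible in each printed array (the rows elided by "…" are not available, so this is not a comparison of the full embeddings). The helper name max_abs_diff is made up for illustration. The macOS values are taken from the umap-learn 0.3.5 / numba 0.40.1 rows.

```python
# Visible rows only, copied from the results above; the elided rows are unknown.
ubuntu_docker = [
    (25.864304, 7.870304), (16.924606, 7.9489594), (7.4818945, 9.081071),
    (15.565144, 10.721824), (7.7764506, 14.354664), (14.85415, 11.515898),
]
ubuntu_desktop = [
    (25.864225, 7.8703256), (16.92632, 7.943247), (7.4819674, 9.081023),
    (15.570685, 10.72381), (7.776701, 14.354493), (14.864248, 11.530873),
]
macbook = [
    (32.471622, 8.842674), (16.400652, 13.036578), (9.181449, 3.948576),
    (19.216055, 12.42009), (6.522507, 14.285691), (19.517092, 11.733169),
]


def max_abs_diff(a, b):
    """Largest absolute coordinate difference between two lists of 2-D points."""
    return max(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


linux_vs_linux = max_abs_diff(ubuntu_docker, ubuntu_desktop)
linux_vs_mac = max_abs_diff(ubuntu_docker, macbook)
print(linux_vs_linux, linux_vs_mac)
```

On the visible rows, the two Ubuntu environments differ only in the second decimal place, while macOS and Linux differ by several units, so the two kinds of mismatch may well have different causes.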

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 4
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

5 reactions
huidongchen commented, Jun 1, 2020

I am running into the same issue (MacOS vs Linux).

3 reactions
simonwm commented, Jul 25, 2021

I have the same issue (Mac/Windows/WSL/Linux) - and maybe an idea how to solve it.

In other libraries, I could solve reproducibility issues by seeding everything that can be seeded from the outside (random.seed, numpy.random.seed), in addition to supplying the random seed to the package itself. Here, however, this did not work.

There are two additional sources of randomness which I can think of and which are not (easily) fixable from the outside: instantiated numpy random generators (but I really think @lmcinnes took care of that if necessary) and the numba random number generators.

Although they look identical to the top-level numpy ones and are also seeded by numpy.random.seed (but only when it is called from within numba code), they are independent of the non-numba numpy random generators, and they are initialized at startup with entropy drawn from the operating system. https://numba.pydata.org/numba-doc/latest/reference/pysupported.html?highlight=numpy%20random#random

If that is really the reason, the fix is simple in principle: just call numpy.random.seed from within your numba-jitted code as well.
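
A minimal sketch of that idea, not taken from the UMAP code base: the function name seeded_draws is hypothetical, and the fallback decorator exists only so the snippet still runs where Numba is not installed (in which case it merely exercises NumPy's global generator).

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for illustration: a no-op stand-in when Numba is missing.
    def njit(func):
        return func


@njit
def seeded_draws(seed, n):
    # Seeding inside the jitted function targets Numba's own generator,
    # which numpy.random.seed called from ordinary Python does not reach.
    np.random.seed(seed)
    out = np.empty(n)
    for i in range(n):
        out[i] = np.random.random()
    return out


first = seeded_draws(2018, 4)
second = seeded_draws(2018, 4)
print(np.array_equal(first, second))
```

Both calls should return identical values because the generator is reseeded inside the compiled code each time. Whether this alone would make UMAP results match across operating systems is untested here.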

And if numba randomness is the only reason for not having the same result for serial and parallel runs, you might be able to figure out a scheme to specify a deterministic seed for every block of work - and use the same seeds independent of the number of threads, in particular also in the serial run.
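
One way such a scheme could look, sketched with the standard library only: Python's random.Random stands in for the Numba generator, and BASE_SEED, process_block, run, and the seed-mixing formula are all hypothetical choices, not anything UMAP does. The point is that each block's seed depends only on the base seed and the block index, never on which thread picks it up.

```python
import random
from concurrent.futures import ThreadPoolExecutor

BASE_SEED = 2018  # hypothetical user-supplied random_state


def process_block(block_index, block_size=3):
    # The seed is a function of (BASE_SEED, block_index) only, so the draws
    # for a block are the same no matter which worker processes it.
    rng = random.Random(BASE_SEED * 1_000_003 + block_index)
    return [rng.random() for _ in range(block_size)]


def run(n_blocks, n_workers):
    if n_workers == 1:
        # Serial run uses exactly the same per-block seeds.
        return [process_block(i) for i in range(n_blocks)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in block order, not completion order.
        return list(pool.map(process_block, range(n_blocks)))


serial = run(8, 1)
parallel = run(8, 4)
print(serial == parallel)
```

With this layout, the serial and threaded runs produce the same per-block draws; the block size, not the thread count, becomes the unit that determines the random stream.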
