Support numpy.random.Generator and/or BitGenerator for random number generation

Describe the workflow you want to enable

I’d like to use a Generator or BitGenerator with scikit-learn where I’d otherwise use RandomState or a seed int.

For example:

import numpy as np

bit_generator = np.random.PCG64(seed=0)
generator = np.random.Generator(bit_generator)

and then use this for random_state= in scikit-learn:

from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit
from sklearn.svm import LinearSVC

X, y = make_classification(random_state=generator)  # or my bit_generator here 
classifier = LinearSVC(random_state=generator)
cv = ShuffleSplit(random_state=generator)

This fails because these methods expect a RandomState object or int seed value. The specific trigger is check_random_state(random_state).

Describe your proposed solution

This would require:

  • allowing Generator or BitGenerator as acceptable values for random_state= in every function and class constructor that accepts random_state.
  • changing check_random_state() to allow Generator and/or BitGenerator objects.
  • adding tests for using Generator or BitGenerator with classes or functions that consume random_state (similar to the existing handling of seed ints and RandomState objects).
  • changing any internal code that relies on RandomState methods that aren’t available on Generator (e.g. rand, randn).
  • maybe switching to Generator instead of RandomState by default when a seed int is given.
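As a rough illustration of the second bullet, here is a minimal sketch of what an extended check could look like. The name check_random_state_sketch is hypothetical and this is not scikit-learn's implementation; it also simplifies the None case by returning a fresh RandomState, and wrapping a bare BitGenerator in a Generator is just one possible choice.

```python
import numbers

import numpy as np


def check_random_state_sketch(seed):
    """Hypothetical variant of check_random_state (not scikit-learn's code)."""
    # None or an int seed: keep the legacy behaviour and build a RandomState.
    # (Simplification: None gives a fresh, OS-seeded RandomState here.)
    if seed is None or isinstance(seed, numbers.Integral):
        return np.random.RandomState(seed)
    # Already a usable random number generator: pass it through unchanged.
    if isinstance(seed, (np.random.RandomState, np.random.Generator)):
        return seed
    # A bare BitGenerator: wrap it (assumption: in a Generator).
    if isinstance(seed, np.random.BitGenerator):
        return np.random.Generator(seed)
    raise ValueError(f"{seed!r} cannot be used to seed a random number generator")


rng = check_random_state_sketch(np.random.PCG64(seed=0))
print(type(rng).__name__)
```

Callers would still need to cope with receiving either a RandomState or a Generator, which is where the method-name differences mentioned in the fourth bullet come in.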

Describe alternatives you’ve considered, if relevant

The scope could include either or both of BitGenerator or Generator.

It might be easiest to allow only BitGenerator, and not Generator.

  • This allows flexibility.
    • Users have control over seed and PRNG algorithm.
  • This is easier to implement (can be treated just like a seed int value).
    • BitGenerator can be given to RandomState, and I think it then produces the same values as Generator.
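Whether the two wrappers produce the same values is easy to check rather than assume. The sketch below seeds two PCG64 bit generators identically, wraps one in RandomState and one in Generator, and compares a few normal draws; a later comment in this thread notes that matching streams are not guaranteed.

```python
import numpy as np

# Two independent PCG64 bit generators with the same seed.
legacy = np.random.RandomState(np.random.PCG64(seed=0))
modern = np.random.Generator(np.random.PCG64(seed=0))

# Both wrappers accept the bit generator and expose standard_normal().
legacy_normals = legacy.standard_normal(5)
modern_normals = modern.standard_normal(5)

# Compare the streams instead of assuming they agree.
same_stream = np.array_equal(legacy_normals, modern_normals)
print("identical normal draws:", same_stream)
```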

Additional context

NumPy v1.17 added the numpy.random.Generator (docs) interface for random number generation.

Overview:

  • Generator is similar to RandomState, but enables different PRNG algorithms
  • BitGenerator (docs) encapsulates the PRNG and seed value, e.g. PCG64(seed=0)
  • RandomState “is considered frozen” and uses “the slow Mersenne Twister” by default (docs)
  • RandomState can work with non-Mersenne BitGenerator objects
  • More info in NEP-19, the design document from NumPy.

The API for Generator and BitGenerator looks like:

from numpy import random

bit_generator = random.PCG64(seed=0)  # PCG64 is a BitGenerator subclass
generator = random.Generator(bit_generator)

generator.uniform(0.0, 1.0, size=3)  # API is similar to RandomState

# there's also this, for making a PCG64-backed Generator
generator = random.default_rng(seed=0)

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 8
  • Comments: 16 (14 by maintainers)

Top GitHub Comments

4 reactions
rkern commented, Aug 4, 2021

FYI, rand, randn, and random_sample should all be considered disrecommended variants and aliases, and you should use the preferred methods on RandomState that also exist on Generator: random() and standard_normal().
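A small sketch of that advice: code that sticks to random() and standard_normal() runs unchanged against both a RandomState and a Generator. The helper name draw_samples is made up for illustration.

```python
import numpy as np


def draw_samples(rng, n):
    """Use only method names shared by RandomState and Generator."""
    uniform = rng.random(n)           # instead of rand / random_sample
    normal = rng.standard_normal(n)   # instead of randn
    return uniform, normal


for rng in (np.random.RandomState(0), np.random.default_rng(0)):
    uniform, normal = draw_samples(rng, 4)
    print(type(rng).__name__, uniform.shape, normal.shape)
```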

For randint(), we use a utility function in scipy to wrap around either a RandomState or Generator. Part of the improvement of Generator.integers() was the semantics of processing the arguments, so there wasn’t a great way to retain randint(). The wrapper function uses those new arguments, so there is a tiny bit of thought to be applied when making that replacement.

I don’t think you use any of the others.
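For the randint() case, a wrapper along the lines described above might look like the following. This is a hedged sketch with a hypothetical name (draw_integers), not scipy's actual utility; it follows the Generator.integers() argument semantics, including endpoint, and translates them for RandomState.randint().

```python
import numpy as np


def draw_integers(rng, low, high=None, size=None, endpoint=False):
    """Hypothetical wrapper: integers() semantics for either generator type."""
    if isinstance(rng, np.random.Generator):
        return rng.integers(low, high, size=size, endpoint=endpoint)
    # RandomState.randint always excludes the upper bound, so normalise
    # the arguments and widen the bound when endpoint=True is requested.
    if high is None:
        low, high = 0, low
    if endpoint:
        high = high + 1
    return rng.randint(low, high, size=size)


for rng in (np.random.RandomState(0), np.random.default_rng(0)):
    rolls = draw_integers(rng, 1, 6, size=10, endpoint=True)  # dice rolls in 1..6
    print(type(rng).__name__, rolls.min(), rolls.max())
```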

3 reactions
jamesmyatt commented, Apr 12, 2021

It just means that it should be OK to use the new numpy.random API now without adapters or defensive programming.

Would you mind elaborating on this? For now I don’t understand how NEP29 and the deprecation of numpy 1.16 changes anything to the concerns raised in #16988 (comment). It seems to me that these will be valid concerns for as long as we support RandomState.

NEP 29 says:

When a project releases a new major or minor version, we recommend that they support at least all minor versions of Python introduced and released in the prior 42 months from the anticipated release date with a minimum of 2 minor versions of Python, and all minor versions of NumPy released in the prior 24 months from the anticipated release date with a minimum of 3 minor versions of NumPy.

i.e. there should be no obligation to support NumPy 1.16 in any major or minor release after Jan 13, 2021.

The main thing is that, if you bump the minimum version of NumPy to 1.17, then you can write things like: if isinstance(random_state, np.random.BitGenerator) without first checking if np.random.BitGenerator exists.

Also, since NumPy 1.17, the RandomState init has permitted a BitGenerator as an input. So the simplest workaround for handling Generator inputs to these functions is to extract the associated bit generator and re-wrap it as a RandomState: np.random.RandomState(rng.bit_generator). This will use the new bit generators along with the legacy API and non-uniform algorithms. However, doing this will not give you many of the main advantages of the new random API. A better long-term solution might be to extract the bit generator from the RandomState input and re-wrap it as a Generator and use the newer, faster, cleaner, more powerful API everywhere.
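The re-wrapping workaround can be sketched as below. The helper name as_random_state is hypothetical. Note that the RandomState shares the Generator's bit generator rather than copying it, so draws through one advance the other; the snippet checks this by comparing the bit generator state before and after a draw.

```python
import numpy as np


def as_random_state(rng):
    """Hypothetical helper: expose a Generator through the legacy API."""
    if isinstance(rng, np.random.Generator):
        # RandomState accepts a BitGenerator since NumPy 1.17.
        return np.random.RandomState(rng.bit_generator)
    return rng


rng = np.random.default_rng(0)
state_before = rng.bit_generator.state

legacy = as_random_state(rng)
legacy.random(3)  # draw through the legacy API

state_after = rng.bit_generator.state
shares_state = state_before != state_after
print("Generator state advanced by legacy draws:", shares_state)
```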

As for https://github.com/scikit-learn/scikit-learn/issues/16988#issuecomment-622364681 specifically, I think that most of those are issues of methods having been given better names in the new API than in the legacy one (see https://numpy.org/doc/stable/reference/random/index.html#quick-start). I don’t think there’s any guarantee that two methods with the same name will give the same random number streams, since many of the algorithms used to convert the random bits to random numbers have been improved (e.g. more efficient implementations), or even that they have exactly the same arguments. The exception is the state-related ones, which are even more complicated (e.g. https://numpy.org/doc/stable/reference/random/generated/numpy.random.RandomState.get_state.html).

I hope this is clear; I know it’s brief and the topic is complex.
