Support numpy.random.Generator and/or BitGenerator for random number generation
Describe the workflow you want to enable
I’d like to use a Generator or BitGenerator with scikit-learn where I’d otherwise use RandomState or a seed int.
For example:
import numpy as np
bit_generator = np.random.PCG64(seed=0)
generator = np.random.Generator(bit_generator)
and then use this for random_state= in scikit-learn:
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit
from sklearn.svm import LinearSVC
X, y = make_classification(random_state=generator) # or my bit_generator here
classifier = LinearSVC(random_state=generator)
cv = ShuffleSplit(random_state=generator)
This fails because these functions and classes expect a RandomState object or an int seed value. The specific trigger is check_random_state(random_state).
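To see why a Generator is rejected, here is a minimal sketch of the validation check_random_state performs. It is a simplified stand-in, not scikit-learn's actual source; the name check_random_state_sketch is hypothetical, and np.random.mtrand._rand is NumPy's private global RandomState instance. A Generator is neither an integer nor a RandomState, so it falls through to the error.

```python
import numbers

import numpy as np


def check_random_state_sketch(seed):
    """Simplified stand-in for sklearn.utils.check_random_state (hypothetical name)."""
    if seed is None or seed is np.random:
        # NumPy's private global RandomState instance.
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    # Anything else, including Generator and BitGenerator, ends up here.
    raise ValueError(
        f"{seed!r} cannot be used to seed a numpy.random.RandomState instance"
    )


for candidate in (0, np.random.RandomState(0), np.random.default_rng(0)):
    try:
        check_random_state_sketch(candidate)
        print(type(candidate).__name__, "accepted")
    except ValueError as exc:
        print(type(candidate).__name__, "rejected:", exc)
```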
Describe your proposed solution
This would require:
- changing code to allow Generator or BitGenerator as acceptable values for random_state= in every function and class constructor that accepts random_state.
- changing check_random_state() to allow Generator and/or BitGenerator objects.
- adding tests for using Generator or BitGenerator with classes or functions that consume random_state (similar to the existing ones for seed int or RandomState objects).
- changing any internal code that assumes RandomState methods that aren’t available with Generator (e.g. rand, randn, see ).
- maybe switching to using Generator instead of RandomState by default when a seed int is given.
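The second and third items could look roughly like this: a sketch assuming NumPy >= 1.17, with a hypothetical function name. It keeps returning a RandomState so existing callers continue to work, and re-wraps new-style inputs around their bit generator; the loop at the end is a reproducibility check in the style of the proposed tests.

```python
import numbers

import numpy as np


def check_random_state_extended(seed):
    """Hypothetical variant of check_random_state that also accepts
    numpy.random.Generator and numpy.random.BitGenerator.

    It still returns a RandomState, so existing callers keep working.
    """
    if seed is None or seed is np.random:
        return np.random.mtrand._rand  # NumPy's private global RandomState
    if isinstance(seed, numbers.Integral):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    if isinstance(seed, np.random.Generator):
        # Re-wrap the Generator's bit generator; both objects then share its state.
        return np.random.RandomState(seed.bit_generator)
    if isinstance(seed, np.random.BitGenerator):
        # RandomState accepts a BitGenerator directly, much like an int seed.
        return np.random.RandomState(seed)
    raise ValueError(f"{seed!r} cannot be used to seed a RandomState instance")


# The same kind of input, built fresh with the same seed, should give the same draws.
factories = {
    "int": lambda: 0,
    "RandomState": lambda: np.random.RandomState(0),
    "Generator": lambda: np.random.default_rng(0),
    "BitGenerator": lambda: np.random.PCG64(0),
}
for name, make in factories.items():
    first = check_random_state_extended(make()).random_sample(3)
    second = check_random_state_extended(make()).random_sample(3)
    assert np.array_equal(first, second), name
```

Because the Generator branch shares the bit generator rather than copying it, drawing through the returned RandomState also advances the caller's Generator.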
Describe alternatives you’ve considered, if relevant
The scope could include either or both of BitGenerator or Generator.
It might be easiest to allow only BitGenerator, and not Generator.
- This allows flexibility.
- Users have control over seed and PRNG algorithm.
- This is easier to implement (it can be treated just like a seed int value). BitGenerator can be given to RandomState, and I think it then produces the same values as Generator.
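Whether RandomState(bit_generator) produces the same values as Generator appears to depend on the method. The sketch below, assuming NumPy >= 1.17, checks two cases: uniform doubles are expected to match, since both front-ends convert the bit generator's output to a double the same way, while standard normals are expected to differ, since RandomState keeps its legacy normal sampler and Generator uses a ziggurat method.

```python
import numpy as np


def fresh_pair(seed=0):
    """Return a legacy RandomState and a Generator built on identical PCG64 streams."""
    legacy = np.random.RandomState(np.random.PCG64(seed))
    modern = np.random.Generator(np.random.PCG64(seed))
    return legacy, modern


legacy, modern = fresh_pair()
uniform_match = np.array_equal(legacy.random_sample(5), modern.random(5))

legacy, modern = fresh_pair()
normal_match = np.allclose(legacy.standard_normal(5), modern.standard_normal(5))

print("uniform draws match:", uniform_match)
print("normal draws match:", normal_match)
```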
Additional context
NumPy v1.17 added the numpy.random.Generator interface for random number generation.
Overview:
- Generator is similar to RandomState, but enables different PRNG algorithms.
- BitGenerator encapsulates the PRNG and seed value, e.g. PCG64(seed=0).
- RandomState “is considered frozen” and uses “the slow Mersenne Twister” by default.
- RandomState can work with non-Mersenne BitGenerator objects.
- More info in NEP-19, the design document from NumPy.
The API for Generator and BitGenerator looks like:
from numpy import random
bit_generator = random.PCG64(seed=0)  # PCG64 is a BitGenerator subclass
generator = random.Generator(bit_generator)
generator.uniform(low=0.0, high=1.0, size=3)  # API is similar to RandomState
# there's also this, for making a PCG64-backed Generator
generator = random.default_rng(seed=0)
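As a quick sanity check of the two construction styles (a sketch assuming NumPy >= 1.17): default_rng(seed) wraps a PCG64 bit generator, so it should produce the same stream as building the Generator explicitly. default_rng also returns an existing Generator unchanged and wraps a bare BitGenerator, which is convenient when normalizing a random_state argument.

```python
import numpy as np

explicit = np.random.Generator(np.random.PCG64(seed=0))
shortcut = np.random.default_rng(seed=0)

print(type(shortcut.bit_generator).__name__)  # PCG64
print(np.array_equal(explicit.random(4), shortcut.random(4)))

# An existing Generator passes through; a bare BitGenerator gets wrapped.
print(np.random.default_rng(explicit) is explicit)
print(isinstance(np.random.default_rng(np.random.PCG64(1)), np.random.Generator))
```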
Issue Analytics
- Created 3 years ago
- Reactions:8
- Comments:16 (14 by maintainers)
Top GitHub Comments
FYI, rand, randn, and random_sample should all be considered disrecommended variants and aliases, and you should use the preferred methods on RandomState that also exist on Generator: random() and standard_normal().
For randint(), we use a utility function in scipy to wrap around either a RandomState or a Generator. Part of the improvement of Generator.integers() was the semantics of processing the arguments, so there wasn’t a great way to retain randint(). The wrapper function uses those new arguments, so there is a tiny bit of thought to be applied when making that replacement. I don’t think you use any of the others.
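The replacements described in this comment might look like the sketch below, assuming NumPy >= 1.17. integers_compat is a hypothetical helper written in the spirit of the SciPy wrapper mentioned, not its actual code; the "bit of thought" shows up in emulating Generator.integers' endpoint argument, which randint lacks.

```python
import numpy as np


def integers_compat(rng, low, high=None, size=None, endpoint=False):
    """Hypothetical helper: draw integers from either a RandomState or a Generator."""
    if isinstance(rng, np.random.Generator):
        return rng.integers(low, high, size=size, endpoint=endpoint)
    # RandomState.randint has no endpoint argument and always excludes the
    # upper bound, so an inclusive upper bound is emulated by adding one.
    if endpoint:
        if high is None:
            low = low + 1  # randint(low) samples [0, low)
        else:
            high = high + 1
    return rng.randint(low, high, size=size)


for rng in (np.random.RandomState(0), np.random.default_rng(0)):
    u = rng.random((2, 3))            # instead of rng.rand(2, 3)
    z = rng.standard_normal((2, 3))   # instead of rng.randn(2, 3)
    k = integers_compat(rng, 1, 6, size=10, endpoint=True)  # values in [1, 6]
    print(type(rng).__name__, u.shape, z.shape, k.min() >= 1, k.max() <= 6)
```

Note that the two branches still draw different values for the same seed; the wrapper only unifies the call signature.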
NEP 29 says:
i.e. there should be no obligation to support NumPy 1.16 in any major or minor release after Jan 13, 2021.
The main thing is that, if you bump the minimum version of NumPy to 1.17, then you can write things like:
if isinstance(random_state, np.random.BitGenerator):
without first checking if np.random.BitGenerator exists.
Also, since NumPy 1.17, the RandomState init has permitted a BitGenerator as an input. So the simplest workaround for handling Generator inputs to these functions is to extract the associated bit generator and re-wrap it as a RandomState: np.random.RandomState(rng.bit_generator). This will use the new bit generators along with the legacy API and non-uniform algorithms. However, doing this will not give you many of the main advantages of the new random API. A better long-term solution might be to extract the bit generator from the RandomState input, re-wrap it as a Generator, and use the newer, faster, cleaner, more powerful API everywhere.
As for https://github.com/scikit-learn/scikit-learn/issues/16988#issuecomment-622364681 specifically, I think that most of those are issues of methods having been given better names in the new API than in the legacy one (see https://numpy.org/doc/stable/reference/random/index.html#quick-start). I don’t think there’s any guarantee that two methods with the same name will give the same random number streams, or even that they take exactly the same arguments, since many of the algorithms used to convert the random bits into random numbers have been improved (e.g. with more efficient implementations). The exceptions are the state-related methods, which are even more complicated (e.g. https://numpy.org/doc/stable/reference/random/generated/numpy.random.RandomState.get_state.html).
I hope this is clear; I know it’s brief and the topic is complex.
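The long-term direction suggested in this comment, normalizing every input to a Generator, could be sketched as follows, assuming NumPy >= 1.17. as_generator is a hypothetical name, and it relies on RandomState's private _bit_generator attribute because RandomState exposes no public accessor for it.

```python
import numpy as np


def as_generator(random_state=None):
    """Hypothetical helper: normalize a random_state argument to a Generator."""
    if isinstance(random_state, np.random.RandomState):
        # _bit_generator is private and could change between NumPy versions.
        return np.random.Generator(random_state._bit_generator)
    # default_rng handles None, int seeds, BitGenerator and Generator inputs
    # (a Generator is returned unchanged).
    return np.random.default_rng(random_state)


legacy = np.random.RandomState(np.random.PCG64(0))
gen = as_generator(legacy)
# Both objects now drive the same bit generator, so drawing from one
# advances the state seen by the other.
print(gen.bit_generator is legacy._bit_generator)
```

One design consequence worth flagging: under this sketch an int seed yields a PCG64 stream rather than the MT19937 stream RandomState(int) produced, so results for a given seed would change, and None gives a freshly seeded Generator rather than NumPy's global RandomState.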