ENH: random_state using scipy.stats.qmc engines
This is a follow-up to the discussion in #10844.
Describe the solution you’d like
All sampling methods in scipy.stats use random_state, which (for new code) is a np.random.Generator. But this NumPy generator is not aware of dimensions. The new scipy.stats.qmc module, on the other hand, can generate samples efficiently in n dimensions. It would be nice to bridge the gap between the two.
Currently there is a qmc.MultivariateNormalQMC. Instead of duplicating this class for every other distribution, a solution could be to make the QMC engines inherit from np.random.BitGenerator. We could then use the QMC engines with all the existing distributions.
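For reference, here is how the existing special-cased class is used today (a minimal sketch with the current public scipy.stats.qmc API; the mean, covariance, and seed values are arbitrary):

```python
import numpy as np
from scipy.stats import qmc

# Draw QMC samples from a correlated 2-D normal using the existing,
# distribution-specific class that the proposal wants to generalize.
dist = qmc.MultivariateNormalQMC(
    mean=[0.0, 0.0],
    cov=[[1.0, 0.5], [0.5, 1.0]],
    seed=1234,
)
sample = dist.random(128)  # array of shape (128, 2)
```

The point of the proposal is that this pattern currently has to be reimplemented per distribution, whereas a BitGenerator-compatible engine would make every scipy.stats distribution QMC-capable at once.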
Describe alternatives you’ve considered
- Solution using a BitGenerator: seemingly, it must be done in Cython. Otherwise, I saw that there is a wrapper around BitGenerator, but it is not available in numpy: https://bashtage.github.io/randomgen/bit_generators/userbitgenerator.html from @bashtage. This works, but the underlying numpy code is not aware of dimensions (https://github.com/numpy/numpy/blob/e4feb7027e397925d220a10dd58b581b87ca1fec/numpy/random/_generator.pyx#L3562-L3568).
```python
from numpy.random import Generator
from randomgen import UserBitGenerator
from scipy.stats.qmc import Sobol


class SobolNP:
    def __init__(self, state):
        self._next_64 = None
        self.engine = Sobol(d=1, scramble=False, seed=state)

    def random_raw(self):
        """Generate the next "raw" value, which is 64 bits."""
        return int(self.engine.random() * 2**64)  # although Sobol uses 30 bits...

    @property
    def next_64(self):
        def _next_64(void_p):
            return self.random_raw()

        self._next_64 = _next_64
        return _next_64


# example
prng = SobolNP(1234)
sobol_bit_generator = UserBitGenerator(prng.next_64, 64)
gen = Generator(sobol_bit_generator)
gen.random(8)
```
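The dimension problem can be seen directly with scipy.stats.qmc alone: a 2-D Sobol engine produces balanced 2-D points, whereas re-pairing consecutive values from a 1-D stream (which is effectively what a scalar BitGenerator interface forces the Generator to do) yields different points that lose the 2-D balance. A small illustration:

```python
import numpy as np
from scipy.stats import qmc

# Proper 2-D low-discrepancy points from a dimension-aware engine.
pts_2d = qmc.Sobol(d=2, scramble=False).random(4)

# Re-pairing a 1-D Sobol stream, as a scalar bit-generator interface would.
pts_paired = qmc.Sobol(d=1, scramble=False).random(8).reshape(4, 2)

# The two constructions do not agree: consuming the sequence one scalar
# at a time does not reproduce the 2-D Sobol' points.
print(np.allclose(pts_2d, pts_paired))  # False
```

This is why simply wrapping a QMC engine as a 64-bit raw stream is not enough: the consumer (numpy's Generator) would also need to know the sample dimension.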
- Another solution would be to mock np.random.Generator: I am using __getattr__ to intercept calls to the distributions, such as random_state.uniform(...). It seems to work, and n dimensions are handled too.
```python
from functools import partial

from numpy.random import Generator
import scipy.stats as stats


class ScipyGenerator:
    @property
    def __class__(self):
        # masquerade as np.random.Generator so isinstance checks pass
        return Generator

    def __init__(self, d):
        self.qrng = stats.qmc.Sobol(d=d, scramble=False)

    def rvs(self, *args, dist, **kwargs):
        args = list(args)
        size = kwargs.pop('size', None)
        if size is None:
            size = args.pop(0)
        sample = self.qrng.random(size)
        return dist(*args, **kwargs).ppf(sample)

    def __getattr__(self, attr):
        # delegate unknown attributes to the matching scipy.stats distribution
        dist = getattr(stats, attr)
        return partial(self.rvs, dist=dist)


# examples
random_state = ScipyGenerator(d=2)
isinstance(random_state, Generator)  # True
random_state.uniform(8)
random_state.gamma(2, size=8)
```
Issue Analytics
- Created: 3 years ago
- Comments: 24 (24 by maintainers)

It seems that we have two big ideas for stats maintenance these days: (1) overhauling the distribution infrastructure (rv_continuous/rv_discrete), and (2) building shared infrastructure for hypothesis tests and resampling methods.
After a year of pretty intense work on scipy.stats, I've become interested in both, but I've prioritized #2. The resampling method PRs (bootstrap, permutation test, and Monte Carlo tests) and gh-13312 have actually all been part of that effort.

Besides applying the _axis_nan_policy decorator to existing functions and providing the resampling methods to expand the functionality of the statistics and tests we already have, my plan has been to create a class that makes writing a typical hypothesis test as simple as implementing a 1D statistic (and, if the null hypothesis is not that of independence, possibly defining the distribution of the statistic under the null hypothesis). All the other functionality (nan_policy, vectorization, one- and two-sided p-values, confidence intervals, etc.) can be inherited, or overridden if desired. The goal is for the interfaces and capabilities of most tests to be pretty consistent without requiring contributors to re-invent the wheel in every PR.

rv_continuous and rv_discrete are not perfect, but think what it would be like if everyone had to implement distributions from scratch! A lot of distributions would be missing a lot of methods, there would be different names for the same methods and parameters, there would be varying behaviors in case of invalid inputs (maybe output NaN, maybe raise this or that error), and who knows what support for vectorization would look like. But this is exactly what we have in the case of the hypothesis tests and correlation functions now! I think that's why I have prioritized it.

There has also been some effort toward #1. @tirthasheshpatel and I have been talking in recent PRs about overhauling the test suite of the distributions so that we can find all the obvious bugs where distribution methods are not living up to their public signatures.
I don't know that we should work on all aspects of these projects simultaneously. So, in the case of the rv_continuous/rv_discrete overhaul, I think it would be better if this were to wait a bit. I really would like to be a part of it, but I don't think I can do it in parallel with all the rest. Also, I think it would help to have a more thorough test suite to help us characterize the shortcomings of rv_continuous and rv_discrete before we rewrite the distribution classes (or factory functions). And personally, I'd prefer to get a little further along toward the other stuff (item 2) before digging deeply into those test suite improvements.

So maybe I'd suggest this order of high-level maintenance operations:

1a. Get gh-14651 rolling. (It will take a long time to complete, but once it gets rolling, each new PR will be pretty quick and easy, as has been the case with the alternative effort in gh-12506. I think we'll know when we get there, and at that point, we can move on.)
1b. A base class (or factory function) for hypothesis tests, with the first example being the z-test (gh-13662).
2a. Overhaul the test suite of the distributions.
2b. Look into what comes after rv_continuous and rv_discrete.

I like some variety, but I'm not really efficient when I'm bouncing between dozens of PRs, waiting a few months at a time between updates and having to re-learn everything when I come back to it. I imagine that if a few of us were able to tag-team as authors and reviewers toward a common goal, we could get a lot done pretty quickly. What do you think?
This action plan sounds reasonable 👍 The overhaul of the distributions is a massive undertaking for sure, and we would need to have a few maintainers on board: first, to avoid late-stage discussions, and second, to do the hard work. This should not be a one- or two-person project.

As usual, feel free to ping me if you feel I could help 😃