Remove randomness dependency on PYTHONHASHSEED
We’re using faker extensively in our test suite with factory-boy to generate dummy data. I love it! To help with repeatability in tests, I developed the pytest plugin pytest-randomly to allow reusing the random seed between test runs, so if tests fail we can reuse the seed to get back the same random data from faker and debug the test.
Unfortunately we’re moving from Python 2 to 3, where hash randomization (controlled by PYTHONHASHSEED) is on by default. We’re currently trying to enable it on Python 2 so that we’re ready.
Since hash randomization makes dictionary iteration order vary between runs, anything that depends on that order breaks. This is generally good: if you really depend on iteration order, you should use a different data structure.
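To make the effect concrete, here is a minimal sketch that starts fresh interpreters with a chosen PYTHONHASHSEED and reports how they order a small collection. The helper name `iteration_order` is made up for illustration. It uses a set of strings because on current Python versions a dict keeps insertion order, while set order still follows the string hashes; on the Python versions discussed here, a dict would have behaved the same way.

```python
import os
import subprocess
import sys


def iteration_order(hash_seed):
    """Return how a fresh interpreter orders a small set of strings.

    `hash_seed` is passed to the child process as PYTHONHASHSEED.
    (Hypothetical helper, for illustration only.)
    """
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    code = "print(list({'apple', 'banana', 'cherry', 'date', 'elderberry'}))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# The same hash seed reproduces the same order; a different seed may not.
order_a = iteration_order(1)
order_b = iteration_order(1)
order_c = iteration_order(2)
```

Comparing `order_a` with `order_c` typically shows a different ordering, although nothing guarantees that two particular seeds disagree for such a small set.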
Unfortunately the way Faker generates some random data (specifically, I’ve found, random_element when given a dict) is affected by dictionary iteration order. This means that fixing the seed for the random module isn’t enough; one would need to fix PYTHONHASHSEED too. What a pain! It’s also quite hard to do, since there isn’t a way inside a Python process to query the value of PYTHONHASHSEED unless it was explicitly set as an environment variable to something other than 'random'.
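As a rough sketch of that limitation: the only thing a process can reliably inspect is the environment variable itself. The function name `explicit_hash_seed` below is hypothetical. `sys.flags.hash_randomization` does exist, but it only reports whether randomization is enabled, not which seed was picked.

```python
import os
import sys


def explicit_hash_seed(environ=None):
    """Return the explicitly configured hash seed as an int, or None.

    None means the seed is unknown to us: the variable is unset, empty,
    or set to 'random', so the interpreter chose its own seed.
    (Hypothetical helper, for illustration only.)
    """
    environ = os.environ if environ is None else environ
    value = environ.get("PYTHONHASHSEED")
    if not value or value == "random":
        return None
    return int(value)


# Whether hash randomization is enabled at all for this process
# (this does not reveal the seed value).
randomization_enabled = bool(sys.flags.hash_randomization)
```

A test runner could record `explicit_hash_seed()` next to the random seed, but when it returns None there is nothing to replay.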
I think it would be better if Faker just didn’t depend on dictionary iteration order, so its random data is also repeatable. This should be possible in most cases by swapping dict for OrderedDict.
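A minimal sketch of the idea, assuming a made-up helper `pick_key` (this is not Faker's implementation of random_element): once the candidates come from an OrderedDict, the list handed to the random generator has a fixed order, so the seed alone determines the result.

```python
import random
from collections import OrderedDict


def pick_key(choices, seed):
    """Pick one key from `choices` using a generator seeded with `seed`.

    Hypothetical helper: because `choices` is an OrderedDict, list(choices)
    is in insertion order and does not depend on PYTHONHASHSEED.
    """
    rng = random.Random(seed)
    return rng.choice(list(choices))


choices = OrderedDict([("red", 1), ("green", 2), ("blue", 3)])

# Same seed, same candidate order, same pick.
first = pick_key(choices, seed=1234)
second = pick_key(choices, seed=1234)
```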
Issue Analytics
- State:
- Created 7 years ago
- Comments: 11 (4 by maintainers)
Top GitHub Comments
@timschwab Thanks for pointing that out. Being able to use dict instead of OrderedDict sure sounds nice! Feel free to submit your PR!

Today I was informed by the error message that the random_element provider does not accept a dict as an argument but instead requires an OrderedDict. That message sent me here, which is why I’m deciding to resurrect a long-dead issue to ask if this constraint is still needed. As of Python 3.7, iteration over a dict is guaranteed to be in insertion order. This makes the two objects nearly identical. I would be happy to submit a PR to remove this restriction if you guys agree.
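For reference, a short sketch of what "nearly identical" means on Python 3.7 or later. The variable names are illustrative only. Iteration order matches, while equality semantics and a few OrderedDict-only methods still differ.

```python
from collections import OrderedDict

plain = {"first": 1, "second": 2, "third": 3}
ordered = OrderedDict([("first", 1), ("second", 2), ("third", 3)])

# Both iterate in insertion order on Python 3.7+.
same_iteration_order = list(plain) == list(ordered)

# Remaining differences: dict equality ignores order, while
# OrderedDict equality (against another OrderedDict) does not.
dicts_equal = {"a": 1, "b": 2} == {"b": 2, "a": 1}
ordered_dicts_equal = (
    OrderedDict([("a", 1), ("b", 2)]) == OrderedDict([("b", 2), ("a", 1)])
)

# Reordering helpers such as move_to_end exist only on OrderedDict.
dict_has_move_to_end = hasattr(dict, "move_to_end")
```

None of these differences affect simply iterating over the keys.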