Remove randomness dependency on PYTHONHASHSEED


We use Faker extensively in our test suite, together with factory-boy, to generate dummy data. I love it! To make tests repeatable, I developed the pytest plugin pytest-randomly, which lets us reuse the random seed between test runs: if a test fails, we can rerun it with the same seed, get the same random data from Faker, and debug the failure.

Unfortunately, we’re moving from Python 2 to 3, where hash randomization (the behavior that PYTHONHASHSEED controls) is enabled by default. We’re currently trying to enable it on Python 2 as well so that we’re ready.

Since hash randomization makes dictionary iteration order vary between runs, anything that depends on that order breaks. This is generally a good thing: if you really depend on iteration order, you should use a different data structure.

Unfortunately, the way Faker generates some random data (specifically, I’ve found, random_element when given a dict) is affected by dictionary iteration order. This means that fixing the seed for the random module isn’t enough; one would need to fix PYTHONHASHSEED too. What a pain! It’s also quite hard to do, since there is no way inside a Python process to query the hash seed in use unless PYTHONHASHSEED was explicitly set to a value in the environment rather than left at 'random'.
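
To illustrate why the hash seed matters, here is a minimal sketch that starts fresh interpreters with different PYTHONHASHSEED values and compares the hash of the same string. The helper name `hash_in_child` and the sample string are made up for this example; it does not touch Faker itself.

```python
import os
import subprocess
import sys


def hash_in_child(seed):
    """Start a new interpreter with PYTHONHASHSEED=seed and return hash('faker').

    The seed has to be set before the interpreter starts, which is why a
    subprocess is used rather than changing os.environ in this process.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('faker'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


same_a = hash_in_child(123)
same_b = hash_in_child(123)
other = hash_in_child(456)

# Only an explicitly exported value is visible here; None means it was not set.
configured = os.environ.get("PYTHONHASHSEED")
```

With the same seed the two child processes should agree on the hash, while a different seed should give a different value. Anything whose ordering depends on string hashes can therefore change between runs unless the seed is pinned.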

I think it would be better if Faker just didn’t depend on dictionary iteration order, so its random data is also repeatable. This should be possible in most cases by swapping dict for OrderedDict.
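
As a rough sketch of the idea (not Faker's actual implementation), the function below picks a weighted element from an OrderedDict using a seeded `random.Random`. The names `pick_weighted` and `sample`, and the weights, are hypothetical. Because the element order is fixed by the OrderedDict, the same seed should give the same picks whatever the hash seed is.

```python
import random
from collections import OrderedDict


def pick_weighted(choices, rng):
    """Pick one key from an ordered mapping of element -> weight."""
    elements = list(choices.keys())
    weights = list(choices.values())
    return rng.choices(elements, weights=weights, k=1)[0]


# Hypothetical elements and weights, listed in a fixed, explicit order.
choices = OrderedDict([("a", 0.45), ("b", 0.35), ("c", 0.15), ("d", 0.05)])


def sample(seed, n=10):
    """Draw n elements using a private RNG seeded with `seed`."""
    rng = random.Random(seed)
    return [pick_weighted(choices, rng) for _ in range(n)]


run_1 = sample(42)
run_2 = sample(42)
```

Here `run_1` and `run_2` should be identical, since both the RNG state and the order in which elements are presented to it are fully determined.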

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 11 (4 by maintainers)

Top GitHub Comments

1 reaction
fcurella commented, Jan 19, 2022

@timschwab Thanks for pointing that out. Being able to use dict instead of OrderedDict sure sounds nice! Feel free to submit your PR!

0 reactions
timschwab commented, Jan 19, 2022

Today I was informed by the error message that the random_element provider does not accept a dict as an argument but instead requires an OrderedDict. That message sent me here, which is why I’m deciding to resurrect a long-dead issue to ask if this constraint is still needed. As of Python 3.7, iteration over a dict is guaranteed to be in insertion order. This makes the two types nearly identical. I would be happy to submit a PR to remove this restriction if you guys agree.
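
A small sketch of the "nearly identical" point, assuming Python 3.7 or later: a plain dict and an OrderedDict built from the same pairs iterate in the same order, but they still differ in how equality treats order. The variable names are only for illustration.

```python
from collections import OrderedDict

pairs = [("c", 3), ("a", 1), ("b", 2)]
reversed_pairs = list(reversed(pairs))

plain = dict(pairs)
ordered = OrderedDict(pairs)

# Both iterate in insertion order on Python 3.7+.
same_order = list(plain) == list(ordered) == ["c", "a", "b"]

# Plain dict equality ignores order; OrderedDict equality does not.
dict_eq = dict(pairs) == dict(reversed_pairs)
ordered_eq = OrderedDict(pairs) == OrderedDict(reversed_pairs)
```

Here `same_order` and `dict_eq` are True while `ordered_eq` is False, which is one of the remaining differences between the two types.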


