Randomness lost in multiprocessing
Mimesis: 3.2.0
OS: macOS Mojave, v10.14.3
Python: 3.6.8
Situation
I’m creating a mass of fake data to test scaling, enough that I need to use multiprocessing for performance. However, doing so does not randomize the data across the workers: each worker produces the same values.
The only solution I’ve found is to create a new mimesis object for each iteration, seeded with a count (see https://stackoverflow.com/a/29855961/1729586). This works, but performance takes a hit: in my case it takes ~7s to create 1,000 records when creating a seeded object for each iteration; if I only create the object once it takes ~1.5s.
Questions
- Is there anything I can do to improve performance?
- Can a change be made to the internals that would work with multiprocessing?
- Would creating one object and updating the seed be feasible?
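One direction for the performance question is to build the generator once per worker process instead of once per record, using the `initializer` argument of `multiprocessing.Pool`. The sketch below uses only the standard library: `random.Random` and the two name lists are hypothetical stand-ins for a mimesis object and its data, and `init_worker` is an illustrative name. With mimesis, the initializer would presumably construct `Generic('en', seed=...)` once per worker, but that variant is untested here.

```python
import random
from multiprocessing import Pool

# Hypothetical stand-in data; not taken from mimesis.
FIRST_NAMES = ["Tyson", "Vanesa", "Alma", "Boris", "Chen", "Dara"]
LAST_NAMES = ["Dominguez", "O'brien", "Ek", "Farrow", "Gray", "Hale"]

_rng = None  # one generator per worker process, set by init_worker()


def init_worker(seed=None):
    """Run once in each worker and build the generator a single time.

    With seed=None, random.Random() seeds itself from OS entropy,
    so every worker gets its own independent stream.
    """
    global _rng
    _rng = random.Random(seed)


def generate_name(i):
    # Reuse the per-worker generator instead of creating one per record.
    return f"{_rng.choice(FIRST_NAMES)} {_rng.choice(LAST_NAMES)}"


def main(count=20):
    with Pool(initializer=init_worker) as p:
        for name in p.imap(generate_name, range(count)):
            print(name)


if __name__ == "__main__":
    main()
```

The construction cost is then paid once per worker rather than once per record. The trade-off is that output is no longer reproducible from the record index alone, since each worker's stream depends on its own seed.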
Simplified Example
import numpy as np
from argparse import ArgumentParser
from mimesis import Generic
from multiprocessing import Pool

parser = ArgumentParser()
parser.add_argument('names', type=int, help='Number of names to generate')


def get_fake(i):
    # To see the issue, remove the "seed" argument below.
    return Generic('en', seed=np.random.RandomState(i))


def generate_name(i):
    fake = get_fake(i)
    return fake.person.full_name()


def main():
    args = parser.parse_args()
    with Pool() as p:
        for name in p.imap(generate_name, range(args.names)):
            print(name)


if __name__ == "__main__":
    main()
With the seed in place, the names appear randomly; if the seed is removed, the output looks something like this:
Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Vanesa O'brien
Vanesa O'brien
Vanesa O'brien
Vanesa O'brien
Vanesa O'brien
Vanesa O'brien
Vanesa O'brien
Vanesa O'brien
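A plausible explanation for this pattern, assuming the workers are started with `fork` and that unseeded mimesis objects draw from generator state created before the pool starts, is that every worker inherits an identical copy of that state. Each worker's first draw is then the same, each second draw is the same, and so on, which would match one name repeated once per worker. The standard-library sketch below imitates this by copying one generator's state into several "workers"; the names `parent`, `snapshot` and `make_worker_copy` are illustrative only.

```python
import random

# Stands in for a generator that exists before the workers are forked.
parent = random.Random(42)
snapshot = parent.getstate()


def make_worker_copy():
    """Return a generator that starts from the parent's exact state,
    which is effectively what a forked child process begins with."""
    rng = random.Random()
    rng.setstate(snapshot)
    return rng


workers = [make_worker_copy() for _ in range(4)]

# Every "worker" produces the same first value, then the same second value.
first_draws = [w.random() for w in workers]
second_draws = [w.random() for w in workers]

print(len(set(first_draws)), len(set(second_draws)))
```

Giving each worker its own seed breaks this symmetry, which is consistent with the seeded version of the example producing varied names.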
Issue Analytics
- State:
- Created 4 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
@epicyclist I understand. Anyway, I hope I helped you.
@lk-geimfari, my apologies for the confusion: the times are in reference to my actual, much more complex code — not the simplified example I posted.