Randomness lost in multiprocessing

Mimesis: 3.2.0
OS: macOS Mojave, v10.14.3
Python: 3.6.8

Situation

I’m creating a large volume of fake data to test scaling, enough that I need multiprocessing for performance. However, the data is not randomized across workers: each worker returns the same values as the others.

The only solution I’ve found is to create a new mimesis object for each iteration, seeded with a count (see https://stackoverflow.com/a/29855961/1729586). This works, but performance takes a hit: in my case it takes ~7s to create 1,000 records when creating a seeded object for each iteration; if I only create the object once it takes ~1.5s.

Questions

  1. Is there anything I can do to improve performance?
  2. Can a change be made to the internals that would work with multiprocessing?
  3. Would creating one object and updating the seed be feasible?
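
One possible direction for the first question, offered as a sketch rather than a confirmed fix: build the generator once per worker process through the `initializer` argument of `Pool`, instead of once per record. The example below uses a plain `random.Random` and two hypothetical name lists as stand-ins for the mimesis object, so the names `init_worker`, `FIRST_NAMES`, `LAST_NAMES` and `base_seed` are assumptions for illustration. With mimesis, the line that builds `_rng` would instead construct something like `Generic('en', seed=...)` as in the example further down.

```python
import os
import random
from multiprocessing import Pool

# Hypothetical stand-in data; a real run would use a mimesis provider instead.
FIRST_NAMES = ["Ada", "Grace", "Alan", "Edsger", "Barbara"]
LAST_NAMES = ["Lovelace", "Hopper", "Turing", "Dijkstra", "Liskov"]

_rng = None  # one generator per worker process, created by init_worker


def init_worker(base_seed):
    """Runs once in each worker: create a single generator with its own seed."""
    global _rng
    # Mixing the process ID into the seed gives each worker a distinct stream.
    _rng = random.Random(f"{base_seed}-{os.getpid()}")


def generate_name(_index):
    """Reuse the per-worker generator instead of building one per record."""
    return f"{_rng.choice(FIRST_NAMES)} {_rng.choice(LAST_NAMES)}"


def main(count=20, base_seed=42):
    with Pool(initializer=init_worker, initargs=(base_seed,)) as pool:
        for name in pool.imap(generate_name, range(count)):
            print(name)
```

Calling `main()` from under an `if __name__ == "__main__":` guard runs the pool. The setup cost is then paid once per worker rather than once per record, at the price of results that depend on process IDs and so are not reproducible from run to run.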

Simplified Example

import numpy as np

from argparse import ArgumentParser
from mimesis import Generic
from multiprocessing import Pool

parser = ArgumentParser()
parser.add_argument('names', type=int, help='Number of names to generate')

def get_fake(i):
  # To see the issue, remove the "seed" argument below.
  return Generic('en', seed=np.random.RandomState(i))

def generate_name(i):
  fake = get_fake(i)
  return fake.person.full_name()

def main():
  args = parser.parse_args()
  with Pool() as p:
    for name in p.imap(generate_name, range(args.names)):
      print(name)

if __name__ == "__main__":
  main()

With the seed in place, the names appear randomly; if the seed is removed, the output looks something like this:

Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Tyson Dominguez
Vanesa O'brien
Vanesa O'brien
Vanesa O'brien
Vanesa O'brien
Vanesa O'brien
Vanesa O'brien
Vanesa O'brien
Vanesa O'brien
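
A likely explanation for the repeated blocks, assuming the workers are forked and each inherits the same generator state from the parent process (the thread does not confirm this): generators that start from identical state produce identical sequences. The sketch below imitates that with the standard library only. `simulate_worker` is a hypothetical helper, and copying the state with `getstate`/`setstate` stands in for what a fork does to memory.

```python
import random

# The parent creates a generator before any workers exist.
parent_rng = random.Random(2019)
inherited_state = parent_rng.getstate()


def simulate_worker(draws):
    """Imitate a forked worker that starts from the parent's generator state."""
    worker_rng = random.Random()
    worker_rng.setstate(inherited_state)
    return [worker_rng.randint(0, 999) for _ in range(draws)]


worker_a = simulate_worker(3)
worker_b = simulate_worker(3)

# Both workers draw the same values because they share a starting state.
print(worker_a == worker_b)  # True
```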

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

lk-geimfari commented, Jul 24, 2019 (1 reaction)

@epicyclist I understand. Anyway, I hope I helped you.

chris-canipe commented, Jul 24, 2019 (1 reaction)

@lk-geimfari, my apologies for the confusion: the times are in reference to my actual, much more complex code — not the simplified example I posted.
