
Processes stopped when passing large objects to function to be parallelized

See original GitHub issue

Problem:

I am applying an NLP deep-learning model for text generation over the rows of a pandas Series. The function call is:

out = text_column.parallel_apply(lambda x: generate_text(args, model, tokenizer, x))

where args and tokenizer are lightweight objects, but model is a heavy object: a PyTorch model that weighs more than 6 GB on disk and takes up ~12 GB of RAM when loaded.

I have run some tests, and the problem arises only when I pass the heavy model to the function (even without actually running it inside the function), so it seems the issue is passing an argument that takes up a lot of memory. (Maybe it is related to the memory-sharing strategy used for parallel computing.)
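One common way to avoid shipping a large model through the worker pipe on every task is to load it once per worker with a Pool initializer, so only the row data crosses the process boundary. This is a minimal sketch of that pattern with plain multiprocessing, not pandarallel's own API; the model object here is a small hypothetical stand-in for the ~12 GB PyTorch model in the issue.

```python
# Sketch: load a heavy object once per worker instead of pickling it per call.
import multiprocessing as mp

_model = None  # per-worker global, populated by the initializer


def _init_worker():
    # In the real case this would be the expensive model-loading call
    # (e.g. torch.load); a small dict stands in for the heavy model here.
    global _model
    _model = {"name": "gpt2-stand-in"}


def _generate(row):
    # Only `row` is pickled and sent to the worker;
    # _model never crosses the pipe.
    return f"{_model['name']}:{row}"


def parallel_generate(rows, n_workers=2):
    with mp.Pool(n_workers, initializer=_init_worker) as pool:
        return pool.map(_generate, rows)


if __name__ == "__main__":
    print(parallel_generate(["a", "b", "c"]))
```

With pandarallel itself there is no initializer hook in the call shown above, so the equivalent trick is to make the model a module-level global that the applied function references instead of receiving as an argument.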

After running parallel_apply, the output I get is:

INFO: Pandarallel will run on 8 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
   0.00%                                          |        0 /      552 |
   0.00%                                          |        0 /      552 |
   0.00%                                          |        0 /      551 |
   0.00%                                          |        0 /      551 |
   0.00%                                          |        0 /      551 |
   0.00%                                          |        0 /      551 |
   0.00%                                          |        0 /      551 |
   0.00%                                          |        0 /      551 |

And it gets stuck there forever. Indeed, there are two processes spawned, and both are stopped:

ablanco+  85448  0.0  4.9 17900532 12936684 pts/27 Sl 14:41   0:00 python3 text_generation.py --input_file input.csv --model_type gpt2  --output_file out.csv --no_cuda --n_cpu 8
ablanco+  85229 21.4 21.6 61774336 57023740 pts/27 Sl 14:39   2:26 python3 text_generation.py --input_file input.csv --model_type gpt2  --output_file out.csv --no_cuda --n_cpu 8

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

2 reactions
biebiep commented, Feb 9, 2020

Currently fixed by upgrading Python from 3.7.4 to 3.7.6; apparently the problem was with pickle.
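A quick way to check whether the hang is in serialization rather than in pandarallel itself is to try pickling the heavy object directly and measuring the payload, before handing it to any worker. This is a hedged sketch; `model` below is a small hypothetical stand-in for the 6 GB PyTorch model.

```python
# Sketch: confirm the heavy object serializes at all, and how big it gets.
import pickle


def pickled_size(obj):
    """Return the size in bytes of obj's pickle payload.

    Protocol 4 (default since Python 3.8) supports objects
    larger than 4 GiB, which older protocols do not.
    """
    return len(pickle.dumps(obj, protocol=4))


model = {"weights": list(range(1000))}  # stand-in for the heavy model
print(pickled_size(model))
```

If pickle.dumps itself stalls or raises on the real model, the problem is independent of the parallel framework, which matches the Python-version fix reported here.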

1 reaction
Lolologist commented, Jan 28, 2021

For those wondering why a single process runs indefinitely with no results: I was on 3.6.4, and upgrading to 3.7.6 fixed the issue. Still no luck with progress bars, sadly.


Top Results From Across the Web

Methods for passing large objects in python multiprocessing
I'm doing something like this: from multiprocessing import Process, Queue def func(queue): # do stuff to build up sub_dict ...

multiprocessing — Process-based parallelism — Python 3.11 ...
A prime example of this is the Pool object which offers a convenient means of parallelizing the execution of a function across multiple...

Parallelization caveats in R #1: performance issues
To make sure your processes actually stop, there is something you can do before you start your calculations. R provides a function called...

Programming in Ray: Tips for first-time users - RISE Lab
Avoid passing same object repeatedly to remote tasks. When we pass a large object as an argument to a remote function, Ray calls...

Parallel Processing in R
Parallel processing (in the extreme) means that all the f# processes start simultaneously and run to completion on their own. If we have...
