
Processes stopped when passing large objects to function to be parallelized

See original GitHub issue

Problem:

I am applying an NLP deep-learning model for text generation over the rows of a pandas Series. The function call is:

out = text_column.parallel_apply(lambda x: generate_text(args, model, tokenizer, x))

where args and tokenizer are lightweight objects, but model is a heavy object: a PyTorch model that weighs more than 6 GB on disk and takes up ~12 GB of RAM when loaded.

I have run some tests, and the problem arises only when I pass the heavy model to the function (even without actually running it inside the function), so it seems the issue is passing an argument that takes up a lot of memory. (Maybe it is related to the memory-sharing strategy used for parallel computing.)
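One common way to avoid shipping a large model through the worker pipe on every task is to load it once per worker with a Pool initializer, so only the row data crosses the process boundary. This is a minimal sketch of that pattern with plain multiprocessing, not pandarallel's own API; the model object here is a small hypothetical stand-in for the ~12 GB PyTorch model in the issue.

```python
# Sketch: load a heavy object once per worker instead of pickling it per call.
import multiprocessing as mp

_model = None  # per-worker global, populated by the initializer


def _init_worker():
    # In the real case this would be the expensive model-loading call
    # (e.g. torch.load); a small dict stands in for the heavy model here.
    global _model
    _model = {"name": "gpt2-stand-in"}


def _generate(row):
    # Only `row` is pickled and sent to the worker;
    # _model never crosses the pipe.
    return f"{_model['name']}:{row}"


def parallel_generate(rows, n_workers=2):
    with mp.Pool(n_workers, initializer=_init_worker) as pool:
        return pool.map(_generate, rows)


if __name__ == "__main__":
    print(parallel_generate(["a", "b", "c"]))
```

With pandarallel itself there is no initializer hook in the call shown above, so the equivalent trick is to make the model a module-level global that the applied function references instead of receiving as an argument.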

After running parallel_apply, the output I get is:

INFO: Pandarallel will run on 8 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
   0.00%                                          |        0 /      552 |
   0.00%                                          |        0 /      552 |
   0.00%                                          |        0 /      551 |
   0.00%                                          |        0 /      551 |
   0.00%                                          |        0 /      551 |
   0.00%                                          |        0 /      551 |
   0.00%                                          |        0 /      551 |
   0.00%                                          |        0 /      551 |

And it gets stuck there forever. Indeed, there are two processes spawned, and both are stopped:

ablanco+  85448  0.0  4.9 17900532 12936684 pts/27 Sl 14:41   0:00 python3 text_generation.py --input_file input.csv --model_type gpt2  --output_file out.csv --no_cuda --n_cpu 8
ablanco+  85229 21.4 21.6 61774336 57023740 pts/27 Sl 14:39   2:26 python3 text_generation.py --input_file input.csv --model_type gpt2  --output_file out.csv --no_cuda --n_cpu 8

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

2 reactions
biebiep commented, Feb 9, 2020

Currently fixed by upgrading Python from 3.7.4 to 3.7.6; apparently the problem was with pickle.
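A quick way to check whether the hang is in serialization rather than in pandarallel itself is to try pickling the heavy object directly and measuring the payload, before handing it to any worker. This is a hedged sketch; `model` below is a small hypothetical stand-in for the 6 GB PyTorch model.

```python
# Sketch: confirm the heavy object serializes at all, and how big it gets.
import pickle


def pickled_size(obj):
    """Return the size in bytes of obj's pickle payload.

    Protocol 4 (default since Python 3.8) supports objects
    larger than 4 GiB, which older protocols do not.
    """
    return len(pickle.dumps(obj, protocol=4))


model = {"weights": list(range(1000))}  # stand-in for the heavy model
print(pickled_size(model))
```

If pickle.dumps itself stalls or raises on the real model, the problem is independent of the parallel framework, which matches the Python-version fix reported here.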

1 reaction
Lolologist commented, Jan 28, 2021

For those wondering why a single process runs indefinitely with no results: I was on 3.6.4, and upgrading to 3.7.6 fixed the issue. Still no luck with progress bars, sadly.


Top Results From Across the Web

Methods for passing large objects in python multiprocessing
I'm doing something like this: from multiprocessing import Process, Queue def func(queue): # do stuff to build up sub_dict ...

multiprocessing — Process-based parallelism — Python 3.11 ...
A prime example of this is the Pool object which offers a convenient means of parallelizing the execution of a function across multiple...

Parallelization caveats in R #1: performance issues
To make sure your processes actually stop, there is something you can do before you start your calculations. R provides a function called...

Programming in Ray: Tips for first-time users - RISE Lab
Avoid passing same object repeatedly to remote tasks. When we pass a large object as an argument to a remote function, Ray calls...

Parallel Processing in R
Parallel processing (in the extreme) means that all the f# processes start simultaneously and run to completion on their own. If we have...
