parallel_apply never starts processing
ISSUE: Progress on the parallel_apply never starts going up.
I am trying to use parallel_apply to populate new columns on a data frame. This takes about 50 minutes with normal apply, but every column is independent, so it should be easily parallelizable.
I am using the following to initialize:
pandarallel.initialize(nb_workers=8, progress_bar=True, use_memory_fs=False)
OUTPUT:
INFO: Pandarallel will run on 8 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
And this is my parallel_apply call:
allowed_types_list = ['...', '...', ..., '...']
data["allowed"] = data["type"].apply(lambda x: 1 if x in allowed_types_list else 0)
The shape of my dataframe is: (4717892, 8)
I tried the same thing with a different function that takes around 5 seconds with apply, and the same thing happens.
I tried it on my local computer (running macOS with an i9, using pipe for data transfer) and on Google Colab (where I had 4 cores, using the memory file system for data transfer). The behavior was the same on both.
Am I missing something?
As a side note, is it possible to get the progress bars working on Google Colab?
Issue Analytics
- Created 3 years ago
- Comments: 10
For your last question: https://stackoverflow.com/questions/64754814/pandarallel-widgets-dont-work-on-google-colab
Same issue here using pandarallel==1.6.1, Python 3.9.5, and pandas 1.4.2. In my case I noticed it because the CPU time of the compute node stopped increasing. I also set progress_bar=True, use_memory_fs=False.
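Separately from the hang itself, the lambda in the original post tests membership in a list, which scans the list once per row. Below is a minimal sketch of the same check using a set. The sample values are hypothetical stand-ins, since the real contents of allowed_types_list are elided in the post, and this is not presented as a fix for the stalled progress bar.

```python
# Hypothetical stand-ins; the real values are elided in the original post.
allowed_types_list = ["a", "b", "c"]
types = ["a", "x", "c", "y"]

# Build the set once. Membership in a set is an average O(1) hash lookup,
# whereas `x in some_list` scans the list linearly on every call.
allowed_types = set(allowed_types_list)


def is_allowed(x):
    """Return 1 if x is an allowed type, else 0 (same result as the lambda)."""
    return 1 if x in allowed_types else 0


# Plain-Python equivalent of applying the function to every row.
flags = [is_allowed(t) for t in types]
```

A named function like is_allowed could be passed to apply in place of the lambda. With pandas, data["type"].isin(allowed_types_list).astype(int) expresses the same membership test as a vectorized operation, which may make parallelization unnecessary for this particular column.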