parallel_apply never starts processing

ISSUE: Progress on the parallel_apply never starts going up.

I am trying to use parallel_apply to populate new columns on a data frame. This takes about 50 minutes with normal apply, but every column is independent so it should be easily parallelizable.

I am using the following to initialize:

pandarallel.initialize(nb_workers=8, progress_bar=True, use_memory_fs=False)

OUTPUT:

INFO: Pandarallel will run on 8 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.

and this is my parallel_apply call:

allowed_types_list = ['...', '...', ..., '...']
data["allowed"] = data["type"].parallel_apply(lambda x: 1 if x in allowed_types_list else 0)

The shape of my dataframe is: (4717892, 8)

I tried the same thing with a different function that takes around 5 seconds with apply, and the same thing happens. I tried it on my local computer (macOS on an i9, using a pipe for data transfer) and on Google Colab (4 cores, using the memory file system for data transfer). Both show the same behavior.

Am I missing something?

As a side note, is it possible to get the progress bars working on Google Colab?

Issue Analytics

Created 3 years ago
Comments: 10

Top GitHub Comments

1 reaction

BrannonKing commented, Dec 18, 2020

For your last question: https://stackoverflow.com/questions/64754814/pandarallel-widgets-dont-work-on-google-colab

0 reactions

yangyxt commented, Dec 8, 2022

Same issue here using pandarallel==1.6.1, Python 3.9.5, and pandas 1.4.2. In my case I noticed it because the CPU time on the compute node stopped increasing. I had set progress_bar=True and use_memory_fs=False.
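
A possible workaround for the specific lambda in the question, independent of why the workers hang: the membership check can be written as a vectorized pandas call with Series.isin, which avoids both the per-row lambda and the worker processes. This is only a minimal sketch. The sample frame and the contents of allowed_types_list are hypothetical stand-ins, since the real values are elided in the question, and it does not address the hang itself.

```python
import pandas as pd

# Hypothetical sample data standing in for the (4717892, 8) frame in the question.
data = pd.DataFrame({"type": ["a", "b", "c", "a", "d"]})

# Hypothetical allowed values; the real list is elided in the question.
allowed_types_list = ["a", "c"]

# Vectorized membership test: isin returns a boolean Series,
# and astype(int) converts True/False to 1/0.
data["allowed"] = data["type"].isin(allowed_types_list).astype(int)

# Row-wise version from the question, kept only to compare results.
row_wise = data["type"].apply(lambda x: 1 if x in allowed_types_list else 0)

print(data["allowed"].tolist() == row_wise.tolist())
```

For functions that cannot be expressed as a vectorized operation, this does not help, and the parallel_apply behavior described above still needs a separate explanation.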
