parallel_apply never starts processing

ISSUE: Progress on the parallel_apply never starts going up.

I am trying to use parallel_apply to populate new columns on a data frame. This takes about 50 minutes with normal apply, but every column is independent so it should be easily parallelizable.

I am using the following to initialize:

pandarallel.initialize(nb_workers=8, progress_bar=True, use_memory_fs=False)

OUTPUT:

INFO: Pandarallel will run on 8 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.

and this is my parallel_apply call:

allowed_types_list = ['...', '...', ..., '...']
data["allowed"] = data["type"].parallel_apply(lambda x: 1 if x in allowed_types_list else 0)

The shape of my dataframe is: (4717892, 8)
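
For a plain membership check like the one above, a vectorized call may make parallelism unnecessary, whatever is causing the stall. A minimal sketch, assuming `data` has a `type` column of strings; the sample rows and the allowed values here are hypothetical placeholders, not the real data:

```python
import pandas as pd

# Hypothetical stand-in for the real 4.7M-row frame.
data = pd.DataFrame({"type": ["a", "b", "c", "a", "d"]})
allowed_types_list = ["a", "c"]  # placeholder values

# Series.isin tests membership for the whole column in one vectorized pass;
# astype("int64") maps True/False to 1/0, matching the lambda's output.
data["allowed"] = data["type"].isin(allowed_types_list).astype("int64")

print(data)
```

Because `isin` avoids calling a Python function once per row, it is usually much faster than `apply` on a column of this size, though the actual speedup depends on the data.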

I tried the same thing with a different function that takes around 5 seconds with apply, and the same thing happens. I tried it on my local computer (running macOS with an i9, using pipe for data transfer) and on Google Colab (where I had 4 cores, using the memory file system for data transfer). Same behavior on both.

Am I missing something?

As a side note, is it possible to get the progress bars working on Google Colab?

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 10

Top GitHub Comments

yangyxt commented, Dec 8, 2022

Same issue here using pandarallel==1.6.1, Python 3.9.5, and pandas 1.4.2. In my case I noticed it because the CPU time on the compute node stopped increasing. I also set progress_bar=True and use_memory_fs=False.
