Setting progress_bar=True freezes execution for parallel_apply before reaching 1% completion on all CPUs

With progress_bar=True, my parallel_apply task stopped just before all parallel processes reached the 1% progress mark. Further details:

  • I turned on logging at DEBUG level, but no messages were displayed when execution stopped, and there were no error messages either. The dataframe rows simply stopped being processed and the process appeared frozen.
  • I have two CPUs. The progress bar seems to update only in 1% increments. One of the progress bars reached the 1% mark, but the process froze when the number of processed rows reached the 2% mark (which I assume corresponds to the second progress bar updating to 1% as well).
  • The process runs fine with progress_bar=False.
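
The setup described above can be sketched roughly as follows. This is a minimal illustration, not the reporter's actual code: the `row_sum` function, the column names, and the row count are hypothetical stand-ins, and `nb_workers=2` mirrors the two CPUs mentioned in the report. pandas and pandarallel are third-party packages, so the sketch checks for them before running.

```python
import importlib.util

# pandas and pandarallel are third-party; only run the demo if both are installed.
HAVE_DEPS = all(
    importlib.util.find_spec(name) is not None
    for name in ("pandas", "pandarallel")
)


def row_sum(row):
    # Hypothetical per-row function standing in for the real workload.
    return row["a"] + row["b"]


def run(progress_bar=False, nb_workers=2, n_rows=1000):
    import pandas as pd
    from pandarallel import pandarallel

    # progress_bar=False is the setting the report says runs to completion;
    # progress_bar=True is the setting under which the freeze was observed.
    pandarallel.initialize(
        progress_bar=progress_bar, nb_workers=nb_workers, verbose=0
    )
    df = pd.DataFrame({"a": range(n_rows), "b": range(n_rows)})
    # axis=1 applies row_sum to each row, split across the worker processes.
    return df.parallel_apply(row_sum, axis=1)


if __name__ == "__main__":
    if HAVE_DEPS:
        print(run(progress_bar=False).head())
    else:
        print("pandas and pandarallel are required for this sketch")
```

Calling `run(progress_bar=True)` exercises the configuration from the report. Whether it actually hangs presumably depends on the installed pandarallel version and platform, so treat it as a way to check your own environment rather than a guaranteed reproduction.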

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 12
  • Comments: 22 (5 by maintainers)

Top GitHub Comments

2 reactions
chris-forbes commented, Feb 23, 2021

Similar issue, and I'm only working on about 12k rows. It gets to about 300 completed items on each core, then all of the forked processes seem to die. It's almost as if it's trying to create new threads, but then it just sits there with all cores basically unused.

Python 3.6.9 on Ubuntu-18.04 WSL2

Edit: I removed the progress_bar option from my small console application, and whatever deadlock was occurring seems to have disappeared. It is now progressing pretty well.

1 reaction
till-m commented, Sep 12, 2022

I’m assuming this has been fixed.
