
Dask doesn't play well with interrupts

See original GitHub issue

This is on Python 3.5.2, dask 0.10.1, and jupyter-notebook 4.2.1, all installed via conda on a 64-bit Ubuntu machine.

If you run the following piece of code:

import json

from dask.diagnostics import ProgressBar
import dask.bag as db

j = json.dumps({"a": 1, "b": 1})

for i in range(8):
    # Progressively larger inputs: 1, 10, 100, ..., 10**7 JSON strings.
    data = [j for _ in range(10 ** i)]
    bag = db.from_sequence(data, npartitions=4).map(json.loads)
    with ProgressBar():
        # Pluck both fields, zip them back together, and count the pairs.
        db.zip(bag.pluck("a"), bag.pluck("b")).count().compute()

And try to do a keyboard interrupt in the middle, sometimes dask will flip out and just refuse to exit (no matter how many CTRL-Cs and CTRL-Ds you press). See https://asciinema.org/a/2yqxbhn1patwenwy024m2316l for a video.

This is especially annoying in a Jupyter notebook. Sometimes, dask will mysteriously keep printing out the progress bar from the last session, even after restarting the kernel (!).

EDIT: When this happens, dask will also leave behind Python processes that are still running. See https://asciinema.org/a/8x1liiyuiw7910mugj9tscher for an example (note in particular the extra python process at the end).
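
A workaround that is sometimes suggested for this class of problem (not part of the original report, and assuming a recent Dask where compute() accepts a scheduler= keyword) is to run the bag on the single-threaded scheduler, which keeps everything in the main process so CTRL-C is handled immediately and no worker processes are left behind:

import json

import dask.bag as db
from dask.diagnostics import ProgressBar

j = json.dumps({"a": 1, "b": 1})
data = [j for _ in range(10 ** 5)]
bag = db.from_sequence(data, npartitions=4).map(json.loads)

with ProgressBar():
    # scheduler="synchronous" runs the graph in the calling thread, so a
    # KeyboardInterrupt propagates right away, at the cost of parallelism.
    result = db.zip(bag.pluck("a"), bag.pluck("b")).count().compute(
        scheduler="synchronous"
    )

This obviously gives up the parallelism of the default scheduler, so it is mainly useful for keeping interrupts responsive while debugging.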

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
mrocklin commented, Aug 5, 2016

Possibly resolved in #1444

0 reactions
syagev commented, Jul 10, 2019

Something like this still occurs for me with the distributed scheduler running on a LocalCluster. When I'm working in a Jupyter notebook, the notebook is executing a blocking .compute() call, and I hit CTRL+C, the blocking call does return, but all the worker processes are killed (not very gracefully). The end result is zombie Python processes and a scheduler that reports 0 workers.

So far I haven't found a good way to recover from this without restarting the Jupyter kernel: issuing Client.restart() doesn't restart the workers, and trying to execute Client(LocalCluster()) again fails because port 8787 is already in use.

If anyone has a good tip on how to work around this, that would really help.
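
For what it's worth, one pattern that can sometimes recover the session without restarting the kernel (not from this thread; a minimal sketch using the public dask.distributed API, where client is assumed to be the Client from the broken session) is to close the old client explicitly and start a fresh LocalCluster on an automatically chosen dashboard port:

from dask.distributed import Client, LocalCluster

# `client` is the Client object from the broken session; close() is safe to
# call even if the workers have already died, and it releases held ports.
client.close()

# dashboard_address=":0" lets the new cluster pick any free port, avoiding
# the "port 8787 is already in use" error from the stale dashboard.
cluster = LocalCluster(n_workers=4, dashboard_address=":0")
client = Client(cluster)

Whether this fully cleans up the zombie processes depends on how the workers died; in the worst case they may still need to be killed manually.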


Top Results From Across the Web

What's the correct way to clean up after an interrupted event loop?
When you CTRL+C, the event loop gets stopped, so your calls to t.cancel() don't actually take effect. For the tasks to be cancelled, ...

Why did my worker die? - Dask.distributed
Workers may exit in normal functioning because they have been asked to, e.g., they received a keyboard interrupt (^C), or the scheduler scaled...

Working with interrupts - Project Guidance - Arduino Forum
Hello, I came from professional Python programming with GTK - very high level in relation to C and AVR. At the beginning I...

4.6. Interrupt Handling - Understanding the Linux Kernel, 3rd Edition
This means that the interrupt vector alone does not tell the whole story. ... ISA) do not reliably operate if their IRQ line...

Dealing with Interrupts - Google - Site Reliability Engineering
Do One Thing Well · Distractibility. The ways in which an engineer may be distracted and therefore prevented from achieving a state of...
