
Dask doesn't play well with interrupts

See original GitHub issue

This is on Python 3.5.2, dask 0.10.1, and jupyter-notebook 4.2.1, all installed via conda on a 64-bit Ubuntu machine.

If you run the following piece of code:

import json

from dask.diagnostics import ProgressBar
import dask.bag as db

j = json.dumps({"a": 1, "b": 1})

for i in range(8):
    # Progressively larger inputs: 1, 10, 100, ..., 10**7 JSON strings.
    data = [j for _ in range(10 ** i)]
    bag = db.from_sequence(data, npartitions=4).map(json.loads)
    with ProgressBar():
        # Pluck both fields, zip them back together, and count the pairs.
        db.zip(bag.pluck("a"), bag.pluck("b")).count().compute()

And try to do a keyboard interrupt in the middle, sometimes dask will flip out and just refuse to exit (no matter how many CTRL-Cs and CTRL-Ds you press). See https://asciinema.org/a/2yqxbhn1patwenwy024m2316l for a video.

This is especially annoying in a Jupyter notebook. Sometimes, dask will mysteriously keep printing out the progress bar from the last session, even after restarting the kernel (!).

EDIT: When this happens, dask will also leave behind Python processes that are still running. See https://asciinema.org/a/8x1liiyuiw7910mugj9tscher for an example (note in particular the extra python process at the end).
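
A workaround that is sometimes suggested for this class of problem (not part of the original report, and assuming a recent Dask where compute() accepts a scheduler= keyword) is to run the bag on the single-threaded scheduler, which keeps everything in the main process so CTRL-C is handled immediately and no worker processes are left behind:

import json

import dask.bag as db
from dask.diagnostics import ProgressBar

j = json.dumps({"a": 1, "b": 1})
data = [j for _ in range(10 ** 5)]
bag = db.from_sequence(data, npartitions=4).map(json.loads)

with ProgressBar():
    # scheduler="synchronous" runs the graph in the calling thread, so a
    # KeyboardInterrupt propagates right away, at the cost of parallelism.
    result = db.zip(bag.pluck("a"), bag.pluck("b")).count().compute(
        scheduler="synchronous"
    )

This obviously gives up the parallelism of the default scheduler, so it is mainly useful for keeping interrupts responsive while debugging.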

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
mrocklin commented, Aug 5, 2016

Possibly resolved in #1444

0 reactions
syagev commented, Jul 10, 2019

Something like this still occurs for me with the distributed scheduler running on a LocalCluster. When I'm working in a Jupyter notebook, the notebook is executing a blocking .compute() call, and I hit CTRL+C, the blocking call does return, but all the worker processes are killed (not very gracefully). The end result is zombie Python processes and a scheduler that reports 0 workers.

So far I haven't found a good way to recover from this without restarting the Jupyter kernel: issuing Client.restart() doesn't restart the workers, and trying to execute Client(LocalCluster()) again fails because port 8787 is already in use.

If anyone has a good tip on how to work around this, that would really help.
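
For what it's worth, one pattern that can sometimes recover the session without restarting the kernel (not from this thread; a minimal sketch using the public dask.distributed API, where client is assumed to be the Client from the broken session) is to close the old client explicitly and start a fresh LocalCluster on an automatically chosen dashboard port:

from dask.distributed import Client, LocalCluster

# `client` is the Client object from the broken session; close() is safe to
# call even if the workers have already died, and it releases held ports.
client.close()

# dashboard_address=":0" lets the new cluster pick any free port, avoiding
# the "port 8787 is already in use" error from the stale dashboard.
cluster = LocalCluster(n_workers=4, dashboard_address=":0")
client = Client(cluster)

Whether this fully cleans up the zombie processes depends on how the workers died; in the worst case they may still need to be killed manually.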


Top Results From Across the Web

What's the correct way to clean up after an interrupted event loop?
When you CTRL+C, the event loop gets stopped, so your calls to t.cancel() don't actually take effect. For the tasks to be cancelled, ...

Why did my worker die? - Dask.distributed
Workers may exit in normal functioning because they have been asked to, e.g., they received a keyboard interrupt (^C), or the scheduler scaled...

Working with interrupts - Project Guidance - Arduino Forum
Hello, I came from professional Python programming with GTK - very high level in relation to C and AVR. At the beginning I...

4.6. Interrupt Handling - Understanding the Linux Kernel, 3rd Edition
This means that the interrupt vector alone does not tell the whole story. ... ISA) do not reliably operate if their IRQ line...

Dealing with Interrupts - Google - Site Reliability Engineering
Do One Thing Well · Distractibility. The ways in which an engineer may be distracted and therefore prevented from achieving a state of...
