`dask.distributed.Client` fails to spin down gracefully in its context manager

Original GitHub issue: https://github.com/dask/distributed/issues/1761

Versions

dask==0.16.1
distributed==1.20.2
tornado==4.5.3

Issue

I’m using dask.distributed.Client in a pytest.fixture, as follows:

import pytest
from dask.distributed import Client


@pytest.fixture(scope='module')
def client():
    # Start a local cluster with 4 workers; tear it down after the module's tests.
    with Client(n_workers=4) as dask_client:
        yield dask_client


def test_some_function(client):
    ...


def test_some_other_function(client):
    ...

Sometimes, when the test suite finishes, I get the error shown at the bottom of this ticket. This leads me to believe the test suite finishes before the Client teardown is actually complete, which seems like it could only happen if the Client’s __exit__ implementation returns too early. I also get the same error message if I simply execute the following, which seems to indicate the same.

while True:
    with Client(n_workers=4) as dask_client:
        pass  # do something with dask_client

tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x1a150a7510>, <tornado.concurrent.Future object at 0x1a14fac748>)
Traceback (most recent call last):
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/distributed/comm/tcp.py", line 174, in read
    n_frames = yield stream.read_bytes(8)
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/tornado/iostream.py", line 324, in read_bytes
    self._try_inline_read()
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/tornado/iostream.py", line 709, in _try_inline_read
    self._check_closed()
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/tornado/iostream.py", line 925, in _check_closed
    raise StreamClosedError(real_error=self.error)
tornado.iostream.StreamClosedError: Stream is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/distributed/core.py", line 464, in send_recv_from_rpc
    result = yield send_recv(comm=comm, op=key, **kwargs)
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/distributed/core.py", line 350, in send_recv
    response = yield comm.read()
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/tornado/gen.py", line 307, in wrapper
    yielded = next(result)
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/distributed/comm/tcp.py", line 188, in read
    convert_stream_closed_error(self, e)
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/distributed/comm/tcp.py", line 124, in convert_stream_closed_error
    raise CommClosedError("in %s: %s: %s" % (obj, exc.__class__.__name__, exc))
distributed.comm.core.CommClosedError: in <closed TCP>: BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/tornado/ioloop.py", line 605, in _run_callback
    ret = callback()
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/tornado/ioloop.py", line 626, in _discard_future_result
    future.result()
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/distributed/client.py", line 804, in _update_scheduler_info
    self._scheduler_identity = yield self.scheduler.identity()
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "<string>", line 4, in raise_exc_info
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/Users/vru959/anaconda2/envs/py3k/lib/python3.6/site-packages/distributed/core.py", line 467, in send_recv_from_rpc
    % (e, key,))
distributed.comm.core.CommClosedError: in <closed TCP>: BrokenPipeError: [Errno 32] Broken pipe: while trying to call remote method 'identity'
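
The last frames of the traceback are in `_update_scheduler_info`, which fails while calling `self.scheduler.identity()` on a comm that is already closed. One possible reading, which is an assumption and not something confirmed in this thread, is that a background callback is still live when the connection is torn down. The sketch below is a toy model of that kind of race using plain Python threads. It is not distributed's code: `Connection`, `ToyClient`, and `stop_poller_first` are hypothetical names made up for illustration.

```python
import threading
import time


class Connection:
    """Toy stand-in for a network comm (hypothetical, not distributed's Comm)."""

    def __init__(self):
        self.closed = False

    def identity(self):
        if self.closed:
            raise BrokenPipeError("connection is closed")
        return "scheduler"

    def close(self):
        self.closed = True


class ToyClient:
    """Context manager that polls a connection from a background thread."""

    def __init__(self, interval=0.01, stop_poller_first=True):
        self.conn = Connection()
        self.interval = interval
        self.stop_poller_first = stop_poller_first
        self.errors = []
        self._stop = threading.Event()
        self._poller = threading.Thread(target=self._poll, daemon=True)

    def _poll(self):
        # Wake up every `interval` seconds until asked to stop.
        while not self._stop.wait(self.interval):
            try:
                self.conn.identity()
            except BrokenPipeError as exc:
                self.errors.append(exc)
                return

    def __enter__(self):
        self._poller.start()
        return self

    def __exit__(self, *exc_info):
        if self.stop_poller_first:
            # Safe order: stop the poller, wait for it, then close the connection.
            self._stop.set()
            self._poller.join()
            self.conn.close()
        else:
            # Racy order: the connection goes away while the poller is still live.
            self.conn.close()
            self._poller.join()  # the poller exits on its first failed call
        return False


for stop_first in (True, False):
    with ToyClient(stop_poller_first=stop_first) as client:
        time.sleep(0.03)
    print(f"stop_poller_first={stop_first}: {len(client.errors)} error(s)")
```

In this model the safe order records no errors, while the racy order records one `BrokenPipeError` from the poller. Whether the real `Client.__exit__` has the same ordering problem would need to be checked against the distributed source for the version in use.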

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 3
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

4 reactions
medley56 commented, Nov 14, 2019

I’m experiencing this behavior when running Dask 2.6 in Docker, but only intermittently. It appears that Python or Docker is not waiting for the Dask cluster to wrap up its child processes. Or maybe this is a symptom of the Docker zombie-reaping problem. Anyone have ideas or solutions?

EDIT: Details are important.

dask==2.6.0
distributed==1.26.0
tornado==6.0.3

dask_cluster = LocalCluster(processes=True, threads_per_worker=1)
with Client(dask_cluster) as dask_client:
    exit_code = run_distributed_processing(args.manifest_file, dask_client, db_state).value
    dask_client.wait()

dask_cluster.close()

Same exact errors as @macks22 experienced.

EDIT 2: Terrible workaround. Putting a sleep(5) at the end of the script drastically reduced (but did not eliminate) the number of errors, but it obviously didn’t fix the underlying problem.

1 reaction
mrocklin commented, Feb 14, 2018

Can I ask you to try this again on the recent release of dask? Either

conda install dask

or

pip install dask[distributed] --upgrade
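
After upgrading, it can help to confirm which versions are actually installed in the environment that runs the tests before reporting back. A minimal sketch using only the standard library (Python 3.8+); the helper name `installed_versions` is made up for this example:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("dask", "distributed", "tornado")):
    """Return a dict mapping each package name to its version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

The output uses the same `name==version` form as the Versions section of the issue, so it can be pasted straight into a follow-up comment.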

