Error: dask w/ 2 streamz


Expected: the program below should run to completion, or at worst crash only after printing "exiting".

import dask.distributed
import streamz

if __name__ == '__main__':
    with dask.distributed.Client() as client:
        # First pipeline: scatter to the cluster, add 2, buffer, gather back.
        s1 = streamz.Stream()
        s1o = s1
        s1o = s1o.scatter()
        s1o = s1o.map(lambda x: x+2)
        s1o = s1o.buffer(80)
        s1o = s1o.gather()

        # Second pipeline: same shape, but adds 3 instead.
        s2 = streamz.Stream()
        s2o = s2
        s2o = s2o.scatter()
        s2o = s2o.map(lambda x: x+3)
        s2o = s2o.buffer(80)
        s2o = s2o.gather()

        rc = streamz.RefCounter()
        # Emit the same integers into both streams.
        for i in range(1000):
            s1.emit(i)
            s2.emit(i)
        print('exiting')

Actual: the program crashes with the traceback below, and "exiting" is never printed:

$ python dask_streamz_crash2.py
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7fbcb7e54d60>>, <Future finished exception=CancelledError('lambda-8900fe7fe84a7a9105ea61c47a62abb4')>)
Traceback (most recent call last):
  File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/tornado/gen.py", line 769, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/streamz/core.py", line 1168, in cb
    yield self._emit(x, metadata=metadata)
  File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/tornado/gen.py", line 526, in callback
    result_list.append(f.result())
  File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/tornado/gen.py", line 769, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/streamz/dask.py", line 144, in update
    result = yield client.gather(x, asynchronous=True)
  File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/distributed/client.py", line 1852, in _gather
    raise exc
concurrent.futures._base.CancelledError: lambda-8900fe7fe84a7a9105ea61c47a62abb4

Versions: python==3.8.5, dask==2.30.0, streamz==0.6.1

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 14 (5 by maintainers)

Top GitHub Comments

2 reactions
wwoods commented, Dec 4, 2020

Right – it looks like there were two choices, dict and list, and they chose to re-implement Dask’s internal approach, which would be fine as long as every element being scattered resulted in a unique key. When the elements being scattered don’t have unique keys, it causes problems with streamz. Even if the non-uniqueness bugs could be fixed while keeping the dict approach, unique elements and caching are also just a harder paradigm for users to wrap their heads around.

Pretty easy PR w/ a regression test, if the change looks good and like it won’t break anything else.
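
To illustrate the unique-key point, here is a toy sketch in plain Python. It does not use Dask or streamz, and it is not how either library is implemented: content_key, unique_key, and scatter_all are hypothetical helpers invented for this example. It only shows that when keys are derived from the data, two streams emitting the same values end up sharing entries, while per-call keys keep them separate.

```python
import hashlib
import itertools

_counter = itertools.count()


def content_key(value):
    """Hypothetical content-derived key: equal values always get the same key."""
    digest = hashlib.md5(repr(value).encode()).hexdigest()
    return f"{type(value).__name__}-{digest}"


def unique_key(value):
    """Hypothetical per-call key: every call gets a fresh key, even for equal values."""
    return f"{type(value).__name__}-{next(_counter)}"


def scatter_all(values, make_key):
    """Toy stand-in for a cluster store: a plain dict from key to value."""
    store = {}
    for v in values:
        store[make_key(v)] = v
    return store


# Two streams emit the same integers, as in the reproduction above.
emitted = list(range(5)) + list(range(5))

shared = scatter_all(emitted, content_key)
distinct = scatter_all(emitted, unique_key)

print(len(shared))    # 5: both streams collapse onto the same five keys
print(len(distinct))  # 10: one entry per emitted element
```

With shared keys, anything that releases or cancels an entry on behalf of one stream also affects the other, which is the kind of coupling the comment above describes as hard to reason about.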

0 reactions
martindurant commented, Dec 4, 2020

Seems like a reasonable thing to try, if all existing tests pass. Thanks for digging, @wwoods . cc @nils-braun


