Error: dask w/ 2 streamz
Expected: the program below should run to completion, or crash only after printing "exiting".
import dask.distributed
import streamz

if __name__ == '__main__':
    with dask.distributed.Client() as client:
        # First pipeline: scatter -> map(+2) -> buffer -> gather
        s1 = streamz.Stream()
        s1o = s1.scatter()
        s1o = s1o.map(lambda x: x + 2)
        s1o = s1o.buffer(80)
        s1o = s1o.gather()

        # Second, independent pipeline on the same client:
        # scatter -> map(+3) -> buffer -> gather
        s2 = streamz.Stream()
        s2o = s2.scatter()
        s2o = s2o.map(lambda x: x + 3)
        s2o = s2o.buffer(80)
        s2o = s2o.gather()

        rc = streamz.RefCounter()  # created but unused in this repro

        # Emit the same values into both streams.
        for i in range(1000):
            s1.emit(i)
            s2.emit(i)
        print('exiting')
Actual: the program crashes with the traceback below; "exiting" is never printed:
$ python dask_streamz_crash2.py
tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x7fbcb7e54d60>>, <Future finished exception=CancelledError('lambda-8900fe7fe84a7a9105ea61c47a62abb4')>)
Traceback (most recent call last):
File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/tornado/ioloop.py", line 741, in _run_callback
ret = callback()
File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
future.result()
File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/tornado/gen.py", line 769, in run
yielded = self.gen.throw(*exc_info) # type: ignore
File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/streamz/core.py", line 1168, in cb
yield self._emit(x, metadata=metadata)
File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
value = future.result()
File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/tornado/gen.py", line 526, in callback
result_list.append(f.result())
File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/tornado/gen.py", line 769, in run
yielded = self.gen.throw(*exc_info) # type: ignore
File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/streamz/dask.py", line 144, in update
result = yield client.gather(x, asynchronous=True)
File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/tornado/gen.py", line 762, in run
value = future.result()
File "/home/walt/.conda/envs/py38streamz/lib/python3.8/site-packages/distributed/client.py", line 1852, in _gather
raise exc
concurrent.futures._base.CancelledError: lambda-8900fe7fe84a7a9105ea61c47a62abb4
Versions: python==3.8.5, dask==2.30.0, streamz==0.6.1
Issue Analytics
- Created 3 years ago
- Comments: 14 (5 by maintainers)
Top GitHub Comments
Right – it looks like there were two choices, dict and list, and they chose to re-implement Dask’s internal approach, which would be fine as long as every element being scattered resulted in a unique key. When the elements being scattered don’t have unique keys, it causes problems with streamz. Even if the non-uniqueness bugs could be fixed while keeping the dict approach, unique elements and caching is also just a harder paradigm for users to wrap their heads around.
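The collision failure mode described in the comment above can be sketched without a cluster. This is a simulation, not Dask's or streamz's actual code: `value_key` is a hypothetical stand-in for the deterministic, value-based key that `scatter` derives from each element, used only to show that equal values emitted by two independent streams map to one shared key.

```python
# Simulation (hypothetical helper) of deterministic, value-based future
# keys. Dask derives a scattered future's key from a hash of the value;
# hashlib stands in for that here so the collision is visible locally.
import hashlib

def value_key(value):
    # Same value -> same key, regardless of which stream scattered it.
    return "scatter-" + hashlib.sha256(repr(value).encode()).hexdigest()[:8]

k1 = value_key(7)  # key produced while s1 scatters 7
k2 = value_key(7)  # key produced while s2 scatters 7
assert k1 == k2    # collision: both streams share one future key
assert value_key(7) != value_key(8)  # distinct values do not collide
```

Under this model, when one stream's gather finishes and releases the shared future, the other stream's still-pending gather on the same key raises the `CancelledError` seen in the traceback.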
Pretty easy PR with a regression test, if the change looks good and won't break anything else.
Seems like a reasonable thing to try, if all existing tests pass. Thanks for digging, @wwoods . cc @nils-braun
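Given that diagnosis, one possible mitigation, an assumption on my part rather than anything proposed in the issue, is to make every emitted element hash-unique by tagging it with a nonce before it reaches `scatter`, then stripping the tag inside the mapped function. `tag` and `untag_add2` are hypothetical helpers:

```python
# Hedged workaround sketch (not from the issue): force unique scatter
# keys by pairing each payload with a one-off nonce, so no two emits
# ever hash to the same key, even across streams.
import uuid

def tag(x):
    # (payload, nonce): the nonce makes every element hash-unique.
    return (x, uuid.uuid4().hex)

def untag_add2(pair):
    # Drop the nonce and apply the original map logic (x + 2).
    x, _nonce = pair
    return x + 2

# Usage would replace the map/emit steps in the repro, e.g.:
#   s1o = s1.scatter().map(untag_add2).buffer(80).gather()
#   for i in range(1000):
#       s1.emit(tag(i))
assert untag_add2(tag(5)) == 7
```

The trade-off is that identical payloads are no longer deduplicated on the scheduler, which is exactly the caching behavior the comment above argues is hard to reason about anyway.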