KeyError in `Worker.handle_compute_task` (causes deadlock)
```
Traceback (most recent call last):
  File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/worker.py", line 1237, in handle_scheduler
    await self.handle_stream(
  File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/core.py", line 564, in handle_stream
    handler(**merge(extra, msg))
  File "/opt/conda/envs/coiled/lib/python3.9/site-packages/distributed/worker.py", line 1937, in handle_compute_task
    self.tasks[key].nbytes = value
KeyError: "('slices-40d6794b777d639e9440f8f518224cfd', 2, 1)"
distributed.worker - INFO - Connection to scheduler broken. Reconnecting...
```
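The failing line assumes that every key in the scheduler's `nbytes` payload already has an entry in `Worker.tasks`. A minimal sketch of that assumption and of a defensive alternative (illustrative names only, not the actual distributed implementation):

```python
# Illustrative sketch (not the actual distributed code): why
# `self.tasks[key].nbytes = value` raises KeyError, and a defensive
# variant that recreates a missing TaskState instead of crashing.

class TaskState:
    def __init__(self, key):
        self.key = key
        self.nbytes = None

class ToyWorker:
    def __init__(self):
        self.tasks = {}  # key -> TaskState, like Worker.tasks

    def handle_nbytes_unsafe(self, nbytes):
        # Mirrors the failing line: assumes every key already has a
        # TaskState; a forgotten dependency raises KeyError here.
        for key, value in nbytes.items():
            self.tasks[key].nbytes = value

    def handle_nbytes_safe(self, nbytes):
        # Defensive alternative: materialize missing TaskStates
        # instead of raising.
        for key, value in nbytes.items():
            ts = self.tasks.get(key)
            if ts is None:
                ts = self.tasks[key] = TaskState(key)
            ts.nbytes = value

w = ToyWorker()
w.tasks["dep-0"] = TaskState("dep-0")
try:
    w.handle_nbytes_unsafe({"dep-0": 128, "dep-1": 256})  # "dep-1" unknown
except KeyError as e:
    print("KeyError:", e)  # same shape as the traceback above

w.handle_nbytes_safe({"dep-0": 128, "dep-1": 256})
```

Whether silently recreating the state would be correct is a separate question; in the real worker the missing key points at a deeper state-machine bug, as discussed below.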
I may be able to reproduce this if necessary. I was running a stackstac example notebook on Binder against a Coiled cluster over wss, where the particular versions of things were causing a lot of errors (unrelated to dask), so I was frequently rerunning the same tasks, cancelling them, restarting the client, and rerunning. Perhaps this cancel/restart/rerun cycle is related?
@fjetter says:
> The only reason this KeyError can appear is if the compute instruction transitions its own dependency into a forgotten state, which is very, very wrong.
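A toy model of the invariant being described (purely illustrative, far simpler than distributed's real transition system): forgetting a task that another task still depends on leaves a dangling key that later messages, like the `nbytes` update above, will trip over. A sane state machine refuses that transition:

```python
# Toy worker state machine (illustrative only, not distributed's actual
# implementation) showing why forgetting a live dependency is
# "very, very wrong": the dependent would reference a deleted key.

class TaskState:
    def __init__(self, key):
        self.key = key
        self.state = "waiting"
        self.dependents = set()  # keys of tasks that need this one

class ToyWorker:
    def __init__(self):
        self.tasks = {}

    def add_task(self, key, deps=()):
        ts = self.tasks.setdefault(key, TaskState(key))
        for d in deps:
            dep = self.tasks.setdefault(d, TaskState(d))
            dep.dependents.add(key)
        return ts

    def forget(self, key):
        ts = self.tasks[key]
        if ts.dependents:
            # The invariant whose violation produces the KeyError above:
            raise RuntimeError(
                f"refusing to forget {key!r}: still needed by {ts.dependents}"
            )
        del self.tasks[key]
        # Release reverse edges so dependencies can be forgotten later.
        for other in self.tasks.values():
            other.dependents.discard(key)

w = ToyWorker()
w.add_task("y", deps=["x"])
# Forgetting "x" now would strand "y", so the guard raises instead.
```

In the bug reported here, the real worker apparently performed the equivalent of forgetting "x" while "y" was still in flight, so the subsequent `self.tasks[key]` lookup failed.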
Relevant code, ending at the line where the error occurs: https://github.com/dask/distributed/blob/11c41b525267cc144caa8d077b84a0939fecae97/distributed/worker.py#L1852-L1937
Scheduler code producing the message which causes this error: https://github.com/dask/distributed/blob/11c41b525267cc144caa8d077b84a0939fecae97/distributed/scheduler.py#L7953-L7994
Issue Analytics
- Created: 2 years ago
- Comments: 7 (2 by maintainers)
I was just installing `dask[distributed]`, so I assume the latest version.

@gjoseph92 my code was running in a notebook and my kernel died, sorry. That is all I have.