Distributed 2021.3.1 `distributed.protocol.serialize.pickle_loads` fails with `IndexError: tuple index out of range`
What happened:
The following exception occurred with the latest version of distributed, in a test that previously passed:
```
header = {'compression': (None, None), 'num-sub-frames': 2, 'serializer': 'pickle', 'split-num-sub-frames': (1, 1), ...}
frames = [<memory at 0x1209deae0>, <memory at 0x1209dea10>]

    def pickle_loads(header, frames):
        x, buffers = frames[0], frames[1:]
        writeable = header["writeable"]
        for i in range(len(buffers)):
            mv = memoryview(buffers[i])
>           if writeable[i] == mv.readonly:
E           IndexError: tuple index out of range
```
`writeable` is an empty tuple in the header above.
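For illustration, here is a minimal standalone sketch of the failure mode, independent of distributed: when the `writeable` tuple is shorter than the buffer list, the loop indexes past the end of the tuple.

```python
# Minimal sketch of the failure mode: the loop assumes
# len(writeable) == len(buffers), which no longer holds here.
header = {"writeable": ()}  # empty, as in the failing test
frames = [memoryview(b"payload"), memoryview(b"extra")]  # two frames

x, buffers = frames[0], frames[1:]  # buffers now has one element
writeable = header["writeable"]
for i in range(len(buffers)):
    mv = memoryview(buffers[i])
    if writeable[i] == mv.readonly:  # IndexError: tuple index out of range
        pass
```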
What you expected to happen:
After digging a bit and comparing runs of the same test between 2021.3.0 and 2021.3.1, I found the following:

In version 2021.3.0, the input `frames` always has one element, so `buffers` is always an empty list; the for loop, which contains `writeable[i]`, therefore never runs, and the empty `writeable` tuple is harmless.

In version 2021.3.1, the third time execution reaches this function, `frames` has 2 elements, so `buffers` is not empty and the for loop executes; `writeable` is still an empty tuple, hence the code fails.
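Assuming the intended invariant is that `writeable` carries one flag per buffer, a defensive variant of the loop would at least fail with a clear message instead of an `IndexError`. This is only a sketch of that idea (`pickle_loads_sketch` is a hypothetical name, not the actual upstream fix):

```python
import pickle

# Hypothetical defensive variant; a sketch, not the actual upstream fix.
def pickle_loads_sketch(header, frames):
    x, buffers = frames[0], frames[1:]
    writeable = header["writeable"]
    if len(writeable) != len(buffers):
        raise ValueError(
            f"header['writeable'] has {len(writeable)} flags "
            f"but {len(buffers)} buffers were received"
        )
    new_buffers = []
    for flag, buf in zip(writeable, buffers):  # lockstep, no bare indexing
        mv = memoryview(buf)
        # Copy to a writeable bytearray when the flag says the buffer
        # should be writeable but the received view is readonly.
        new_buffers.append(bytearray(mv) if flag and mv.readonly else mv)
    # buffers= requires pickle protocol 5 (Python 3.8+ or the pickle5 backport).
    return pickle.loads(x, buffers=new_buffers)
```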
I saw that there were substantial changes to `distributed.protocol.core.loads`, where `frames` is passed down in its "truncated" form (`sub_frames`) to the function that eventually breaks. I don't know whether this is a newly introduced bug or whether our code needs changing. I'm not familiar with the underlying mechanisms, so I'd appreciate it if someone could take a look.
Environment:
- Dask version: 2021.3.1
- Python version: 3.7.10
- Operating System: MacOS Mojave (but also fails on linux-based gitlab runners)
- Install method (conda, pip, source): pip
Top GitHub Comments
Thank you to everyone who participated in helping to track this down. I appreciate it.
Thanks @alejandrofiel; however, since others can't access the CSV files you're using, it is difficult for us to debug. See https://blog.dask.org/2018/02/28/minimal-bug-reports for some information on crafting minimal bug reports.