array.overlap and array.map_overlap block sizes are incorrect when depth is an unsigned bit type
Minimal Example:
```python
import numpy as np
import dask.array as da

def func(block, block_info=None):
    print(block_info[0]['array-location'])
    print(block.shape)
    return block

x = da.ones((100,), chunks=(50, 50))

d_signed = tuple(np.array([10, 10]).astype(np.int16))
y_signed = da.map_overlap(func, x, dtype=x.dtype, depth=d_signed)

d_unsigned = tuple(np.array([10, 10]).astype(np.uint16))
y_unsigned = da.map_overlap(func, x, dtype=x.dtype, depth=d_unsigned)
```
```python
>>> y_signed.compute()
[(0, 70)] [(70, 140)] (70,) (70,)
>>> y_unsigned.compute()
[(70, 110)] (40,) [(0, 70)] (60,)
```
I gather that `map_overlap` simply pads each block with data from its neighbors, then calls `map_blocks` on the "augmented" array. I suspect that when this augmented array is trimmed back down there is an expression along the lines of

```python
new_block = augmented_array[d[0]:-d[0], ...]
```

where `d` is the depth tuple. Simply negating the end of the slice does not work for unsigned types, because unary minus on an unsigned NumPy scalar wraps around instead of producing a negative value. This could be replaced with `augmented_array[d[0]:-1*d[0], ...]` to ensure any unsigned type is cast to a signed one before slicing.
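A minimal NumPy-only sketch of the suspected failure mode (the array and variable names here are illustrative, not dask's actual internals):

```python
import numpy as np

# A 1-D array standing in for an "augmented" (overlap-padded) block:
# 50 interior elements plus a depth-10 pad on each side.
aug = np.arange(70)
d = np.uint16(10)

# Unary minus on an unsigned NumPy scalar wraps around rather than going
# negative, so -d is a large positive number and the slice keeps the
# trailing pad instead of trimming it.
print(-d)                    # a large positive uint16, not -10
print(aug[d:-d].shape)       # (60,) -- trailing pad was not trimmed
print(aug[d:-int(d)].shape)  # (50,) -- casting to a Python int trims correctly
```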
Why should we support unsigned bit type arguments here?
The depth argument is intrinsically an unsigned quantity; there is no "negative depth" in this context, so it is reasonable for users to expect unsigned types to be accepted. Moreover, when depths are computed automatically from things like array shapes (e.g. `my_depth = tuple(np.array(my_array.shape) // 8)` for an approximate 12.5% overlap), they will often be unsigned integer types.
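Until this is fixed upstream, a defensive workaround (my own pattern, not part of dask's API) is to cast each computed depth to a plain Python int before passing it to `map_overlap`:

```python
import numpy as np

shape = (64, 64)  # stand-in for my_array.shape

# Depth derived from the shape; if the shape values came from an
# unsigned array, integer division preserves the unsigned dtype.
raw = np.asarray(shape, dtype=np.uint32) // 8

# Cast each entry to a built-in int so no unsigned scalar reaches dask.
my_depth = tuple(int(d) for d in raw)
print(my_depth)  # (8, 8)
```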
Environment:
- Dask version: 2.20.0
- Python version: 3.6.4
- Operating System: Scientific Linux
- Install method (conda, pip, source): pip
Issue Analytics
- Created: 3 years ago
- Comments: 10 (10 by maintainers)
This seems like a reasonable solution to me. Type coercion should happen sooner rather than later, so that it can fail quickly.
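A sketch of what coercing early could look like (a hypothetical helper, not dask's actual code): normalize whatever shape of `depth` the user passed into plain Python ints up front, and reject negatives immediately.

```python
def coerce_depth(ndim, depth):
    """Normalize a depth argument to a dict of plain Python ints.

    Hypothetical sketch: accepts an int-like scalar, a per-axis
    tuple/list, or an axis->depth dict, and fails fast on negatives.
    """
    # Int-like scalars (including NumPy unsigned scalars) apply to all axes.
    if isinstance(depth, int) or hasattr(depth, "__index__"):
        depth = (int(depth),) * ndim
    # Per-axis sequences become an axis -> depth mapping.
    if isinstance(depth, (tuple, list)):
        depth = dict(enumerate(int(d) for d in depth))
    # Re-cast dict values so no NumPy scalar survives normalization.
    depth = {ax: int(d) for ax, d in depth.items()}
    for d in depth.values():
        if d < 0:
            raise ValueError("depth must be non-negative")
    return depth
```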
Sorry for the delay - I'm still willing to fix this and submit a PR; I just have deliverables at work that take priority. I may take some time over the holiday to look into it.