da.core.elemwise undefined dtype inference on 0d Arrays
I've noticed an undefined-behaviour problem with the dtype inference in `elemwise`. Here's the relevant bit of code:
```python
vals = [np.empty((1,) * a.ndim, dtype=a.dtype)
        if not is_scalar_for_elemwise(a) else a
        for a in args]
dt = apply_infer_dtype(op, vals, {}, 'elemwise', suggest_dtype=False)
```
If one of the arguments is a 0d dask array, then `apply_infer_dtype` is given an empty 0d numpy array in its place (as the comment immediately above this code indicates, 0d dask arrays are not considered scalar for elemwise). The problem is that a 0d numpy array is treated the same as a scalar by numpy, and hence its *value* affects dtype inference. Because `np.empty` was used, that value is undefined.
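To see why the value matters: numpy's (pre-NEP 50) promotion rules are value-based for scalars and 0d arrays, using the smallest dtype that can hold the value. A minimal illustration with `np.min_scalar_type` (my own example, not from the dask code above):

```python
import numpy as np

# Value-based casting picks the minimal dtype that can represent the value,
# so two values of the same Python type can promote differently:
print(np.min_scalar_type(1.0))    # float16 -- 1.0 fits in a half
print(np.min_scalar_type(1e200))  # float64 -- too large for float16/float32
```

With `np.empty`, whatever garbage happens to be in the buffer plays the role of the value here, so the inferred dtype is nondeterministic.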
The simple solution is to replace `np.empty` with `np.ones`, but I wanted to raise the question of what the dtype should be in this case. With `np.ones`, the dtypes of scalars are effectively ignored (because the value 1 won't cause any promotions), which makes the inference "optimistic" (it assumes the true value won't cause overflows), whereas perhaps it ought to be conservative (large scalar types cause promotions). For example, just changing `np.empty` to `np.ones` will give
```python
>>> a = da.ones(1, dtype=np.float32, chunks=1)
>>> b = da.from_array(np.array(1e200), chunks=())
>>> (a + b).dtype
dtype('float32')
```
even though the result doesn't fit into a finite float32 because `b` is too large. (There is a separate issue that `(a + b).compute()` will actually return a float64 numpy array, but I'll file a separate issue for that when I get around to it.)
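For concreteness, 1e200 overflows float32 entirely, so a float32 result dtype here would silently produce infinities. A quick check (my own illustration, not part of the issue):

```python
import numpy as np

# float32's largest finite value is ~3.4e38, so 1e200 overflows to inf:
x = np.array(1e200).astype(np.float32)
print(x)                         # inf
print(np.finfo(np.float32).max)  # the float32 upper bound, ~3.4e38
```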
Issue Analytics

- Created 6 years ago
- Comments: 6 (6 by maintainers)
Top GitHub Comments
This and #2575 are intertwined, I'll post my thoughts on both here.

I think the following behavior should happen:

1. Adding a scalar `1` to an array with the dtype `int32`: the output must be at least `int32`. This uses `np.find_common_type` semantics. The argument for this is that we can't assume the value of a dask scalar, so we assume it's the biggest value it could be, and the dtype can't be cast down (in contrast to value-based `np.result_type` semantics).
2. The output dtype of `elemwise` operations should be enforced with a call to `.astype(copy=False)`.

The changes from this amount to dtype-only inference plus enforcement via `astype(copy=False)`. While this breaks with numpy semantics slightly, I think the behavior here is more expected (the value-based casting of numpy objects for some operations was a surprise to me when I first learned about it).
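A sketch of the proposed conservative behavior (my interpretation of the comment above; since `np.find_common_type` has been deprecated in recent NumPy releases, I use `np.promote_types`, which likewise considers only dtypes, never values):

```python
import numpy as np

# Conservative inference: combine dtypes only, ignoring scalar values.
a = np.ones(3, dtype=np.float32)
b = np.array(1e200)  # 0d, float64

out_dtype = np.promote_types(a.dtype, b.dtype)
print(out_dtype)  # float64 -- a float64 scalar can never be cast down

# Enforce the inferred dtype on the result, avoiding a copy when possible:
result = (a + b).astype(out_dtype, copy=False)
print(result.dtype)  # float64
```

Under these semantics the `(a + b).dtype == float32` example from the issue body would instead report float64.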
Are you sure? Per the code here, we create an array of the same dtype as a 0d dask array (since `is_scalar_for_elemwise(0d_dask_array)` is false). I might be misunderstanding what you're saying here.

Edit: I misread the `np.empty` call, nvm, you're right.