
da.core.elemwise undefined dtype inference on 0d Arrays

See original GitHub issue

I’ve noticed an undefined behaviour problem with the dtype inference in elemwise. Here’s the relevant bit of code:

        vals = [np.empty((1,) * a.ndim, dtype=a.dtype)
                if not is_scalar_for_elemwise(a) else a
                for a in args]
        dt = apply_infer_dtype(op, vals, {}, 'elemwise', suggest_dtype=False)

If one of the arguments is a 0d dask array, then apply_infer_dtype is instead given an empty 0d numpy array (as the comment immediately above this code indicates, 0d dask arrays are not considered scalar for elemwise).

The problem is that a 0d numpy array is treated the same as a scalar by numpy, and hence the value affects dtype inference. Because np.empty was used, the value is undefined.
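The shape arithmetic makes this easy to see: for a 0d argument, `(1,) * a.ndim` is the empty tuple, so the sample handed to the inference is itself a 0d array rather than a 1d one. A minimal sketch in plain NumPy, mirroring the expression above:

```python
import numpy as np

# For an n-d argument, dask samples with a shape-(1, ..., 1) array,
# which NumPy promotes strictly by dtype. For a 0d argument the shape
# tuple is empty, producing a 0d array:
nd_sample = np.empty((1,) * 2, dtype=np.float32)      # shape (1, 1)
zerod_sample = np.empty((1,) * 0, dtype=np.float64)   # shape ()

print(nd_sample.shape)     # (1, 1)
print(zerod_sample.shape)  # ()
```

Under NumPy's legacy (pre-NEP 50) promotion rules, 0d arrays participate in value-based casting just like Python scalars, so the uninitialized contents of the `np.empty` sample can change the inferred dtype from run to run.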

The simple solution is to replace np.empty with np.ones, but I wanted to raise the question of what the dtype should be in this case. With that change, the dtypes of scalars are effectively ignored (because the value 1 won’t cause any promotions), which makes inference “optimistic” (it assumes the true value won’t cause overflows), whereas perhaps it ought to be conservative (let large scalar dtypes cause promotions). For example, just changing np.empty to np.ones will give

>>> a = da.ones(1, dtype=np.float32, chunks=1)
>>> b = da.from_array(np.array(1e200), chunks=())
>>> (a + b).dtype
dtype('float32')

even though the result doesn’t fit into a finite float32, because b is too large. (There is a separate problem that (a + b).compute() actually returns a float64 numpy array, but I’ll file a separate issue for that when I get around to it.)
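The overflow itself is easy to reproduce in plain NumPy: 1e200 is far beyond float32’s range (roughly 3.4e38), so forcing the value into the optimistically inferred dtype yields infinity:

```python
import numpy as np

# float32 tops out around 3.4e38, so 1e200 overflows to inf when cast:
with np.errstate(over='ignore'):
    val = np.float32(1e200)

print(val)            # inf
print(np.isinf(val))  # True
```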

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

2 reactions
jcrist commented, Aug 2, 2017

This and #2575 are intertwined, I’ll post my thoughts on both here.

I think the following behavior should happen:

  • Dask array arguments are interpreted as being “strict” dtypes, meaning no value based interpretation. If a 0d array contains 1 with the dtype int32, the output must be at least int32. This uses np.find_common_type semantics. The argument for this is that we can’t assume the value of a dask scalar, so we assume it’s the biggest value it could be, and the dtype can’t be cast down.
  • Concrete arguments (e.g. non-dask arguments) are interpreted using normal numpy semantics (e.g. np.result_type).
  • Dtype on output of elemwise operations should be enforced with a call to .astype(copy=False).
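A rough sketch of how the two promotion modes could combine (a hypothetical helper, not dask’s actual implementation; it uses np.result_type, which treats dtype arguments strictly while concrete values go through normal NumPy promotion):

```python
import numpy as np

def proposed_dtype(dask_dtypes, concrete_args):
    """Hypothetical sketch of the rule above: dask arguments contribute
    only their dtypes (strict promotion), while concrete arguments use
    normal NumPy value-based semantics."""
    return np.result_type(*dask_dtypes, *concrete_args)

# A 0d dask array of int64 forces at least int64, regardless of value:
print(proposed_dtype([np.dtype('int64')], [np.ones(3, dtype='int16')]))  # int64

# A concrete scalar keeps normal NumPy semantics (value 1 causes no promotion):
print(proposed_dtype([np.dtype('int16')], [1]))  # int16
```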

The changes from this amount to:

  • Keep the dtype inference logic the same; we already infer dask scalars (0d arrays) using 1d array semantics.
  • Enforce output type from elemwise operations using astype(copy=False).
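Enforcing the output dtype with astype(copy=False) is cheap when the chunks already have the right dtype, since NumPy then hands back the same array without copying; a quick check:

```python
import numpy as np

out = np.ones(3, dtype=np.float64)

# Matching dtype: astype(copy=False) returns the same array, no copy made.
same = out.astype(np.float64, copy=False)
print(same is out)  # True

# Mismatched dtype: a converted copy is made.
converted = out.astype(np.float32, copy=False)
print(converted.dtype)  # float32
```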

While this breaks with numpy semantics slightly, I think the behavior here is more expected (the value based casting of numpy objects for some operations was a surprise to me when I first learned about it).

0 reactions
jcrist commented, Aug 2, 2017

> Actually we don’t, we infer it using 0d array semantics

Are you sure? Per the code here:

vals = [np.empty((1,) * a.ndim, dtype=a.dtype)
        if not is_scalar_for_elemwise(a) else a
        for a in args]

we create an array of the same dtype as a 0d dask array (since is_scalar_for_elemwise(0d_dask_array) is false). I might be misunderstanding what you’re saying here.

Edit: I misread the np.empty call, nvm, you’re right.
