min/max fail on empty chunks
If a Dask Array has a known shape, min and max work fine. If one of the dimensions of the array is unknown, the Dask graph can still be constructed, but computation fails. It seems like the underlying reduction is probably being applied to empty chunks, though it doesn’t seem like it should be trying that. 🤔
An example is shown below, and the conda environment used to create it is included as well. The example just uses max for simplicity, but min reproduces it just as well.
Example:
In [1]: import dask.array as da
In [2]: a = da.arange(5, chunks=2)
In [3]: a[a < 2]
Out[3]: dask.array<getitem, shape=(nan,), dtype=int64, chunksize=(nan,)>
In [4]: a[a < 2].compute()
Out[4]: array([0, 1])
In [5]: da.max(a[a < 2])
Out[5]: dask.array<amax-aggregate, shape=(), dtype=int64, chunksize=()>
In [6]: da.max(a[a < 2]).compute()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-6539984fa744> in <module>()
----> 1 da.max(a[a < 2]).compute()
/zopt/conda2/envs/test/lib/python3.6/site-packages/dask/base.py in compute(self, **kwargs)
97 Extra keywords to forward to the scheduler ``get`` function.
98 """
---> 99 (result,) = compute(self, traverse=False, **kwargs)
100 return result
101
/zopt/conda2/envs/test/lib/python3.6/site-packages/dask/base.py in compute(*args, **kwargs)
204 dsk = collections_to_dsk(variables, optimize_graph, **kwargs)
205 keys = [var._keys() for var in variables]
--> 206 results = get(dsk, keys, **kwargs)
207
208 results_iter = iter(results)
/zopt/conda2/envs/test/lib/python3.6/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, **kwargs)
73 results = get_async(pool.apply_async, len(pool._pool), dsk, result,
74 cache=cache, get_id=_thread_get_id,
---> 75 pack_exception=pack_exception, **kwargs)
76
77 # Cleanup pools associated to dead threads
/zopt/conda2/envs/test/lib/python3.6/site-packages/dask/local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs)
519 _execute_task(task, data) # Re-execute locally
520 else:
--> 521 raise_exception(exc, tb)
522 res, worker_id = loads(res_info)
523 state['cache'][key] = res
/zopt/conda2/envs/test/lib/python3.6/site-packages/dask/compatibility.py in reraise(exc, tb)
58 if exc.__traceback__ is not tb:
59 raise exc.with_traceback(tb)
---> 60 raise exc
61
62 else:
/zopt/conda2/envs/test/lib/python3.6/site-packages/dask/local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
288 try:
289 task, data = loads(task_info)
--> 290 result = _execute_task(task, data)
291 id = get_id()
292 result = dumps((result, id))
/zopt/conda2/envs/test/lib/python3.6/site-packages/dask/local.py in _execute_task(arg, cache, dsk)
269 func, args = arg[0], arg[1:]
270 args2 = [_execute_task(a, cache) for a in args]
--> 271 return func(*args2)
272 elif not ishashable(arg):
273 return arg
/zopt/conda2/envs/test/lib/python3.6/site-packages/dask/compatibility.py in apply(func, args, kwargs)
45 def apply(func, args, kwargs=None):
46 if kwargs:
---> 47 return func(*args, **kwargs)
48 else:
49 return func(*args)
/zopt/conda2/envs/test/lib/python3.6/site-packages/numpy/core/fromnumeric.py in amax(a, axis, out, keepdims)
2270
2271 return _methods._amax(a, axis=axis,
-> 2272 out=out, **kwargs)
2273
2274
/zopt/conda2/envs/test/lib/python3.6/site-packages/numpy/core/_methods.py in _amax(a, axis, out, keepdims)
24 # small reductions
25 def _amax(a, axis=None, out=None, keepdims=False):
---> 26 return umr_maximum(a, axis, None, out, keepdims)
27
28 def _amin(a, axis=None, out=None, keepdims=False):
ValueError: zero-size array to reduction operation maximum which has no identity
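For reference, NumPy's own reduction raises exactly the same error when handed a zero-size array, which is consistent with the empty-chunk hypothesis above (a quick standalone check, not part of the original report):

import numpy as np

# A boolean mask can leave some chunks with zero elements; applying np.max
# to such a chunk reproduces the error shown in the traceback above.
np.max(np.array([], dtype="int64"))
# ValueError: zero-size array to reduction operation maximum which has no identity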
Environment:
name: test
channels:
- conda-forge
- defaults
dependencies:
- appnope=0.1.0=py36_0
- blas=1.1=openblas
- bokeh=0.12.9=py36_0
- ca-certificates=2017.7.27.1=0
- certifi=2017.7.27.1=py36_0
- click=6.7=py36_0
- cloudpickle=0.4.0=py36_0
- dask=0.15.4=py_0
- dask-core=0.15.4=py_0
- decorator=4.1.2=py36_0
- distributed=1.19.3=py36_0
- heapdict=1.0.0=py36_0
- ipython=6.2.1=py36_0
- ipython_genutils=0.2.0=py36_0
- jedi=0.10.2=py36_0
- jinja2=2.9.6=py36_0
- libgfortran=3.0.0=0
- locket=0.2.0=py36_1
- markupsafe=1.0=py36_0
- msgpack-python=0.4.8=py36_0
- ncurses=5.9=10
- numpy=1.13.3=py36_blas_openblas_200
- openblas=0.2.19=2
- openssl=1.0.2l=0
- pandas=0.20.3=py36_1
- partd=0.3.8=py36_0
- pexpect=4.2.1=py36_0
- pickleshare=0.7.4=py36_0
- pip=9.0.1=py36_0
- prompt_toolkit=1.0.15=py36_0
- psutil=5.3.1=py36_0
- ptyprocess=0.5.2=py36_0
- pygments=2.2.0=py36_0
- python=3.6.3=0
- python-dateutil=2.6.1=py36_0
- pytz=2017.2=py36_0
- pyyaml=3.12=py36_1
- readline=6.2=0
- setuptools=36.6.0=py36_1
- simplegeneric=0.8.1=py36_0
- six=1.11.0=py36_1
- sortedcontainers=1.5.7=py36_0
- sqlite=3.13.0=1
- tblib=1.3.2=py36_0
- tk=8.5.19=2
- toolz=0.8.2=py_2
- tornado=4.5.2=py36_0
- traitlets=4.3.2=py36_0
- wcwidth=0.1.7=py36_0
- wheel=0.30.0=py_1
- xz=5.2.3=0
- yaml=0.1.6=0
- zict=0.1.3=py_0
- zlib=1.2.8=3
Top GitHub Comments
It’s interesting you mention that. It reminds me of this issue ( https://github.com/dask/dask/issues/4173 ), which suggests generally optimizing out empty chunks. As a consequence it would also solve this (and possibly other) empty chunk oddities.
There are some good suggestions about how to tackle this in the linked issue:
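In the meantime, a user-level workaround (a sketch, not taken from the linked issue) is to avoid creating empty chunks in the first place: instead of boolean indexing, mask the excluded values with NaN so the chunk structure is preserved, then use a NaN-aware reduction.

import numpy as np
import dask.array as da

a = da.arange(5, chunks=2)

# Boolean indexing yields chunks of unknown (possibly zero) size. Masking the
# excluded values keeps the original chunks, so none of them is ever empty.
masked = da.where(a < 2, a.astype("f8"), np.nan)

# nanmax ignores the NaN fill values; chunks that are entirely masked may
# emit an "All-NaN slice encountered" RuntimeWarning but do not error.
da.nanmax(masked).compute()  # 1.0

Note that this casts the data to float and assumes NaN is not a meaningful value in the array, so it is only a stopgap until reductions handle empty chunks directly.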