
min/max fail on empty chunks

See original GitHub issue

If a Dask Array has a known shape, min and max work fine. If one of the array's dimensions is unknown, the Dask graph can still be constructed, but computation fails. It seems to be applying the underlying reduction to empty chunks, though it doesn't seem like it should be trying that. 🤔

An example is shown below, and the conda environment used to create it is included as well. The example uses max for simplicity, but min reproduces the problem just as well.

Example:
In [1]: import dask.array as da

In [2]: a = da.arange(5, chunks=2)

In [3]: a[a < 2]
Out[3]: dask.array<getitem, shape=(nan,), dtype=int64, chunksize=(nan,)>

In [4]: a[a < 2].compute()
Out[4]: array([0, 1])

In [5]: da.max(a[a < 2])
Out[5]: dask.array<amax-aggregate, shape=(), dtype=int64, chunksize=()>

In [6]: da.max(a[a < 2]).compute()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-6539984fa744> in <module>()
----> 1 da.max(a[a < 2]).compute()

/zopt/conda2/envs/test/lib/python3.6/site-packages/dask/base.py in compute(self, **kwargs)
     97             Extra keywords to forward to the scheduler ``get`` function.
     98         """
---> 99         (result,) = compute(self, traverse=False, **kwargs)
    100         return result
    101 

/zopt/conda2/envs/test/lib/python3.6/site-packages/dask/base.py in compute(*args, **kwargs)
    204     dsk = collections_to_dsk(variables, optimize_graph, **kwargs)
    205     keys = [var._keys() for var in variables]
--> 206     results = get(dsk, keys, **kwargs)
    207 
    208     results_iter = iter(results)

/zopt/conda2/envs/test/lib/python3.6/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, **kwargs)
     73     results = get_async(pool.apply_async, len(pool._pool), dsk, result,
     74                         cache=cache, get_id=_thread_get_id,
---> 75                         pack_exception=pack_exception, **kwargs)
     76 
     77     # Cleanup pools associated to dead threads

/zopt/conda2/envs/test/lib/python3.6/site-packages/dask/local.py in get_async(apply_async, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, **kwargs)
    519                         _execute_task(task, data)  # Re-execute locally
    520                     else:
--> 521                         raise_exception(exc, tb)
    522                 res, worker_id = loads(res_info)
    523                 state['cache'][key] = res

/zopt/conda2/envs/test/lib/python3.6/site-packages/dask/compatibility.py in reraise(exc, tb)
     58         if exc.__traceback__ is not tb:
     59             raise exc.with_traceback(tb)
---> 60         raise exc
     61 
     62 else:

/zopt/conda2/envs/test/lib/python3.6/site-packages/dask/local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    288     try:
    289         task, data = loads(task_info)
--> 290         result = _execute_task(task, data)
    291         id = get_id()
    292         result = dumps((result, id))

/zopt/conda2/envs/test/lib/python3.6/site-packages/dask/local.py in _execute_task(arg, cache, dsk)
    269         func, args = arg[0], arg[1:]
    270         args2 = [_execute_task(a, cache) for a in args]
--> 271         return func(*args2)
    272     elif not ishashable(arg):
    273         return arg

/zopt/conda2/envs/test/lib/python3.6/site-packages/dask/compatibility.py in apply(func, args, kwargs)
     45     def apply(func, args, kwargs=None):
     46         if kwargs:
---> 47             return func(*args, **kwargs)
     48         else:
     49             return func(*args)

/zopt/conda2/envs/test/lib/python3.6/site-packages/numpy/core/fromnumeric.py in amax(a, axis, out, keepdims)
   2270 
   2271     return _methods._amax(a, axis=axis,
-> 2272                           out=out, **kwargs)
   2273 
   2274 

/zopt/conda2/envs/test/lib/python3.6/site-packages/numpy/core/_methods.py in _amax(a, axis, out, keepdims)
     24 # small reductions
     25 def _amax(a, axis=None, out=None, keepdims=False):
---> 26     return umr_maximum(a, axis, None, out, keepdims)
     27 
     28 def _amin(a, axis=None, out=None, keepdims=False):

ValueError: zero-size array to reduction operation maximum which has no identity
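The ValueError comes straight from NumPy: reductions like maximum have no identity element, so they refuse zero-size input, while reductions that do have an identity (such as sum) succeed. A minimal demonstration of that underlying behavior (note: the initial= keyword in the last line requires NumPy >= 1.15, newer than the 1.13.3 pinned in the environment below):

import numpy as np

empty = np.array([])

print(np.sum(empty))    # 0.0 -- sum has an identity element (0), so it works

try:
    np.max(empty)       # maximum has no identity, so NumPy refuses
except ValueError as e:
    print(e)            # zero-size array to reduction operation maximum ...

# NumPy >= 1.15 allows supplying an identity explicitly:
print(np.max(empty, initial=-np.inf))   # -inf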

Environment:
name: test
channels:
- conda-forge
- defaults
dependencies:
- appnope=0.1.0=py36_0
- blas=1.1=openblas
- bokeh=0.12.9=py36_0
- ca-certificates=2017.7.27.1=0
- certifi=2017.7.27.1=py36_0
- click=6.7=py36_0
- cloudpickle=0.4.0=py36_0
- dask=0.15.4=py_0
- dask-core=0.15.4=py_0
- decorator=4.1.2=py36_0
- distributed=1.19.3=py36_0
- heapdict=1.0.0=py36_0
- ipython=6.2.1=py36_0
- ipython_genutils=0.2.0=py36_0
- jedi=0.10.2=py36_0
- jinja2=2.9.6=py36_0
- libgfortran=3.0.0=0
- locket=0.2.0=py36_1
- markupsafe=1.0=py36_0
- msgpack-python=0.4.8=py36_0
- ncurses=5.9=10
- numpy=1.13.3=py36_blas_openblas_200
- openblas=0.2.19=2
- openssl=1.0.2l=0
- pandas=0.20.3=py36_1
- partd=0.3.8=py36_0
- pexpect=4.2.1=py36_0
- pickleshare=0.7.4=py36_0
- pip=9.0.1=py36_0
- prompt_toolkit=1.0.15=py36_0
- psutil=5.3.1=py36_0
- ptyprocess=0.5.2=py36_0
- pygments=2.2.0=py36_0
- python=3.6.3=0
- python-dateutil=2.6.1=py36_0
- pytz=2017.2=py36_0
- pyyaml=3.12=py36_1
- readline=6.2=0
- setuptools=36.6.0=py36_1
- simplegeneric=0.8.1=py36_0
- six=1.11.0=py36_1
- sortedcontainers=1.5.7=py36_0
- sqlite=3.13.0=1
- tblib=1.3.2=py36_0
- tk=8.5.19=2
- toolz=0.8.2=py_2
- tornado=4.5.2=py36_0
- traitlets=4.3.2=py36_0
- wcwidth=0.1.7=py36_0
- wheel=0.30.0=py_1
- xz=5.2.3=0
- yaml=0.1.6=0
- zict=0.1.3=py_0
- zlib=1.2.8=3
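For anyone who needs to work around this, two possibilities are sketched below. These are hedged suggestions, not fixes from the issue itself: eager computation sidesteps Dask's reduction entirely, and compute_chunk_sizes() only exists in much newer Dask releases (2.4+) than the 0.15.4 pinned above.

import dask.array as da

a = da.arange(5, chunks=2)
masked = a[a < 2]                  # shape (nan,): boolean indexing hides sizes

# Workaround 1: compute eagerly and reduce in NumPy.
result = masked.compute().max()

# Workaround 2 (Dask >= 2.4 only): materialize the unknown chunk sizes,
# then rechunk so zero-size chunks are merged away before reducing.
masked = masked.compute_chunk_sizes()
result = masked.rechunk(masked.shape).max().compute()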

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments:7 (7 by maintainers)

Top GitHub Comments

1 reaction
jakirkham commented, Oct 14, 2021

Possibly this is also useful? https://github.com/dask/dask/pull/4167

It’s interesting you mention that. It reminds me of this issue (https://github.com/dask/dask/issues/4173), which suggests generally optimizing out empty chunks. As a consequence, it would also solve this (and possibly other) empty chunk oddities.
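Roughly, the idea in that issue is to drop zero-size chunks from the graph once their sizes are known, so reductions never see empty blocks. A hypothetical 1-D sketch of that pruning (drop_empty_chunks is an illustrative helper, not Dask API, and it assumes chunk sizes are already known):

import dask.array as da

def drop_empty_chunks(x):
    # Keep only the blocks whose chunk length along axis 0 is nonzero.
    blocks = [x.blocks[i] for i, c in enumerate(x.chunks[0]) if c > 0]
    return da.concatenate(blocks) if blocks else x[:0]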

1 reaction
GenevieveBuckley commented, Oct 12, 2021
