Can't take max of arrays at least as large as 2 ** 32

Describe the bug Calling sparse.COO.max on an array larger than 2 ** 32 - 1 fails with a TypeError like so:

>>> a.shape
(4294967296,)
>>> a.max()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\<path_redacted>\sparse\_sparse_array.py", line 444, in max
    return np.maximum.reduce(self, out=out, axis=axis, keepdims=keepdims)
  File "C:\<path_redacted>\sparse\_sparse_array.py", line 307, in __array_ufunc__
    result = SparseArray._reduce(ufunc, *inputs, **kwargs)
  File "C:\<path_redacted>\sparse\_sparse_array.py", line 278, in _reduce
    return self.reduce(method, **kwargs)
  File "C:\<path_redacted>\sparse\_sparse_array.py", line 360, in reduce
    out = self._reduce_calc(method, axis, keepdims, **kwargs)
  File "C:\<path_redacted>\sparse\_coo\core.py", line 692, in _reduce_calc
    data, inv_idx, counts = _grouped_reduce(a.data, a.coords[0], method, **kwargs)
  File "C:\<path_redacted>\sparse\_coo\core.py", line 1566, in _grouped_reduce
    result = method.reduceat(x, inv_idx, **kwargs)
TypeError: Cannot cast array data from dtype('uint64') to dtype('int64') according to the rule 'safe'
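The last frame of the traceback can be approximated with plain NumPy, without sparse. This is only a sketch: the `data` array and the `reduceat_accepts` helper are made up for illustration, and it assumes a 64-bit platform where NumPy's index type is int64. It suggests that `reduceat` refuses unsigned 64-bit indices because they cannot be safely cast to int64:

```python
import numpy as np

# Illustrative data; any 1-D array will do.
data = np.array([1.0, 3.0, 2.0])

def reduceat_accepts(index_dtype):
    """Return True if np.maximum.reduceat accepts indices of this dtype."""
    indices = np.array([0], dtype=index_dtype)
    try:
        np.maximum.reduceat(data, indices)
    except TypeError:
        # Raised when the indices cannot be cast to the platform index type
        # under the 'safe' casting rule.
        return False
    return True

print(reduceat_accepts(np.int64))   # signed 64-bit indices
print(reduceat_accepts(np.uint64))  # unsigned 64-bit indices
```

On a 64-bit build, the first call should succeed and the second should hit the same kind of TypeError shown in the traceback.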

To Reproduce Create an array a at least as large as 2 ** 32 with at least one nonzero element, then call a.max(). For example:

>>> b = sparse.DOK((2 ** 32,))
>>> b[0] = 1
>>> a = sparse.COO(b)
>>> a.nnz
1
>>> a.max() # TypeError

Expected behavior Return the maximum value of the array (1 in the example above).

System

OS and version: Windows 10
sparse version: 0.12.0+44.g765e297 (bug is also present in 0.12.0, installed from pip)
NumPy version: 1.18.5
Numba version: 0.53.1

Additional context sparse.COO.max works on an array of size 2 ** 32 if it is empty (i.e. a.nnz == 0).

Top GitHub Comments

hameerabbasi commented, Jul 5, 2022

Thanks @GPhilo for digging into this, I’ll try to set some time aside this weekend to fix it and cut a release.

GPhilo commented, Jul 5, 2022

I traced the issue to its source and came up with a hack to make this work, should anyone else run into this problem. Basically, when this reshape is called, idx_type is ignored (as mentioned in the comment above), so the default int32 idx_type is used. Since int32 can’t store the new shape, this test evaluates to true and idx_type gets converted to the result of np.min_scalar_type(max(shape)), which is np.uint64, and that’s what causes the problem.
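The dtype selection described above can be checked in isolation. This is a rough sketch using only NumPy, not the actual sparse code path; `size` and `chosen` are illustrative names:

```python
import numpy as np

size = 2 ** 32  # length of the array from the report

# int32 cannot represent the shape, so a wider index type is needed.
print(np.iinfo(np.int32).max >= size)  # False

# For a non-negative value, min_scalar_type picks the smallest unsigned type.
chosen = np.min_scalar_type(size)
print(chosen)  # uint64

# uint64 cannot be cast to int64 under the 'safe' rule, which matches
# the TypeError in the traceback.
print(np.can_cast(chosen, np.int64, casting="safe"))  # False
```

Since every valid shape also fits in a signed 64-bit integer, choosing a signed type would avoid the unsafe cast.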

My hack to solve this is to hardcode np.int64 instead of letting numpy choose: