numpy lacks memory and speed efficiency for Booleans
Using pure Boolean values is pretty limiting when dealing with very large data due to memory waste. np.array uses 1 byte per Boolean, which, although better than 4 or 8, still wastes 8 times the space actually needed.
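For illustration (the array size here is just an example), a quick check of the per-element storage:

import numpy as np

# One million Booleans occupy one million bytes with dtype=np.bool_,
# even though only 125,000 bytes of information are actually stored.
mask = np.ones(1_000_000, dtype=np.bool_)
print(mask.nbytes)  # 1000000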
Easy enough to pack a Boolean np.array into bytes (np.uint8):
def boolarr_tobytes(arr):
    # Pad with False so the length is a multiple of 8, then pack 8 Booleans per byte.
    rem = len(arr) % 8
    if rem != 0:
        arr = np.concatenate((arr, np.zeros(8 - rem, dtype=np.bool_)))
    arr = np.reshape(arr, (len(arr) // 8, 8))
    return np.packbits(arr)  # packs each row of 8 Booleans into one byte, high bit first
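As a quick sanity check (the sample values are arbitrary):

bools = np.array([True, False, True, True, False, False, True, False, True], dtype=np.bool_)
packed = boolarr_tobytes(bools)
print(packed)         # array([178, 128], dtype=uint8): 9 bits padded out to 16
print(packed.nbytes)  # 2 bytes instead of 9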
Then & and | already provide the bitwise operations on the packed bytes. And np.sum can be written as:
bytebitcounts = np.array([bin(x).count("1") for x in range(256)])  # popcount of every possible byte value

def totalbits_bytearr(arr):
    return np.sum(bytebitcounts[arr])  # same result as np.sum over the original Boolean array
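A quick check that the packed operations agree with the Boolean originals (again with arbitrary sample data):

rng = np.random.default_rng(0)
a = rng.random(1000) < 0.5
b = rng.random(1000) < 0.5
# Bitwise AND on packed bytes matches logical AND on the Booleans,
# and the popcount lookup reproduces np.sum.
assert np.array_equal(boolarr_tobytes(a & b), boolarr_tobytes(a) & boolarr_tobytes(b))
assert totalbits_bytearr(boolarr_tobytes(a)) == np.sum(a)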
Now I am supposing that the table lookup (the same kind of translation used for imaging lookup tables) is vectorized properly; since it is used heavily in image processing, I would imagine it is. Counting set bits is then 2 vector operations (the table lookup and np.sum) instead of the 1 np.sum needed on plain Booleans. PSHUFB (packed shuffle bytes) is the processor intrinsic that can do byte table lookup. And since AVX/SSE2 and similar instructions operate on fixed-width registers, the packed representation fits 8 times as many elements into each vector operation. So 1 operation covering 8 times the data versus 2 operations still works out to roughly 4 times faster.
So if numpy would dare to add a whole data type using packed byte representations of Booleans (which might be a major change to implement), it would decrease memory use by 8 times and let each vector operation cover 8 times as many elements, except where bit twiddling like the above is needed, where it would still tend to be faster depending on the specific operation.
I can see no reason why this would not be highly desirable for the library especially since large datasets are pretty typical.
Yes, the primary problem would be endless indexing oddities: reading a bit with (1 << bitoffset) & value, and writing one with value |= (1 << bitoffset) if set. But a lot of things are already implicitly supported.
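A rough sketch of what element access could look like on the packed representation (the helper names getbit and setbit are hypothetical, and the bit ordering assumes the high-bit-first layout np.packbits uses):

def getbit(packed, i):
    # Read Boolean i from a packed uint8 array (high bit first within each byte).
    return bool(packed[i // 8] & (1 << (7 - i % 8)))

def setbit(packed, i, value):
    # Write Boolean i in place.
    bit = 7 - i % 8
    if value:
        packed[i // 8] |= (1 << bit)
    else:
        packed[i // 8] &= ~(1 << bit) & 0xFF

packed = boolarr_tobytes(np.array([True, False, True], dtype=np.bool_))
setbit(packed, 1, True)
print(getbit(packed, 1))  # True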
Half of the operations are probably trivial, like the multiplication and addition shown above, and a few would require some real thinking.
It would make these Python libraries as flexibly scalable as C, which would be impressive. This would in turn positively affect a great many libraries out there, giving dramatic potential gains on large data sets.
Issue Analytics
- Created: 4 years ago
- Comments: 5 (3 by maintainers)
I think you have the wrong model of how numba works, like I used to. The things you need to know about numba are that:
- it compiles Python functions to native machine code through LLVM, the same compiler infrastructure behind clang;
- NumPy cannot take a dependency on numba; it would make everything far too cyclic.
They’re implemented using native loops in C. The problem with np.sum(bytebitcounts[arr]) is that it uses two loops, an intermediate array, and no compile-time knowledge of the lookup table. Note that you can probably get 90% of the performance you want with numba:
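A minimal sketch of the kind of numba kernel being suggested (the function name totalbits_numba is illustrative; this assumes numba's @njit decorator and a packed uint8 input):

import numpy as np
from numba import njit

# Same 256-entry popcount table as above; numba treats module-level
# arrays as compile-time constants inside the jitted function.
bytebitcounts = np.array([bin(x).count("1") for x in range(256)], dtype=np.int64)

@njit
def totalbits_numba(packed):
    # One fused native loop over the packed bytes: no intermediate
    # bytebitcounts[arr] array is materialized.
    total = 0
    for byte in packed:
        total += bytebitcounts[byte]
    return total

packed = np.packbits(np.random.default_rng(0).random(10_000) < 0.5)
assert totalbits_numba(packed) == int(np.sum(bytebitcounts[packed]))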