NumPy / h5py types behave as expected only sometimes, depending on operand order
NumPy and h5py data types are often compatible, but there are nuances. Comparing a scalar with an h5py dataset using == produces a boolean mask array when the scalar is the left operand. However, if the operands are swapped, == returns an unexpected result. The tests below, in a self-contained tests.py, show this behavior.
To assist reproducing bugs, please include the following:
- Operating system: Linux 5.10.41-1-MANJARO SMP PREEMPT Fri May 28 19:10:32 UTC 2021 x86_64 GNU/Linux
- Python version: 3.9.5 (platform linux, pytest-6.2.4)
- Where Python was acquired: pacman / pip
- h5py version: 3.1.0
- HDF5 version: 1.12.0-2
- Full traceback: see the output of tests.py below

h5py.version.info contains the needed versions; its output is:
Summary of the h5py configuration
---------------------------------
h5py 3.1.0
HDF5 1.12.0
Python 3.9.5 (default, May 24 2021, 12:50:35)
[GCC 11.1.0]
sys.platform linux
sys.maxsize 9223372036854775807
numpy 1.21.0
cython (built with) 0.29.21
numpy (built against) 1.19.3
HDF5 (built against) 1.12.0
# coding: utf-8
import h5py
import numpy as np
import logging

fname = 'mydatafile.h5'


def create_dataset():
    f = h5py.File(fname, 'w')
    dset = f.create_dataset("test", (100,1))
    f.close()


def test_convert_to_numpy_test_commutative():
    '''
    Conversion of h5py datasets to numpy array returns correct result
    '''
    print('%s %s'%('h5py.__version__',h5py.__version__))
    create_dataset()
    data = h5py.File(fname)  # read-only by default in h5py 3.x
    dets = data['test']
    udets = np.unique(dets)
    type(dets)  # no-op outside an interactive session
    dets = np.asarray(dets)  # explicit conversion to a NumPy array
    for i in udets:
        assert np.sum(i==dets) == np.sum(dets==i)
    type(dets)


def test_commutative_h5py():
    '''
    h5py dataset has suprising commutative asymmetry when not converted to numpy types explicitly
    '''
    print('%s %s'%('h5py.__version__',h5py.__version__))
    # relies on the file written by the previous test
    data = h5py.File(fname)
    dets = data['test']
    udets = np.unique(dets)
    for i in udets:
        assert np.sum(i == dets) == np.sum(dets == i)
    for i in udets:
        # both sides are identical, so this assertion always passes
        assert np.sum(i==dets) == np.sum(i==dets)
pytest -q tests.py
.F [100%]
====================================================== FAILURES ======================================================
_______________________________________________ test_commutative_h5py ________________________________________________
    def test_commutative_h5py():
        '''
        h5py dataset has suprising commutative asymmetry when not converted to numpy types explicitly
        '''
        print('%s %s'%('h5py.__version__',h5py.__version__))
        data = h5py.File(fname)
        dets = data['test']
        udets = np.unique(dets)
        for i in udets:
>           assert np.sum(i == dets) == np.sum(dets == i)
E assert 100 == 0
E + where 100 = <function sum at 0x7fd745275b80>(0.0 == <HDF5 dataset "test": shape (100, 1), type "<f4">)
E + where <function sum at 0x7fd745275b80> = np.sum
E + and 0 = <function sum at 0x7fd745275b80>(<HDF5 dataset "test": shape (100, 1), type "<f4"> == 0.0)
E + where <function sum at 0x7fd745275b80> = np.sum
tests.py:42: AssertionError
------------------------------------------------ Captured stdout call ------------------------------------------------
h5py.__version__ 3.1.0
============================================== short test summary info ===============================================
FAILED tests.py::test_commutative_h5py - assert 100 == 0
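The asymmetry in the failure above can be modelled in plain Python, without h5py or NumPy. This is a toy model, assuming (as the later discussion of duck typing suggests) that the h5py 3.1 dataset __eq__ compares id attributes when the other operand has one and returns False otherwise. ToyScalar, EagerDataset and np_like_sum are hypothetical stand-ins for a NumPy scalar, an h5py dataset and np.sum. For unrelated types like these, Python asks the left operand's __eq__ first and consults the right operand only if that answer is NotImplemented. With the scalar on the left, the comparison broadcasts; with the dataset on the left, the dataset answers a plain False and the scalar is never asked.

```python
# Toy model of the reported asymmetry; no h5py or NumPy needed.
# ToyScalar, EagerDataset and np_like_sum are hypothetical stand-ins.


class ToyScalar:
    """Stands in for a NumPy scalar: comparing it with an array-like broadcasts."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        # Broadcast over anything that can be materialised as a sequence,
        # the way a NumPy scalar converts the dataset into an array.
        if hasattr(other, "to_list"):
            return [self.value == x for x in other.to_list()]
        return NotImplemented


class EagerDataset:
    """Assumed pre-fix behaviour: __eq__ answers False for anything without an id."""

    def __init__(self, data):
        self.id = object()  # unique handle, loosely mirroring Dataset.id
        self._data = list(data)

    def to_list(self):
        return list(self._data)

    def __eq__(self, other):
        if hasattr(other, "id"):
            return self.id == other.id
        return False  # a definite answer, so the other operand is never asked


def np_like_sum(result):
    """Count True values; a bare bool counts as 0 or 1, like np.sum(False) == 0."""
    if isinstance(result, list):
        return sum(result)
    return int(result)


dets = EagerDataset([0.0] * 100)  # the reported dataset holds only 0.0
i = ToyScalar(0.0)

print(np_like_sum(i == dets))  # 100: scalar on the left compares elementwise
print(np_like_sum(dets == i))  # 0: dataset on the left returns a single False
```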
Issue Analytics
- Created 2 years ago
- Comments:14 (14 by maintainers)
Yes, I am putting it together with the tests.
I think returning NotImplemented would be better. I'm wondering if there was a reason for the duck typing (I doubt it would work with PyTables, for example), or if it was habit.

Looking at the code, we should probably also drop __nonzero__ (given we don't support Python 2 any more). Interestingly, __ne__ is implemented in a similar way to __eq__, and seems to assume that __eq__ will always return a bool. Unless we need the locking on __ne__ explicitly (I think the locking on __eq__ would be sufficient?), I think using the default handling as per https://docs.python.org/3.9/reference/datamodel.html#object.__ne__ would be fine.
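The change proposed in this comment can be sketched with toy classes. This is not h5py's actual patch: ToyScalar and DeferringDataset are hypothetical stand-ins, and h5py's global lock is omitted. Returning NotImplemented from __eq__ lets Python fall back to the scalar's reflected comparison, so dataset == scalar broadcasts just like scalar == dataset, while two datasets still compare by id. Leaving __ne__ undefined relies on Python 3's default, which inverts the result of __eq__ and passes NotImplemented through. A hand-written "return not self.__eq__(other)" would instead evaluate NotImplemented in a boolean context, which has been deprecated since Python 3.9.

```python
# Sketch of the proposed change: __eq__ returns NotImplemented for foreign
# operands, and __ne__ is left to Python 3's default. ToyScalar and
# DeferringDataset are hypothetical stand-ins, not h5py classes, and
# h5py's global lock is omitted.


class ToyScalar:
    """Stands in for a NumPy scalar: comparing it with an array-like broadcasts."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if hasattr(other, "to_list"):
            return [self.value == x for x in other.to_list()]
        return NotImplemented


class DeferringDataset:
    """Dataset stand-in whose __eq__ defers to the other operand when unsure."""

    def __init__(self, data):
        self.id = object()  # unique handle, loosely mirroring Dataset.id
        self._data = list(data)

    def to_list(self):
        return list(self._data)

    def __eq__(self, other):
        if hasattr(other, "id"):
            return self.id == other.id
        # Not a bool: Python will now try the reflected other.__eq__(self).
        return NotImplemented

    # No __ne__ on purpose: the default delegates to __eq__ and inverts
    # the result, passing NotImplemented through unchanged.


dets = DeferringDataset([0.0] * 100)
other = DeferringDataset([0.0] * 100)
i = ToyScalar(0.0)

print(sum(i == dets), sum(dets == i))  # 100 100: both orders broadcast
print(dets == dets, dets != other)  # True True: datasets still compare by id
print(dets == "text", dets != "text")  # False True: unrelated types fall back to identity
```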