NumPy / h5py types sometimes behave as expected, but it depends on operand order

NumPy and h5py data types are often compatible, but there are nuances. When a NumPy scalar is the left operand and an h5py dataset the right, == produces an element-wise boolean mask array. When the operands are swapped, == does not compare element-wise: in the failing test below, np.sum(dets == i) is 0 while np.sum(i == dets) is 100. The self-contained tests.py below demonstrates this behavior.
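The asymmetry can be sketched with Python's comparison protocol alone, without h5py or NumPy. `DuckTypedDataset` and `ArrayScalar` below are hypothetical stand-ins, not the real classes. The assumption is that the dataset's `__eq__` compares `id` attributes when the other operand has one and otherwise returns a plain `False`. That model is inferred from the maintainer's mention of duck typing in the comments and from the 0 in the failing assertion, not copied from h5py's source.

```python
from typing import Any, List


class DuckTypedDataset:
    """Hypothetical stand-in for an h5py Dataset (not the real class).

    Assumption: __eq__ compares ``id`` attributes when the other operand
    has one, and otherwise returns a plain False instead of NotImplemented.
    """

    def __init__(self, values: List[float]) -> None:
        self.id = object()  # identity token, like a low-level object handle
        self.values = values

    def __iter__(self):
        return iter(self.values)

    def __eq__(self, other: Any):
        if hasattr(other, "id"):
            return self.id == other.id
        return False  # answers for both sides; no reflected call is tried


class ArrayScalar:
    """Hypothetical stand-in for a NumPy scalar that compares element-wise
    against any iterable, much as NumPy converts the other operand to an
    array."""

    def __init__(self, value: float) -> None:
        self.value = value

    def __eq__(self, other: Any):
        try:
            return [self.value == v for v in other]  # element-wise mask
        except TypeError:
            return NotImplemented  # not iterable: let Python try other side


dset = DuckTypedDataset([0.0] * 100)  # zero-filled, like the test file
scalar = ArrayScalar(0.0)

mask = scalar == dset  # ArrayScalar.__eq__ runs first: 100-element mask
flag = dset == scalar  # DuckTypedDataset.__eq__ runs first: plain False

print(sum(mask), int(flag))  # 100 0, mirroring "assert 100 == 0"
```

Because the left operand's `__eq__` is always tried first, returning `False` there prevents Python from ever reaching the element-wise comparison the scalar would have performed.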

To assist in reproducing bugs, please include the following:

Operating system: Linux 5.10.41-1-MANJARO SMP PREEMPT Fri May 28 19:10:32 UTC 2021 x86_64 GNU/Linux
Python version: 3.9.5 (platform linux, pytest 6.2.4)
Where Python was acquired: pacman / pip
h5py version: 3.1.0
HDF5 version: 1.12.0-2
Full traceback: see the output of tests.py below

h5py.version.info contains the needed versions, which can be displayed by running:

python -c 'import h5py; print(h5py.version.info)'

Summary of the h5py configuration
---------------------------------

h5py    3.1.0
HDF5    1.12.0
Python  3.9.5 (default, May 24 2021, 12:50:35) 
[GCC 11.1.0]
sys.platform    linux
sys.maxsize     9223372036854775807
numpy   1.21.0
cython (built with) 0.29.21
numpy (built against) 1.19.3
HDF5 (built against) 1.12.0

# coding: utf-8
import h5py
import numpy as np
import logging

fname = 'mydatafile.h5'

def create_dataset():
    # 100x1 float32 dataset, zero-filled by HDF5's default fill value
    with h5py.File(fname, 'w') as f:
        f.create_dataset("test", (100, 1))

def test_convert_to_numpy_test_commutative():
    '''
    Conversion of h5py datasets to numpy array returns correct result
    '''
    print('%s %s'%('h5py.__version__',h5py.__version__))
    create_dataset()

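    data = h5py.File(fname, 'r')
    dets = data['test']
    udets = np.unique(dets)

    # dets is an h5py Dataset here; convert it to an in-memory ndarray
    dets = np.asarray(dets)
    for i in udets:
        assert np.sum(i==dets) == np.sum(dets==i)

    # With a plain ndarray, == is element-wise in both operand orders
    data.close()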

def test_commutative_h5py():
    '''
    h5py dataset has suprising commutative asymmetry when not converted to numpy types explicitly
    '''
    print('%s %s'%('h5py.__version__',h5py.__version__))
    data = h5py.File(fname)
    dets = data['test']
    udets = np.unique(dets)

    for i in udets:
        assert np.sum(i == dets) ==  np.sum(dets == i)

    for i in udets:
        assert np.sum(i==dets) == np.sum(i==dets)

pytest -q tests.py
.F                                                                                                             [100%]
====================================================== FAILURES ======================================================
_______________________________________________ test_commutative_h5py ________________________________________________

    def test_commutative_h5py():
        '''
        h5py dataset has suprising commutative asymmetry when not converted to numpy types explicitly
        '''
        print('%s %s'%('h5py.__version__',h5py.__version__))
        data = h5py.File(fname)
        dets = data['test']
        udets = np.unique(dets)
    
        for i in udets:
>           assert np.sum(i == dets) ==  np.sum(dets == i)
E           assert 100 == 0
E            +  where 100 = <function sum at 0x7fd745275b80>(0.0 == <HDF5 dataset "test": shape (100, 1), type "<f4">)
E            +    where <function sum at 0x7fd745275b80> = np.sum
E            +  and   0 = <function sum at 0x7fd745275b80>(<HDF5 dataset "test": shape (100, 1), type "<f4"> == 0.0)
E            +    where <function sum at 0x7fd745275b80> = np.sum

tests.py:42: AssertionError
------------------------------------------------ Captured stdout call ------------------------------------------------
h5py.__version__ 3.1.0
============================================== short test summary info ===============================================
FAILED tests.py::test_commutative_h5py - assert 100 == 0


Top GitHub Comments

rtpavlovsk21 commented, Aug 26, 2021

Yes, I am putting it together with the tests.

aragilar commented, Aug 23, 2021

I think returning NotImplemented would be better. I'm wondering if there was a reason for the duck typing (I doubt it would work with PyTables, for example), or if it was just habit?
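A minimal sketch of why returning `NotImplemented` restores symmetry: when the left operand's `__eq__` returns `NotImplemented`, Python retries with the reflected call `other.__eq__(self)`, so the scalar's element-wise comparison runs whichever side the dataset is on. `FixedDataset` and `ArrayScalar` are hypothetical stand-ins for an h5py dataset and a NumPy scalar, not the actual classes or the actual patch.

```python
from typing import Any, List


class FixedDataset:
    """Hypothetical dataset stand-in whose __eq__ defers on unknown types."""

    def __init__(self, values: List[float]) -> None:
        self.id = object()  # identity token, compared when both sides have one
        self.values = values

    def __iter__(self):
        return iter(self.values)

    def __eq__(self, other: Any):
        if hasattr(other, "id"):
            return self.id == other.id
        # Let Python try the reflected operation other.__eq__(self).
        return NotImplemented


class ArrayScalar:
    """Hypothetical NumPy-scalar stand-in that compares element-wise."""

    def __init__(self, value: float) -> None:
        self.value = value

    def __eq__(self, other: Any):
        try:
            return [self.value == v for v in other]
        except TypeError:
            return NotImplemented


dset = FixedDataset([0.0] * 100)
scalar = ArrayScalar(0.0)

left = scalar == dset   # ArrayScalar.__eq__ handles it directly
right = dset == scalar  # FixedDataset defers, then ArrayScalar.__eq__ runs
print(sum(left), sum(right))  # 100 100: operand order no longer matters
```

If both sides return `NotImplemented` (for example `dset == 3.0`, since `float.__eq__` does not know the dataset type), Python falls back to an identity check, so the result is still `False` rather than an error.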

Looking at the code, we should probably also drop __nonzero__ (given we don't support Python 2 any more).
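For context on why `__nonzero__` is dead code: Python 3 renamed that truth-value hook to `__bool__`, so a method named `__nonzero__` is never consulted by `bool()` or `if`. The two classes below are hypothetical illustrations. The first defines only `__nonzero__`; with no `__bool__` or `__len__`, Python 3 treats its instances as true.

```python
class LegacyTruthiness:
    """Hypothetical class using only the Python 2 truth-value hook."""

    def __nonzero__(self) -> bool:
        return False  # Python 2 would call this; Python 3 never does


class ModernTruthiness:
    """The same intent expressed with the Python 3 hook."""

    def __bool__(self) -> bool:
        return False


print(bool(LegacyTruthiness()))  # True: __nonzero__ is ignored
print(bool(ModernTruthiness()))  # False: __bool__ is used
```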

Interestingly, __ne__ is implemented in a similar way to __eq__, and seems to assume that __eq__ will always return a bool. Unless we need the locking on __ne__ explicitly (I think the locking on __eq__ would be sufficient?), I think using the default handling as per https://docs.python.org/3.9/reference/datamodel.html#object.__ne__ would be fine.
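A sketch of the default handling linked above: in Python 3, a class that defines `__eq__` but not `__ne__` inherits `object.__ne__`, which calls `__eq__` and inverts the result unless it is `NotImplemented`, in which case `NotImplemented` is passed on so the reflected comparison can run. `OnlyEq` is a hypothetical class used only to show this behavior.

```python
from typing import Any


class OnlyEq:
    """Hypothetical class defining __eq__ only; __ne__ comes from object."""

    def __init__(self, key: int) -> None:
        self.key = key

    def __eq__(self, other: Any):
        if isinstance(other, OnlyEq):
            return self.key == other.key
        return NotImplemented  # defer to the other operand


a, b, c = OnlyEq(1), OnlyEq(1), OnlyEq(2)

print(a != b)  # False: object.__ne__ inverts __eq__'s True
print(a != c)  # True
# __eq__ returns NotImplemented for a str, so the default __ne__ passes that
# on instead of inverting it; Python then falls back to an identity check.
print(a != "x")  # True
```

Relying on the inherited `__ne__` avoids the hand-written `not self.__eq__(other)` pattern, which only behaves correctly when `__eq__` returns a bool.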