Numpy / H5py types sometimes behave as expected, but depends on order

NumPy and h5py data types are often compatible, but there are nuances. When the left operand of == is a NumPy scalar and the right operand is an h5py dataset, the comparison produces an element-wise mask array. With the operands swapped, == returns an unexpected result instead. The tests below, in a self-contained tests.py, demonstrate this behavior.

To assist reproducing bugs, please include the following:

  • Linux 5.10.41-1-MANJARO SMP PREEMPT Fri May 28 19:10:32 UTC 2021 x86_64 GNU/Linux
  • Platform: linux, Python 3.9.5, pytest-6.2.4
  • Where Python was acquired: pacman / pip
  • h5py version 3.1.0
  • hdf5 1.12.0-2
  • See the output of tests.py below

h5py.version.info contains the needed versions; its output is:

Summary of the h5py configuration
---------------------------------

h5py    3.1.0
HDF5    1.12.0
Python  3.9.5 (default, May 24 2021, 12:50:35) 
[GCC 11.1.0]
sys.platform    linux
sys.maxsize     9223372036854775807
numpy   1.21.0
cython (built with) 0.29.21
numpy (built against) 1.19.3
HDF5 (built against) 1.12.0

# coding: utf-8
import h5py
import numpy as np
import logging

fname = 'mydatafile.h5'

def create_dataset():
    f = h5py.File(fname, 'w')
    dset = f.create_dataset("test", (100,1))
    f.close()

def test_convert_to_numpy_test_commutative():
    '''
    Conversion of h5py datasets to numpy array returns correct result
    '''
    print('%s %s'%('h5py.__version__',h5py.__version__))
    create_dataset()

    data = h5py.File(fname)
    dets = data['test']
    udets = np.unique(dets)

    type(dets)
    dets = np.asarray(dets)
    for i in udets:
        assert np.sum(i==dets) == np.sum(dets==i)


    type(dets)

def test_commutative_h5py():
    '''
    h5py dataset has suprising commutative asymmetry when not converted to numpy types explicitly
    '''
    print('%s %s'%('h5py.__version__',h5py.__version__))
    data = h5py.File(fname)
    dets = data['test']
    udets = np.unique(dets)

    for i in udets:
        assert np.sum(i == dets) ==  np.sum(dets == i)

    for i in udets:
        assert np.sum(i==dets) == np.sum(i==dets)  # same expression on both sides, so this always passes

pytest -q tests.py
.F                                                                                                             [100%]
====================================================== FAILURES ======================================================
_______________________________________________ test_commutative_h5py ________________________________________________

    def test_commutative_h5py():
        '''
        h5py dataset has suprising commutative asymmetry when not converted to numpy types explicitly
        '''
        print('%s %s'%('h5py.__version__',h5py.__version__))
        data = h5py.File(fname)
        dets = data['test']
        udets = np.unique(dets)
    
        for i in udets:
>           assert np.sum(i == dets) ==  np.sum(dets == i)
E           assert 100 == 0
E            +  where 100 = <function sum at 0x7fd745275b80>(0.0 == <HDF5 dataset "test": shape (100, 1), type "<f4">)
E            +    where <function sum at 0x7fd745275b80> = np.sum
E            +  and   0 = <function sum at 0x7fd745275b80>(<HDF5 dataset "test": shape (100, 1), type "<f4"> == 0.0)
E            +    where <function sum at 0x7fd745275b80> = np.sum

tests.py:42: AssertionError
------------------------------------------------ Captured stdout call ------------------------------------------------
h5py.__version__ 3.1.0
============================================== short test summary info ===============================================
FAILED tests.py::test_commutative_h5py - assert 100 == 0
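
The 100-versus-0 result is consistent with how Python dispatches ==: the left operand's __eq__ is tried first, and the right operand is consulted only if the left one returns NotImplemented. The sketch below reproduces the same asymmetry without h5py or NumPy. ScalarLike and DatasetLike are hypothetical stand-ins, and the assumption that the dataset-side __eq__ returns False for foreign types is inferred from the maintainer discussion on this issue, not copied from the h5py source.

```python
class ScalarLike:
    """Hypothetical stand-in for a NumPy scalar: compares element-wise."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        try:
            # Broadcast the comparison over the other operand's items.
            return [self.value == item for item in other]
        except TypeError:
            # The other operand is not iterable; let Python try it instead.
            return NotImplemented


class DatasetLike:
    """Hypothetical stand-in for a dataset object (assumed behaviour)."""

    def __init__(self, values):
        self.values = list(values)

    def __iter__(self):
        return iter(self.values)

    def __eq__(self, other):
        if isinstance(other, DatasetLike):
            return self is other
        # Assumption: foreign types are simply reported as "not equal".
        return False


scalar = ScalarLike(0.0)
dets = DatasetLike([0.0] * 100)

left = scalar == dets   # ScalarLike.__eq__ runs: list of 100 True values
right = dets == scalar  # DatasetLike.__eq__ runs: plain False

print(sum(left), int(right))  # prints "100 0"
```

Because DatasetLike.__eq__ returns False rather than NotImplemented, Python never gives the scalar side a chance to handle dets == scalar, which is why the two orderings disagree.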

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 14 (14 by maintainers)

Top GitHub Comments

1 reaction
rtpavlovsk21 commented, Aug 26, 2021

Yes, I am putting it together with the tests.

1 reaction
aragilar commented, Aug 23, 2021

I think returning NotImplemented would be better. I’m wondering if there was a reason for the duck typing (I doubt it would work with pytables for example), or if it was habit?

Looking at the code, we should probably also drop __nonzero__ (given we don’t support python 2 any more).

Interestingly __ne__ is implemented in a similar way to __eq__, and seems to assume that __eq__ will always return a bool. Unless we need the locking on __ne__ explicitly (I think the locking on __eq__ would be sufficient?), I think using the default handling as per https://docs.python.org/3.9/reference/datamodel.html#object.__ne__ would be fine.
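
A minimal sketch of that suggestion, using a hypothetical Handle class rather than the real h5py object: __eq__ returns NotImplemented for types it does not understand, and no __ne__ is defined, so Python 3 derives != from __eq__. AlwaysEqual is another illustrative name, included only to show that the reflected comparison now gets a turn. None of this is the actual h5py implementation.

```python
class Handle:
    """Hypothetical wrapper identified by an `id` attribute."""

    def __init__(self, id):
        self.id = id

    def __eq__(self, other):
        if isinstance(other, Handle):
            return self.id == other.id
        # Unknown type: let Python try the other operand's __eq__.
        return NotImplemented

    def __hash__(self):
        # Defining __eq__ disables the default hash, so restore one.
        return hash(self.id)

    # No __ne__ here: Python 3 inverts the result of __eq__ by default.


class AlwaysEqual:
    """Hypothetical foreign type with its own comparison logic."""

    def __eq__(self, other):
        return True


a, b, c = Handle(1), Handle(1), Handle(2)

same = a == b            # True: ids match
derived_ne = a != c      # True: default __ne__ inverts __eq__
fallback = a == "text"   # False: both sides decline, identity is used

# Both orderings agree because Handle defers to the other operand.
forward = a == AlwaysEqual()
reflected = AlwaysEqual() == a

print(same, derived_ne, fallback, forward, reflected)
# prints "True True False True True"
```

With this pattern, a left-hand Handle no longer short-circuits the comparison, so an array-like right operand could apply its own element-wise logic in either order.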
