Default cast type of H5T_NATIVE_B8 bitfields (bool, int8, uint8)


As briefly discussed in #821, reading datasets which contain bitfields (H5T_NATIVE_B8) is not possible unless you explicitly state the dtype:

import h5py
h5_file = h5py.File('BigDataFile.h5', 'r')

dset = h5_file.get('/Base/GroupA')
dt = [('Time', '<f8'),
        ('SubsetA', [('DataA1', '?'), ('DataA2', '<f8')])]
with dset.astype(dt):
    data = dset[:]

whereas datasets without bitfields can be accessed as:

import h5py
h5_file = h5py.File('BigDataFile.h5', 'r')
dset = h5_file.get('/Base/GroupA')
data = dset[:]

Attempting to read a compound dataset in this way produces an error if it has at least one bitfield: TypeError: No NumPy equivalent for TypeBitfieldID exists

Some of the suggestions from #821 were to automatically convert to bool, uint8 or int8. In my opinion, h5py should convert bitfields to something by default, so that datasets can be read without explicitly stating the datatype and most users can simply use the second code snippet.

If there were going to be a default, uint8 or int8 would achieve this without any data loss, unlike bool, although the original motivation for using bool in the pull request was PyTables interop. I'd suggest uint8: it would work out of the box without requiring explicit casts, lose no data, and users who want something else can still do an explicit cast.
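
For illustration, this is roughly what such a default would mean for the dataset above. It's only a sketch (not existing h5py behaviour): the same explicit-dtype read as in the first snippet, reusing its group and field names, but with the bitfield member mapped to 'u1' instead of '?':

import h5py

# Same explicit-dtype read as above, but with the bitfield member read as
# uint8 ('u1') rather than bool ('?') - i.e. what a uint8 default would do.
dt = [('Time', '<f8'),
      ('SubsetA', [('DataA1', 'u1'), ('DataA2', '<f8')])]

h5_file = h5py.File('BigDataFile.h5', 'r')
dset = h5_file.get('/Base/GroupA')
with dset.astype(dt):
    data = dset[:]

Nothing is lost this way: data['SubsetA']['DataA1'] holds the raw 8-bit values, and anyone who prefers booleans can still convert that field afterwards with .astype(bool).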

All comments are welcome 😃 I may be missing some understanding of h5py, so feel free to correct if there’s anything wrong!

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
takluyver commented, Jul 3, 2019

I’d say if we’re going to do anything by default, uint8 is the obvious choice. If we do that, I’d also map the other bitfield types to their corresponding uint types for consistency.

Do we lose anything by mapping two HDF5 types to the same numpy dtype? Is there a convenient way for the user to check the HDF5 type of a dataset if dset.dtype is ambiguous? What if they want to create a dataset with a bitfield type?
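
One way to check the on-disk type today is the low-level API; a rough sketch (file and group names as in the issue), which only inspects the stored type and doesn't change any mapping:

import h5py
from h5py import h5t

with h5py.File('BigDataFile.h5', 'r') as f:
    dset = f['/Base/GroupA']
    tid = dset.id.get_type()              # low-level TypeID of the stored type
    if tid.get_class() == h5t.COMPOUND:
        for i in range(tid.get_nmembers()):
            name = tid.get_member_name(i).decode()
            member_cls = tid.get_member_type(i).get_class()
            print(name, 'is a bitfield' if member_cls == h5t.BITFIELD else 'is not a bitfield')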

Another idea: what if there were a context manager to modify the type mappings, so that instead of having to define the entire compound dtype for a dataset, you could do something like:

with dset.maptypes(Bitfield8=np.uint8):
    data = dset[:]
0 reactions
samuel-emrys commented, Oct 29, 2020

@takluyver I like your idea of using a context manager to specify the exact source and destination datatypes when casting. I’m running into this issue at the moment: all of the datasets I have are compound data types containing float64s and bools, where the bools are represented as bitfields. I originally thought that the changes in 2.10 (https://github.com/h5py/h5py/pull/821) might have solved my problems, but unfortunately the suggested method of reading tries to cast every data type within the dataset to the specified type, and since there is no defined conversion from float to uint8 or bool, it fails:

import numpy

with dset.astype(numpy.uint8):   # or numpy.bool
    arr = dset[:]

It looks like this issue most closely matches the problems I’m facing, so I thought I’d add a comment to lend it some weight/support. Can anybody advise whether it has made it onto the development backlog or has been scheduled for a specific release?

To contribute to the design discussion: I suspect there wouldn’t be a way to maintain injectivity in the conversion, since my understanding is that there isn’t a unique numpy data type that maps to a bitfield. If any other data type is also mapped to uint8, there will be ambiguity, so converting back to the bitfield type wouldn’t be possible. From my perspective, a default cast to either np.bool or np.uint8 would satisfy my requirements, though it sounds like there’s a reasonable case to prefer uint8. As mentioned previously, though, the ability to specify what cast I want to perform when reading, i.e. “I want all bitfields to be cast to uint8”, would be good too; something with more granularity than “cast everything in this dataset to this datatype”.
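
For what it's worth, that kind of granular cast ("map every bitfield to the matching uint, leave everything else alone") can be sketched on top of the low-level API today. This is only an illustration under those assumptions: the helper name is made up, and nested array/vlen members aren't handled.

import numpy as np
import h5py
from h5py import h5t

def bitfields_as_uints(tid):
    # Rebuild a numpy dtype from a low-level HDF5 TypeID, mapping each
    # bitfield member to the unsigned int of the same width (B8 -> u1,
    # B16 -> u2, ...) and leaving every other member unchanged.
    cls = tid.get_class()
    if cls == h5t.BITFIELD:
        return np.dtype('u%d' % tid.get_size())
    if cls == h5t.COMPOUND:
        members = []
        for i in range(tid.get_nmembers()):
            name = tid.get_member_name(i).decode()
            members.append((name, bitfields_as_uints(tid.get_member_type(i))))
        return np.dtype(members)
    return tid.dtype  # other scalar types already have a numpy equivalent

# Usage, with the dataset from the issue:
# with dset.astype(bitfields_as_uints(dset.id.get_type())):
#     data = dset[:]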
