Trying out additional example files

This is really neat, and I’m excited to try things out with some additional HDF files!

I realize the goal is to flesh out the specification and this is not a general conversion tool yet, but it seems like working with more HDF files out in the wild might bring things to light.

Some initial questions/suggestions:

  1. How do I write out the .zchunkstore? It seems things are currently set up to only output logging info.
  2. Add chunk info to the logger output (maybe dtype and size in MB too?), e.g. lggr.debug(f'_ARRAY_CHUNKS = {h5obj.chunks}')
  3. It could be useful to first check that the input file is valid HDF5. This is easy for a local file; I'm not sure about remote:
        if not h5py.is_hdf5(f):
            raise ValueError('Not an hdf5 file')
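
For the remote case in suggestion 3, one possible approach is to look for the HDF5 format signature directly on any seekable file-like object (for example, one returned by fsspec). The sketch below is only an illustration and is not part of fsspec-reference-maker: the function name `looks_like_hdf5` and the `max_offset` cutoff are made up here. It relies on the HDF5 file format rule that the 8-byte signature sits at offset 0 or at 512, 1024, 2048, and so on. A match only says the superblock marker is present, not that the whole file is readable.

```python
import io

# 8-byte HDF5 format signature: \x89 H D F \r \n \x1a \n
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(f, max_offset=1 << 20):
    """Return True if the HDF5 signature is found in a seekable file-like object.

    Hypothetical helper: checks offset 0, then 512, 1024, 2048, ... up to
    max_offset (an arbitrary cutoff chosen for this sketch).
    """
    offset = 0
    while offset <= max_offset:
        f.seek(offset)
        block = f.read(len(HDF5_SIGNATURE))
        if len(block) < len(HDF5_SIGNATURE):
            return False  # ran past the end of the file
        if block == HDF5_SIGNATURE:
            return True
        offset = 512 if offset == 0 else offset * 2
    return False


# Usage with in-memory stand-ins for real files
good = io.BytesIO(HDF5_SIGNATURE + b"\x00" * 100)
bad = io.BytesIO(b"not hdf5" * 10)
print(looks_like_hdf5(good))  # True
print(looks_like_hdf5(bad))   # False
```

Since each probe reads only 8 bytes, this should cost a handful of small range requests on a remote file rather than a full download.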

What isn’t supported? https://github.com/intake/fsspec-reference-maker/blob/bf41138add53b0201e583aa40840cd4fa5fb907b/fsspec_reference_maker/hdf.py#L103-L106

The first file I tried to generate a .zchunkstore for ran into the error above; code and traceback below:

def ATL06_remote():
    return hdf2zarr.run(
        's3://its-live-data.jpl.nasa.gov/icesat2/alt06/rel003/ATL06_20181230162257_00340206_003_01.h5',
        mode='rb', anon=False, requester_pays=True,
        default_fill_cache=False, default_cache_type='none'
    )
DEBUG:h5-to-zarr:translator:Group: /gt1l/land_ice_segments
DEBUG:h5-to-zarr:translator:Dataset: /gt1l/land_ice_segments/atl06_quality_summary
Traceback (most recent call last):
  File "h5py/h5o.pyx", line 302, in h5py.h5o.cb_obj_simple
  File "/Users/scott/miniconda3/envs/fsspec-ref/lib/python3.8/site-packages/h5py/_hl/group.py", line 591, in proxy
    return func(name, self[name])
  File "/Users/scott/GitHub/fsspec-reference-maker/fsspec_reference_maker/hdf.py", line 105, in translator
    raise RuntimeError(
RuntimeError: /gt1l/land_ice_segments/atl06_quality_summary uses unsupported HDF5 filters

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./test.py", line 45, in <module>
    ATL06_remote()
  File "./test.py", line 15, in ATL06_remote
    return hdf2zarr.run(
  File "/Users/scott/GitHub/fsspec-reference-maker/fsspec_reference_maker/hdf.py", line 273, in run
    return h5chunks.translate()
  File "/Users/scott/GitHub/fsspec-reference-maker/fsspec_reference_maker/hdf.py", line 54, in translate
    self._h5f.visititems(self.translator)
  File "/Users/scott/miniconda3/envs/fsspec-ref/lib/python3.8/site-packages/h5py/_hl/group.py", line 592, in visititems
    return h5o.visit(self.id, proxy)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
SystemError: <built-in function visit> returned a result with an error set

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

scottyhq commented, May 6, 2021 (1 reaction)

Just wanted to point people here towards an optimized read-only approach for working with the ICESat-2 HDF5 data described in this issue (http://icesat2sliderule.org/h5coro). It would be interesting to compare it against fsspec-reference-maker.

ajelenak commented, Nov 20, 2020 (0 reactions)

but I don’t know what fletcher32 is (a checksum?).

Yes, Fletcher32 is a checksum HDF5 filter. It is used to catch any read errors from HDF5 dataset chunks. When using the Fletcher32 filter, a checksum is calculated on every chunk write operation and stored with the chunk.
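
To make the checksum idea concrete, here is a minimal sketch of the textbook Fletcher-32 algorithm in pure Python. It is only meant to show the write-then-verify pattern described above. It assumes little-endian 16-bit words and zero padding for odd lengths, which is the common reference formulation; the HDF5 library's own implementation may differ in byte order and other details, so do not expect this to reproduce the checksum bytes stored in a real HDF5 chunk.

```python
def fletcher32(data: bytes) -> int:
    """Textbook Fletcher-32 over little-endian 16-bit words (illustrative only)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    sum1 = sum2 = 0
    for i in range(0, len(data), 2):
        word = int.from_bytes(data[i:i + 2], "little")
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


chunk = b"abcde"
stored = fletcher32(chunk)      # computed when the chunk is written
print(hex(stored))              # 0xf04fc729

corrupted = b"abcdf"            # one byte changed on read
print(fletcher32(corrupted) == stored)  # False
```

The reader recomputes the checksum over the chunk bytes and compares it with the stored value; a mismatch signals a read error.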

