Trying out additional example files
This is really neat, and I'm excited to try things out with some additional HDF files!
I realize the goal is to flesh out the specification, and this is not a general conversion tool yet, but working with more HDF files out in the wild might bring issues to light.
Some initial questions/suggestions:
- How do I write out the .zchunkstore? It seems things are currently set up to just output logging info.
- Add chunk info to the logger output (maybe dtype and size in MB too?):
lggr.debug(f'_ARRAY_CHUNKS = {h5obj.chunks}')
- It could be useful to first check that the input file is valid HDF5. This is easy for a local file; I'm not sure about a remote one:
if not h5py.is_hdf5(f):
    raise ValueError('Not an hdf5 file')
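Two of the suggestions above can be sketched without touching the translator. The helpers below are hypothetical (the names are illustrative, not part of fsspec-reference-maker). `chunk_summary` formats a debug line with the chunk shape, dtype, and approximate uncompressed chunk size in MB for any object exposing h5py-style `.name`, `.shape`, `.chunks` and `.dtype`. `looks_like_hdf5` sniffs for the 8-byte HDF5 format signature on an open, seekable binary file object, so it should work the same for a local file or a file object opened through fsspec. It assumes the superblock starts at offset 0 or at 512, 1024, 2048, ... bytes (the format allows a user block before it); it is a cheap check, not full validation.

```python
# HDF5 format signature: \211 H D F \r \n \032 \n
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(fileobj, max_offset=1 << 20):
    """Hypothetical helper: True if the HDF5 signature is found at offset 0
    or at 512 * 2**k, up to max_offset. Needs only seek() and read(), so it
    works on local files and remote file-like objects alike."""
    offset = 0
    while offset <= max_offset:
        fileobj.seek(offset)
        head = fileobj.read(len(HDF5_SIGNATURE))
        if len(head) < len(HDF5_SIGNATURE):
            return False  # ran past the end of the file
        if head == HDF5_SIGNATURE:
            return True
        offset = 512 if offset == 0 else offset * 2
    return False


def chunk_summary(dset):
    """Hypothetical helper: describe chunking for an h5py-like dataset."""
    if dset.chunks is None:
        return f"{dset.name}: contiguous, shape={dset.shape}, dtype={dset.dtype}"
    n_elems = 1
    for c in dset.chunks:
        n_elems *= c
    mb = n_elems * dset.dtype.itemsize / 1e6  # uncompressed bytes per chunk
    return (f"{dset.name}: _ARRAY_CHUNKS = {dset.chunks}, "
            f"dtype={dset.dtype}, ~{mb:.3f} MB per chunk")
```

Inside the translator this would be used as `lggr.debug(chunk_summary(h5obj))`; for the input check, something like `with fsspec.open(url, "rb") as f:` followed by `if not looks_like_hdf5(f): raise ValueError('Not an hdf5 file')`. Each probe on a remote file is a small ranged read, and the loop stops as soon as it reads past the end of the file.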
Also, what exactly isn't supported? The translator rejects datasets that use unsupported HDF5 filters here: https://github.com/intake/fsspec-reference-maker/blob/bf41138add53b0201e583aa40840cd4fa5fb907b/fsspec_reference_maker/hdf.py#L103-L106
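To see which filters a given dataset uses, h5py exposes the common ones as dataset properties (`compression`, `shuffle`, `fletcher32`, `scaleoffset`). The sketch below uses hypothetical helpers, not fsspec-reference-maker code: it lists the filters a dataset reports and compares them with a caller-supplied set that is only assumed to be supported. Which filters the tool actually accepts is defined by the linked lines, not by this example, and the high-level properties may not reveal every third-party filter.

```python
def enabled_filters(dset):
    """Names of the filters an h5py-like dataset reports as enabled."""
    filters = []
    if dset.compression is not None:
        filters.append(str(dset.compression))  # e.g. 'gzip', 'lzf', 'szip'
    if dset.shuffle:
        filters.append("shuffle")
    if dset.fletcher32:
        filters.append("fletcher32")
    if dset.scaleoffset is not None:
        filters.append("scaleoffset")
    return filters


def unsupported_filters(dset, supported=frozenset({"gzip", "shuffle"})):
    """Filters on `dset` missing from `supported` (an assumed, illustrative set)."""
    return [f for f in enabled_filters(dset) if f not in supported]
```

Running `unsupported_filters(h5f[path])` over the datasets in a file gives a quick inventory of what would need support before a file can be translated.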
The first file I tried to generate a .zchunkstore for ran into that check; the code and traceback are below:
def ATL06_remote():
    return hdf2zarr.run(
        's3://its-live-data.jpl.nasa.gov/icesat2/alt06/rel003/ATL06_20181230162257_00340206_003_01.h5',
        mode='rb', anon=False, requester_pays=True,
        default_fill_cache=False, default_cache_type='none'
    )
DEBUG:h5-to-zarr:translator:Group: /gt1l/land_ice_segments
DEBUG:h5-to-zarr:translator:Dataset: /gt1l/land_ice_segments/atl06_quality_summary
Traceback (most recent call last):
  File "h5py/h5o.pyx", line 302, in h5py.h5o.cb_obj_simple
  File "/Users/scott/miniconda3/envs/fsspec-ref/lib/python3.8/site-packages/h5py/_hl/group.py", line 591, in proxy
    return func(name, self[name])
  File "/Users/scott/GitHub/fsspec-reference-maker/fsspec_reference_maker/hdf.py", line 105, in translator
    raise RuntimeError(
RuntimeError: /gt1l/land_ice_segments/atl06_quality_summary uses unsupported HDF5 filters

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./test.py", line 45, in <module>
    ATL06_remote()
  File "./test.py", line 15, in ATL06_remote
    return hdf2zarr.run(
  File "/Users/scott/GitHub/fsspec-reference-maker/fsspec_reference_maker/hdf.py", line 273, in run
    return h5chunks.translate()
  File "/Users/scott/GitHub/fsspec-reference-maker/fsspec_reference_maker/hdf.py", line 54, in translate
    self._h5f.visititems(self.translator)
  File "/Users/scott/miniconda3/envs/fsspec-ref/lib/python3.8/site-packages/h5py/_hl/group.py", line 592, in visititems
    return h5o.visit(self.id, proxy)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
SystemError: <built-in function visit> returned a result with an error set
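In this traceback the `RuntimeError` raised inside the `visititems` callback comes back out as a `SystemError` from h5py's `visit`, and the whole walk stops at the first problem dataset. When surveying many files it may be more useful to log and skip such datasets. A minimal sketch, assuming the callback signature `(name, obj)` that h5py's `visititems` uses; `skip_unsupported` is a hypothetical wrapper, not part of fsspec-reference-maker.

```python
import logging

lggr = logging.getLogger("h5-to-zarr")


def skip_unsupported(callback, skipped):
    """Hypothetical wrapper for a visititems callback.

    RuntimeErrors are logged and the object name is appended to `skipped`.
    h5py stops visititems early if the callback returns anything other than
    None, so after an error the wrapper returns None to keep walking.
    """
    def wrapper(name, obj):
        try:
            return callback(name, obj)
        except RuntimeError as err:
            lggr.warning("skipping %s: %s", name, err)
            skipped.append(name)
            return None
    return wrapper
```

Usage would look like `skipped = []` followed by `h5f.visititems(skip_unsupported(translator, skipped))`; afterwards `skipped` lists the datasets that were left out of the references rather than aborting the run.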
Top GitHub Comments
Just wanted to point people here toward an optimized, read-only approach for working with the ICESat-2 HDF5 data described in this issue: http://icesat2sliderule.org/h5coro. It would be interesting to compare it against fsspec-reference-maker.
Yes, Fletcher32 is a checksum HDF5 filter. It is used to detect read errors in HDF5 dataset chunks: when the Fletcher32 filter is enabled, a checksum is calculated on every chunk write and stored with the chunk.
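For illustration, here is the textbook Fletcher-32 checksum: two running sums over 16-bit words, each reduced modulo 65535, packed into one 32-bit value. This is a sketch using little-endian words and zero-padding for odd lengths, as in the common reference implementation; HDF5's own implementation may use different word-order and padding conventions, so this function is not expected to reproduce the checksums HDF5 stores with its chunks. Verifying a chunk amounts to recomputing the checksum on read and comparing it with the stored value.

```python
def fletcher32(data: bytes) -> int:
    """Textbook Fletcher-32 over little-endian 16-bit words.

    Odd-length input is padded with one zero byte. Illustrative only; not
    guaranteed to match the checksum bytes HDF5 writes.
    """
    if len(data) % 2:
        data = data + b"\x00"
    sum1 = sum2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)  # little-endian 16-bit word
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


print(hex(fletcher32(b"abcdef")))  # → 0x56502d2a
# Changing a single byte changes the checksum, which is how corruption is caught:
print(fletcher32(b"abcdef") != fletcher32(b"abcdeg"))  # → True
```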