trouble loading netcdf4 files with xarray on s3
I’m working on allowing direct access to netcdf4/hdf5 file-like objects (https://github.com/pydata/xarray/pull/2782). This seems to be working fine with gcsfs, but not with s3fs (version 0.2 from conda-forge). Here is a gist with the relevant code and full error traceback:
https://gist.github.com/scottyhq/304a3c4b4e198776b8d82fb3a9f300e3
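In essence the pattern is roughly this (a minimal sketch; the key and group are taken from the traceback below, and credentials/caching options are left at their defaults — see the gist for the exact code):

```python
import s3fs
import xarray as xr

# open the netCDF4/HDF5 file on S3 as a file-like object and hand it to xarray
fs = s3fs.S3FileSystem()
f = fs.open('grfn-content-prod/S1-GUNW-A-R-137-tops-20181129_20181123'
            '-020010-43220N_41518N-PP-e2c7-v2_0_0.nc', mode='rb')
ds = xr.open_dataset(f, engine='h5netcdf', group='/science/grids/data')
```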
An abbreviated traceback:
```
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/Documents/GitHub/xarray/xarray/backends/file_manager.py in acquire(self, needs_lock)
166 try:
--> 167 file = self._cache[self._key]
168 except KeyError:
~/Documents/GitHub/xarray/xarray/backends/lru_cache.py in __getitem__(self, key)
40 with self._lock:
---> 41 value = self._cache[key]
42 self._cache.move_to_end(key)
KeyError: [<function _open_h5netcdf_group at 0x11d8b0ae8>, (<S3File grfn-content-prod/S1-GUNW-A-R-137-tops-20181129_20181123-020010-43220N_41518N-PP-e2c7-v2_0_0.nc>,), 'r', (('group', '/science/grids/data'),)]

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
h5py/h5fd.pyx in h5py.h5fd.H5FD_fileobj_read()
~/miniconda3/envs/test_env/lib/python3.6/site-packages/s3fs/core.py in readinto(self, b)
1498 data = self.read()
-> 1499 b[:len(data)] = data
1500 return len(data)
~/miniconda3/envs/test_env/lib/python3.6/site-packages/h5py/h5fd.cpython-36m-darwin.so in View.MemoryView.memoryview.__setitem__()
~/miniconda3/envs/test_env/lib/python3.6/site-packages/h5py/h5fd.cpython-36m-darwin.so in View.MemoryView.memoryview.setitem_slice_assignment()
~/miniconda3/envs/test_env/lib/python3.6/site-packages/h5py/h5fd.cpython-36m-darwin.so in View.MemoryView.memoryview_copy_contents()
~/miniconda3/envs/test_env/lib/python3.6/site-packages/h5py/h5fd.cpython-36m-darwin.so in View.MemoryView._err_extents()
ValueError: got differing extents in dimension 0 (got 8 and 59941567)

The above exception was the direct cause of the following exception:

SystemError Traceback (most recent call last)
h5py/h5fd.pyx in h5py.h5fd.H5FD_fileobj_read()
~/miniconda3/envs/test_env/lib/python3.6/site-packages/s3fs/core.py in seek(self, loc, whence)
1235 """
-> 1236 if not self.readable():
1237 raise ValueError('Seek only available in read mode')
SystemError: PyEval_EvalFrameEx returned a result with an error set
```
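If I’m reading the ValueError correctly, h5py hands readinto() an 8-byte buffer, but s3fs 0.2’s readinto() calls self.read() with no size and tries to copy the whole remaining ~60 MB of the file into it. A size-respecting version would look something like this (a sketch based only on the traceback above, not a tested patch against s3fs):

```python
def readinto(self, b):
    # only fill the buffer h5py gave us, rather than calling
    # self.read() with no argument (which returns the rest of the file)
    data = self.read(len(b))
    b[:len(data)] = data
    return len(data)
```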
Any guidance as to what might be going on here would be appreciated!
(I suppose this is why you want to encode all the options required for a particular dataset to work smoothly into a catalog…)
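Concretely, that might mean recording something like this per dataset (illustrative option names, in the spirit of an intake-xarray entry; the key and group come from the traceback above):

```python
# everything needed to open this particular dataset, bundled in one place
gunw_entry = {
    'urlpath': 's3://grfn-content-prod/S1-GUNW-A-R-137-tops-20181129_20181123'
               '-020010-43220N_41518N-PP-e2c7-v2_0_0.nc',
    'storage_options': {'default_fill_cache': True},   # s3fs options
    'xarray_kwargs': {'engine': 'h5netcdf',
                      'group': '/science/grids/data'},
}
```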
I don’t know the internals of h5netcdf, but I would hope it requests a byte range. You could time reading a whole array versus reading a single value, but the comparison will not scale linearly, because of the fixed cost of each connection and the metadata lookups. For a slice, it would depend on the exact layout and chunking. You may want to turn on s3fs debug logging.
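For example (a sketch; the logger name for s3fs 0.2 and the choice of variable are my assumptions):

```python
import logging
import time

import s3fs
import xarray as xr

# s3fs logs its requests under the 's3fs' logger (assumption for 0.2)
logging.basicConfig()
logging.getLogger('s3fs').setLevel(logging.DEBUG)

fs = s3fs.S3FileSystem()
f = fs.open('grfn-content-prod/S1-GUNW-A-R-137-tops-20181129_20181123'
            '-020010-43220N_41518N-PP-e2c7-v2_0_0.nc', mode='rb')
ds = xr.open_dataset(f, engine='h5netcdf', group='/science/grids/data')
var = next(iter(ds.data_vars.values()))  # pick an arbitrary variable

t0 = time.time()
_ = var[(0,) * var.ndim].values  # single value
t1 = time.time()
_ = var.values                   # whole array
t2 = time.time()
print(f'single value: {t1 - t0:.1f}s, whole array: {t2 - t1:.1f}s')
```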