
open_dataset segfaults when opening hdf5 with array of strings

See original GitHub issue

What happened?

An attempt to open a non-netCDF4 HDF5 file with xarray.open_dataset results in a segmentation fault.

What did you expect to happen?

The code to execute and not die a horrible death.

Minimal Complete Verifiable Example

import h5py
import numpy as np
from xarray import open_dataset

filename = "foobar.h5"

# Write a plain HDF5 file (not a netCDF4 file) containing a
# fixed-width byte-string array.
with h5py.File(filename, "w") as fp:
    fp["test"] = np.array(["foo", "bar", "baz"], dtype="|S3")

# Opening it with the netcdf4 engine crashes with a segmentation fault.
with open_dataset(filename, engine="netcdf4") as ds:
    print("yay")

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

$ python3 -Wd -X faulthandler bug.py 
Fatal Python error: Segmentation fault

Current thread 0x00007f2283947740 (most recent call first):
  File "/home/xarth/codes/xarthisius/xarray/xarray/backends/netCDF4_.py", line 106 in _getitem
  File "/home/xarth/codes/xarthisius/xarray/xarray/core/indexing.py", line 816 in explicit_indexing_adapter
  File "/home/xarth/codes/xarthisius/xarray/xarray/backends/netCDF4_.py", line 93 in __getitem__
  File "/home/xarth/codes/xarthisius/xarray/xarray/core/indexing.py", line 527 in __array__
  File "/home/xarth/codes/xarthisius/xarray/xarray/core/variable.py", line 250 in _as_array_or_item
  File "/home/xarth/codes/xarthisius/xarray/xarray/core/variable.py", line 510 in values
  File "/home/xarth/codes/xarthisius/xarray/xarray/core/variable.py", line 337 in data
  File "/home/xarth/codes/xarthisius/xarray/xarray/core/common.py", line 1866 in contains_cftime_datetimes
  File "/home/xarth/codes/xarthisius/xarray/xarray/core/common.py", line 1875 in _contains_datetime_like_objects
  File "/home/xarth/codes/xarthisius/xarray/xarray/conventions.py", line 345 in decode_cf_variable
  File "/home/xarth/codes/xarthisius/xarray/xarray/conventions.py", line 521 in decode_cf_variables
  File "/home/xarth/codes/xarthisius/xarray/xarray/backends/store.py", line 27 in open_dataset
  File "/home/xarth/codes/xarthisius/xarray/xarray/backends/netCDF4_.py", line 567 in open_dataset
  File "/home/xarth/codes/xarthisius/xarray/xarray/backends/api.py", line 495 in open_dataset
  File "/home/xarth/codes/xarthisius/yt/bug.py", line 10 in <module>
Segmentation fault (core dumped)

Anything else we need to know?

This regression was introduced in #6489
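
Until the regression is fixed, one possible stopgap (my own sketch, not suggested in the issue) is to bypass the netCDF4 backend and read the raw HDF5 dataset with h5py, wrapping the decoded strings in an xarray.Dataset by hand; the dimension name dim_0 is made up:

import h5py
import numpy as np
import xarray as xr

filename = "foobar.h5"

# Read the fixed-width byte strings directly with h5py.
with h5py.File(filename, "r") as fp:
    raw = fp["test"][()]          # numpy array with dtype |S3

# Decode the bytes to unicode and wrap them in a Dataset manually.
ds = xr.Dataset({"test": ("dim_0", np.char.decode(raw, "utf-8"))})
print(ds["test"].values)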

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.5 (default, Nov 23 2021, 15:27:38) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-121-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.utf8
LANG: en_US.utf8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.0

xarray: 2022.3.1.dev60+g5f01c115
pandas: 1.4.3
numpy: 1.23.0
scipy: 1.9.0rc1
netCDF4: 1.6.0
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: None
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.6.1
distributed: None
matplotlib: 3.5.2
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: None
sparse: None
setuptools: 44.0.0
pip: 20.0.2
conda: None
pytest: 7.1.2
IPython: 8.4.0
sphinx: None

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8 (2 by maintainers)

Top GitHub Comments

1 reaction
Xarthisius commented, Jun 27, 2022

Can you reproduce this without xarray? It should probably be reported upstream? (In the netCDF4 repo?)

Yeah, it can be triggered with:


from netCDF4 import Dataset
import h5py
import numpy as np
import operator

filename = "foobar.h5"
with h5py.File(filename, "w") as fp:
    fp["test"] = np.array(["foo", "bar", "baz"], dtype="|S3")

rootgrp = Dataset("foobar.h5", "r", format="NETCDF4")
# This works
print(rootgrp["/test"][slice(None)])

# This segfaults
operator.getitem(rootgrp["/test"], slice(None))
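
For completeness, a possible alternative read path (my assumption, not discussed in the issue) is to avoid libnetcdf altogether and go through h5py via the h5netcdf backend, which is not installed in the reporter's environment. h5netcdf's phony_dims option lets it open plain HDF5 files that lack netCDF dimension scales, though whether it handles this particular fixed-width string dataset may depend on the h5netcdf version:

import h5py
import numpy as np
import xarray as xr

filename = "foobar.h5"
with h5py.File(filename, "w") as fp:
    fp["test"] = np.array(["foo", "bar", "baz"], dtype="|S3")

# Going through h5py/h5netcdf avoids the libnetcdf code path that segfaults.
with xr.open_dataset(filename, engine="h5netcdf", phony_dims="access") as ds:
    print(ds["test"].values)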

0 reactions
dcherian commented, Jun 27, 2022

Thanks @kmuehlbauer

Read more comments on GitHub >

Top Results From Across the Web

  • Loading datasets of numpy string arrays leads to error and/or ...: Numpy arrays of strings that are saved with h5py cause errors and segfaults, not always the same result. What you expected to happen:...
  • Problems reading subset of HDF5 dataset in Fortran: In particular I'm having real difficulty understanding what the dataspace and memory space are, in my code they don't seem to be doing...
  • Moving away from HDF5 - Cyrille Rossant: An HDF5 file contains a POSIX-like hierarchy of numerical arrays (aka ... a segmentation fault occurred with variable-length strings in the ...
  • h5toh4convert - The HDF Group: Data corruption or segmentation faults may occur if the application continues. This can happen when an application was compiled by one version of...
  • Chapter 5 HDF5 Datasets: In the example, the data is initialized in the memory array dset_data. The dataset has already been created in the file, so it...