open_dataset is not thread-safe
👋 Hi, great library! I’ve been trying to use xarray from a Flask server and have encountered frequent segfaults when trying to load a file.
MCVE Code Sample
Putting together this MCVE took a bit of time, but it was a good exercise for me. I enjoyed learning more about the MCVE philosophy! 😊 The only caveat is that reproducibility is difficult for this kind of threading issue, so I can’t guarantee that the bug will reproduce on every run. If you have any suggestions for improved reproducibility, please let me know what I can do!
import threading
import xarray as xr

SAVED_FILE_NAME = "saved.nc"

# Modifying these items may change the likelihood of hitting a segfault
N_ELEMENTS = 100
N_THREADS = 2

if __name__ == '__main__':
    xr.Dataset({'foo': ('x', range(N_ELEMENTS))}).to_netcdf(SAVED_FILE_NAME)
    threads = [
        threading.Thread(target=lambda: xr.load_dataset(SAVED_FILE_NAME, engine="netcdf4"))
        for _ in range(N_THREADS)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    print("No segfault!")
Expected Output
The program prints "No segfault!" and exits successfully.
Problem Description
The program sometimes segfaults. When running with the Python fault handler, I often get an output that looks like this:
Fatal Python error: Segmentation fault
Thread 0x0000700002c2f000 (most recent call first):
File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 204 in _acquire_with_cache_info
File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 186 in acquire_context
File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 112 in __enter__
File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 362 in _acquire
File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 211 in _acquire_with_cache_info
File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 192 in acquire_context
Current thread 0x0000700004a41000 (most recent call first):
File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 204 in _acquire_with_cache_info
File "/Users/paul/projects/reflectivity-map-segfault/venv/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 186 in acquire_context
File "/usr
zsh: segmentation fault ./xarray-segfault
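For anyone trying to reproduce this: the fault handler mentioned above is the standard-library faulthandler module. A minimal sketch of enabling it at the top of the script follows; running the script with `python -X faulthandler` should have the same effect.

```python
import faulthandler

# Dump the Python traceback of every thread to stderr when the process
# receives a fatal signal such as SIGSEGV.
faulthandler.enable()
```

This only makes the crash easier to diagnose; it does not prevent the segfault.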
Versions
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.4 (default, Sep 7 2019, 18:27:02) [Clang 10.0.1 (clang-1001.0.46.4)]
python-bits: 64
OS: Darwin
OS-release: 19.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.3

xarray: 0.15.1
pandas: 1.0.3
numpy: 1.18.4
scipy: None
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.1.3
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 46.4.0
pip: 20.1.1
conda: None
pytest: None
IPython: None
sphinx: None
Issue Analytics
- Created 3 years ago
- Comments: 6 (3 by maintainers)
Take a look here: https://portal.hdfgroup.org/display/knowledge/Questions+about+thread-safety+and+concurrent+access
I haven’t actually tried compiling HDF5 in thread-safe mode myself.
There are also a few work-arounds you might consider in the meantime here:
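One workaround of this kind, offered here as an assumption and not necessarily the approach the comment links to, is to serialize every file load behind a single process-wide lock so that only one thread is inside the netCDF/HDF5 libraries at a time. The names `_IO_LOCK` and `serialized` below are hypothetical helpers, not part of xarray.

```python
import threading
from functools import wraps

# Hypothetical process-wide lock guarding all netCDF/HDF5 access.
_IO_LOCK = threading.Lock()


def serialized(func):
    """Wrap func so that at most one thread runs it at any time."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        with _IO_LOCK:
            return func(*args, **kwargs)

    return wrapper


# Assumed usage with the MCVE above (requires xarray and netCDF4):
#
#     import xarray as xr
#     safe_load_dataset = serialized(xr.load_dataset)
#     threading.Thread(
#         target=lambda: safe_load_dataset("saved.nc", engine="netcdf4")
#     )
```

Because `xr.load_dataset` reads the data into memory and closes the file before returning, holding the lock for the duration of the call may be enough to avoid the crash, at the cost of loading files one at a time. It would not help if other code touches netCDF files without taking the same lock.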