Hangs while saving netcdf file opened using xr.open_mfdataset with lock=None

I am testing out code that uses xarray to process netcdf files, in particular to join multiple netcdf files into one along shared dimensions. This was working well, except sometimes when saving the netcdf file the process would hang.

I was able to whittle it down to this simple example: https://github.com/jessicaaustin/xarray_netcdf_hanging_issue

This is the code snippet at the core of the example:

 import os
 import uuid

 import xarray as xr

 # The output directory must exist before writing.
 os.makedirs('tmp', exist_ok=True)

 # If you set lock=False then this runs fine every time.
 # Setting lock=None causes it to intermittently hang on mfd.to_netcdf
 with xr.open_mfdataset(['dataset.nc'], combine='by_coords', lock=None) as mfd:
     p = os.path.join('tmp', 'xarray_{}.nc'.format(uuid.uuid4().hex))
     print(f"Writing data to {p}")
     mfd.to_netcdf(p)
     print("complete")

If you run this once, it’s typically fine. But run it over and over again in a loop, and it’ll eventually hang on mfd.to_netcdf. However, if I set lock=False, it runs fine every time.
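Because the hang is intermittent, it can help to drive the snippet from a small harness that runs it repeatedly and reports the first run that fails to finish in time. The following is only a sketch using the Python standard library, not part of the original example repository: the function name `run_until_hang` and its parameters are hypothetical, and it assumes the snippet above has been saved as a standalone script.

```python
import subprocess
import sys


def run_until_hang(cmd, max_runs=100, timeout_s=60.0):
    """Run `cmd` up to `max_runs` times in a fresh subprocess.

    Returns the 1-based index of the first run that exceeds `timeout_s`
    seconds (treated here as a hang), or None if every run finished in time.
    A run that exits with a non-zero status raises CalledProcessError.
    """
    for run_index in range(1, max_runs + 1):
        try:
            # subprocess.run kills the child process when the timeout expires.
            subprocess.run(cmd, check=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return run_index
    return None
```

For instance, `run_until_hang([sys.executable, "repro.py"], max_runs=200, timeout_s=120)` would report which iteration hung, where `repro.py` is a placeholder name for a file containing the snippet. Running each attempt in its own process means a hung write can be killed without taking the loop down with it.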

I’ve seen this with the following combos:

  • xarray=0.14.1
  • dask=2.9.1
  • netcdf4=1.5.3

and

  • xarray=0.15.1
  • dask=2.14.0
  • netcdf4=1.5.3

And I’ve tried it with different netcdf files and different computers.

Versions

Output of `xr.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 23:03:20) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-20-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.4

xarray: 0.15.1
pandas: 1.0.3
numpy: 1.18.1
scipy: None
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.1.1.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.14.0
distributed: 2.14.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 46.1.3.post20200325
pip: 20.0.2
conda: None
pytest: None
IPython: None
sphinx: None

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 4
  • Comments: 12 (4 by maintainers)

Top GitHub Comments

fmaussion commented, Dec 22, 2020 (1 reaction)

Just adding my +1 here, and also mentioning that (if memory allows) ds.load() also helps. (related: https://github.com/pydata/xarray/issues/4710)

bekatd commented, Feb 15, 2021 (0 reactions)

Please make some dummy tests. I added a time.sleep call before every operation; this was the only workaround that really worked.
