jupyter repr caching deleted netcdf file

What happened:

While testing xarray data storage in a Jupyter notebook with varying data sizes and saving to netCDF, I noticed that open_dataset and open_dataarray (both show this behaviour) keep returning data from the first test run, even though each run deletes the previously created netCDF file. This only happens once the repr has been used to display the xarray object. Once in this error state, even the objects that previously printed correctly show the wrong data.

This was hard to track down as it depends on the precise sequence in jupyter.

What you expected to happen:

When I use open_dataset or open_dataarray, the resulting object should reflect what is actually on disk.

Minimal Complete Verifiable Example:

import xarray as xr
from pathlib import Path
import numpy as np

def test_repr(nx):
    ds = xr.DataArray(np.random.rand(nx))
    path = Path("saved_on_disk.nc")
    if path.exists():
        path.unlink()
    ds.to_netcdf(path)
    return path

When executed in a cell with print for display, all is fine:

test_repr(4)
print(xr.open_dataset("saved_on_disk.nc"))
test_repr(5)
print(xr.open_dataset("saved_on_disk.nc"))

But as soon as one cell uses the Jupyter repr:

xr.open_dataset("saved_on_disk.nc")

all future file reads show the data from the last repr use, even after running the test function again and even when using print instead of the repr.

Anything else we need to know?:

Here’s a notebook showing the issue: https://gist.github.com/05c2542ed33662cdcb6024815cc0c72c

Environment:

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-40-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.0
pandas: 1.0.5
numpy: 1.19.0
scipy: 1.5.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.5
cfgrib: None
iris: None
bottleneck: None
dask: 2.21.0
distributed: 2.21.0
matplotlib: 3.3.0
cartopy: 0.18.0
seaborn: 0.10.1
numbagg: None
pint: None
setuptools: 49.2.0.post20200712
pip: 20.1.1
conda: installed
pytest: 6.0.0rc1
IPython: 7.16.1
sphinx: 3.1.2

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

1 reaction
mullenkamp commented, Oct 4, 2022

Running xarray.backends.file_manager.FILE_CACHE.clear() fixed the issue for me. I couldn’t find any other way to stop xarray from pulling up some old data from a newly saved file. I’m using the h5netcdf engine with xarray version 2022.6.0 by the way.
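A minimal sketch of that workaround, reusing the file name from the example in the question. The helper names `write_fresh` and `open_fresh` are hypothetical, and `FILE_CACHE` lives in an internal xarray module, so its location and behaviour may differ between versions.

```python
from pathlib import Path

import numpy as np
import xarray as xr
from xarray.backends.file_manager import FILE_CACHE  # internal module, may change

PATH = Path("saved_on_disk.nc")


def write_fresh(nx):
    """Delete any previous file, then save a new random array of length nx."""
    if PATH.exists():
        PATH.unlink()
    xr.DataArray(np.random.rand(nx)).to_netcdf(PATH)


def open_fresh(path):
    """Drop xarray's cached file handles, then open the file (hypothetical helper)."""
    FILE_CACHE.clear()
    return xr.open_dataset(path)


write_fresh(4)
print(open_fresh(PATH).sizes)
write_fresh(5)
print(open_fresh(PATH).sizes)
```

Clearing the cache affects every file xarray currently has cached, not only this one, so it is probably best kept for interactive debugging sessions.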

1 reaction
shoyer commented, Jul 25, 2020

Probably the easiest workaround is to call .close() on the original dataset. Failing that, the file is cached in xarray.backends.file_manager.FILE_CACHE, which you could muck around with.
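One way to make sure .close() always gets called is to open the dataset in a `with` block and load the values before leaving it. This is a sketch under the assumption that the data fits in memory; `write_fresh` and `read_and_close` are hypothetical helper names.

```python
from pathlib import Path

import numpy as np
import xarray as xr

PATH = Path("saved_on_disk.nc")


def write_fresh(nx):
    """Delete any previous file, then save a new random array of length nx."""
    if PATH.exists():
        PATH.unlink()
    xr.DataArray(np.random.rand(nx)).to_netcdf(PATH)


def read_and_close():
    """Load the file into memory; the with block closes the dataset on exit."""
    with xr.open_dataset(PATH) as ds:
        return ds.load()


write_fresh(4)
first = read_and_close()
write_fresh(5)
second = read_and_close()
print(first.sizes, second.sizes)
```

The returned objects hold their values in memory, so displaying them in a notebook cell no longer needs the file on disk.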

I believe it only gets activated by repr() because array values from a netCDF file are loaded lazily. I'm not 100% sure without more testing, though.
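The general effect of a path-keyed handle cache can be illustrated with the standard library alone. The sketch below is not xarray's implementation: `HANDLE_CACHE`, `cached_open`, `read_all` and `rewrite` are hypothetical names, and it assumes a POSIX system, where an open handle keeps pointing at the old file after it has been unlinked.

```python
import os
import tempfile

# Hypothetical stand-in for a cache of open handles keyed by path.
HANDLE_CACHE = {}


def cached_open(path):
    """Return the cached handle for path, opening the file only on first use."""
    if path not in HANDLE_CACHE:
        HANDLE_CACHE[path] = open(path, "r")
    return HANDLE_CACHE[path]


def read_all(path):
    """Read the whole file through the cached handle."""
    handle = cached_open(path)
    handle.seek(0)
    return handle.read()


def rewrite(path, text):
    """Delete the file if present, then create it again with new content."""
    if os.path.exists(path):
        os.unlink(path)
    with open(path, "w") as f:
        f.write(text)


path = os.path.join(tempfile.mkdtemp(), "saved_on_disk.txt")
rewrite(path, "first run")
before = read_all(path)         # opens the file and caches the handle
rewrite(path, "second run")     # new file on disk under the same path
stale = read_all(path)          # cached handle still points at the deleted file
HANDLE_CACHE.pop(path).close()  # evict and close, like the workarounds above
fresh = read_all(path)          # reopened, so it matches the disk again
print(before, stale, fresh, sep=" | ")
```

Closing the handle or evicting it from the cache both force the next read to reopen the path, which is the same idea as calling .close() or clearing FILE_CACHE.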
