
Cannot open dataset with empty list units


What happened?

I ran into a netCDF file with an empty units attribute, and xarray's open_dataset failed while parsing the CF conventions. I reproduced the bug: it happens in the particular situation where units is an empty list (see the Minimal Complete Verifiable Example below).

What did you expect to happen?

I expected xarray to parse the units attribute as an empty string.

Minimal Complete Verifiable Example

import numpy as np
import pandas as pd
import xarray as xr

temp = 15 + 8 * np.random.randn(2, 2, 3)
precip = 10 * np.random.rand(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]

ds = xr.Dataset(
    {
        "temperature": (["x", "y", "time"], temp),
        "precipitation": (["x", "y", "time"], precip),
    },
    coords={
        "lon": (["x", "y"], lon),
        "lat": (["x", "y"], lat),
        "time": pd.date_range("2014-09-06", periods=3),
        "reference_time": pd.Timestamp("2014-09-05"),
    },
)

# Setting the units attribute to an empty list is what triggers the bug
ds.temperature.attrs["units"] = []

ds.to_netcdf("test.nc")

ds = xr.open_dataset("test.nc")  # raises TypeError: unhashable type: 'numpy.ndarray'
ds.close()

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 ds = xr.open_dataset("test.nc")
      2 print(ds["temperature"].attrs)
      3 ds.close()

File ~/.local/src/miniconda/envs/uptodatexarray/lib/python3.10/site-packages/xarray/backends/api.py:495, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    483 decoders = _resolve_decoders_kwargs(
    484     decode_cf,
    485     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    491     decode_coords=decode_coords,
    492 )
    494 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 495 backend_ds = backend.open_dataset(
    496     filename_or_obj,
    497     drop_variables=drop_variables,
    498     **decoders,
    499     **kwargs,
    500 )
    501 ds = _dataset_from_backend_dataset(
    502     backend_ds,
    503     filename_or_obj,
   (...)
    510     **kwargs,
    511 )
    512 return ds

File ~/.local/src/miniconda/envs/uptodatexarray/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:564, in NetCDF4BackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, format, clobber, diskless, persist, lock, autoclose)
    562 store_entrypoint = StoreBackendEntrypoint()
    563 with close_on_error(store):
--> 564     ds = store_entrypoint.open_dataset(
    565         store,
    566         mask_and_scale=mask_and_scale,
    567         decode_times=decode_times,
    568         concat_characters=concat_characters,
    569         decode_coords=decode_coords,
    570         drop_variables=drop_variables,
    571         use_cftime=use_cftime,
    572         decode_timedelta=decode_timedelta,
    573     )
    574 return ds

File ~/.local/src/miniconda/envs/uptodatexarray/lib/python3.10/site-packages/xarray/backends/store.py:27, in StoreBackendEntrypoint.open_dataset(self, store, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta)
     24 vars, attrs = store.load()
     25 encoding = store.get_encoding()
---> 27 vars, attrs, coord_names = conventions.decode_cf_variables(
     28     vars,
     29     attrs,
     30     mask_and_scale=mask_and_scale,
     31     decode_times=decode_times,
     32     concat_characters=concat_characters,
     33     decode_coords=decode_coords,
     34     drop_variables=drop_variables,
     35     use_cftime=use_cftime,
     36     decode_timedelta=decode_timedelta,
     37 )
     39 ds = Dataset(vars, attrs=attrs)
     40 ds = ds.set_coords(coord_names.intersection(vars))

File ~/.local/src/miniconda/envs/uptodatexarray/lib/python3.10/site-packages/xarray/conventions.py:503, in decode_cf_variables(variables, attributes, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime, decode_timedelta)
    499     continue
    500 stack_char_dim = (
    501     concat_characters and v.dtype == "S1" and v.ndim > 0 and stackable(v.dims[-1])
    502 )
--> 503 new_vars[k] = decode_cf_variable(
    504     k,
    505     v,
    506     concat_characters=concat_characters,
    507     mask_and_scale=mask_and_scale,
    508     decode_times=decode_times,
    509     stack_char_dim=stack_char_dim,
    510     use_cftime=use_cftime,
    511     decode_timedelta=decode_timedelta,
    512 )
    513 if decode_coords in [True, "coordinates", "all"]:
    514     var_attrs = new_vars[k].attrs

File ~/.local/src/miniconda/envs/uptodatexarray/lib/python3.10/site-packages/xarray/conventions.py:354, in decode_cf_variable(name, var, concat_characters, mask_and_scale, decode_times, decode_endianness, stack_char_dim, use_cftime, decode_timedelta)
    351         var = coder.decode(var, name=name)
    353 if decode_timedelta:
--> 354     var = times.CFTimedeltaCoder().decode(var, name=name)
    355 if decode_times:
    356     var = times.CFDatetimeCoder(use_cftime=use_cftime).decode(var, name=name)

File ~/.local/src/miniconda/envs/uptodatexarray/lib/python3.10/site-packages/xarray/coding/times.py:537, in CFTimedeltaCoder.decode(self, variable, name)
    534 def decode(self, variable, name=None):
    535     dims, data, attrs, encoding = unpack_for_decoding(variable)
--> 537     if "units" in attrs and attrs["units"] in TIME_UNITS:
    538         units = pop_to(attrs, encoding, "units")
    539         transform = partial(decode_cf_timedelta, units=units)

TypeError: unhashable type: 'numpy.ndarray'
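The failing line in the traceback is a membership test against TIME_UNITS, which in xarray is a set of time-unit strings. Set membership requires a hashable value, and a numpy array is not hashable, so the error can be reproduced in isolation (a minimal sketch, independent of xarray; the exact contents of TIME_UNITS don't matter here):

import numpy as np

# Stand-in for xarray.coding.times.TIME_UNITS (a set of time-unit strings)
TIME_UNITS = {"days", "hours", "minutes", "seconds"}

"" in TIME_UNITS       # False: a plain string is hashable, so the test works
units = np.array([])   # the empty attribute comes back from the file as an array
units in TIME_UNITS    # raises TypeError: unhashable type: 'numpy.ndarray'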

Anything else we need to know?

The following assignment produces the bug:

ds.temperature.attrs["units"] = []

But these do not produce the bug:

ds.temperature.attrs["units"] = "[]"
ds.temperature.attrs["units"] = ""

Also, I don't know how the units attribute gets encoded when writing, but I see no difference between ds.temperature.attrs["units"] = "" and ds.temperature.attrs["units"] = [] when running ncdump on the resulting file.
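One way to see what actually ends up in the file is to read the raw attribute back with the netCDF4 library, bypassing xarray's CF decoding. This is a hedged sketch; the exact dtype of the round-tripped attribute may depend on the netCDF4/libnetcdf versions, but the traceback above shows it arrives in xarray as a numpy.ndarray rather than a string:

import netCDF4

with netCDF4.Dataset("test.nc") as nc:
    raw = nc.variables["temperature"].getncattr("units")
    # Expected: a zero-length numpy array rather than the empty string "",
    # a difference that ncdump's text output does not make visible
    print(repr(raw), type(raw))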

Environment

This bug was encountered with the versions listed below.

INSTALLED VERSIONS

commit: None
python: 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.13.0-52-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: ('fr_FR', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.6.1

xarray: 0.20.1
pandas: 1.4.3
numpy: 1.22.3
scipy: None
netCDF4: 1.5.7
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
setuptools: 61.2.0
pip: 22.1.2
conda: None
pytest: None
IPython: 8.4.0
sphinx: None

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
antscloud commented, Jul 13, 2022

> @antscloud As a workaround you could use keyword argument decode_cf=False in the call to xr.open_dataset. After fixing the units attribute to some reasonable value you can call ds = xr.decode_cf(ds).

Thank you, I'll do this. One could just loop over the variables' attributes and replace [] with an empty string in this particular case; a sketch of this is shown after the comments below.

1 reaction
kmuehlbauer commented, Jul 13, 2022

@antscloud As a workaround you could use keyword argument decode_cf=False in the call to xr.open_dataset. After fixing the units attribute to some reasonable value you can call ds = xr.decode_cf(ds).
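Putting the two comments together, the workaround could look something like the sketch below. It assumes the only offending attribute is units and simply replaces any non-string value with an empty string before re-applying the CF decoding:

import xarray as xr

# Open without CF decoding so the malformed attribute doesn't reach the coders
ds = xr.open_dataset("test.nc", decode_cf=False)

# Replace any non-string "units" attribute (e.g. the empty array read back
# from the file) with an empty string
for var in ds.variables.values():
    units = var.attrs.get("units")
    if units is not None and not isinstance(units, str):
        var.attrs["units"] = ""

# Now apply the CF conventions manually
ds = xr.decode_cf(ds)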
