Error when writing string coordinate variables to zarr
I saved an xarray dataset to zarr using to_zarr. Later, I tried to read that dataset from the original zarr store, re-chunk it, and write it to a new zarr store. When I did that, I got a strange error. I attached a zip of a minimal version of the zarr dataset that I am using.
test_sm_zarr.zip
MCVE Code Sample
import xarray as xr
sm_from_zarr = xr.open_zarr('test_sm_zarr')
sm_from_zarr.to_zarr('test_sm_zarr_from', mode='w')
Expected Output
No error
Problem Description
I get this error:
C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\xarray\core\merge.py:18: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
PANDAS_TYPES = (pd.Series, pd.DataFrame, pd.Panel)
C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\xarray\core\dataarray.py:1829: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
'DataArray', pd.Series, pd.DataFrame, pd.Panel]:
Traceback (most recent call last):
File "rechunk_test.py", line 38, in <module>
sm_from_zarr.to_zarr('test_sm_zarr_from', mode='w')
File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\xarray\core\dataset.py", line 1414, in to_zarr
consolidated=consolidated, append_dim=append_dim)
File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\xarray\backends\api.py", line 1101, in to_zarr
dump_to_store(dataset, zstore, writer, encoding=encoding)
File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\xarray\backends\api.py", line 929, in dump_to_store
unlimited_dims=unlimited_dims)
File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\xarray\backends\zarr.py", line 366, in store
unlimited_dims=unlimited_dims)
File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\xarray\backends\zarr.py", line 432, in set_variables
writer.add(v.data, zarr_array)
File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\xarray\backends\common.py", line 173, in add
target[...] = source
File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\zarr\core.py", line 1115, in __setitem__
self.set_basic_selection(selection, value, fields=fields)
File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\zarr\core.py", line 1210, in set_basic_selection
return self._set_basic_selection_nd(selection, value, fields=fields)
File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\zarr\core.py", line 1501, in _set_basic_selection_nd
self._set_selection(indexer, value, fields=fields)
File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\zarr\core.py", line 1550, in _set_selection
self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields)
File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\zarr\core.py", line 1659, in _chunk_setitem
fields=fields)
File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\zarr\core.py", line 1723, in _chunk_setitem_nosync
cdata = self._encode_chunk(chunk)
File "C:\Users\jsadler\AppData\Local\Continuum\anaconda3\envs\nwm\lib\site-packages\zarr\core.py", line 1769, in _encode_chunk
chunk = f.encode(chunk)
File "numcodecs/vlen.pyx", line 108, in numcodecs.vlen.VLenUTF8.encode
TypeError: expected unicode string, found 20
BUT I think it has something to do with the datatype of one of my coordinates, site_code, because if I do this I get no error:
import xarray as xr
sm_from_zarr = xr.open_zarr('test_sm_zarr')
sm_from_zarr['site_code'] = sm_from_zarr.site_code.astype('str')
sm_from_zarr.to_zarr('test_sm_zarr_from', mode='w')
Before converting, the datatype of the site_code coordinate is object.
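The workaround above can be generalized to every object-dtype variable in a dataset. The following is only a minimal sketch: the helper name cast_object_strings and the small in-memory dataset are made up for illustration and are not part of xarray or of the attached data. It assumes that casting every element with str() is acceptable, so any non-string objects in the array become their string representation.

```python
import numpy as np
import xarray as xr


def cast_object_strings(ds):
    """Return a shallow copy of ds with object-dtype variables cast to unicode.

    Hypothetical helper for illustration; it applies the same astype(str)
    cast as the workaround above to each object-dtype variable.
    """
    ds = ds.copy()
    # Take a snapshot of the names so the dataset can be modified in the loop.
    for name in list(ds.variables):
        if ds.variables[name].dtype == object:
            ds[name] = ds[name].astype(str)
    return ds


# Small stand-in for the dataset in the issue: a non-index coordinate
# holding Python strings in an object-dtype array.
ds = xr.Dataset(
    {"sm": (("site",), np.array([0.1, 0.2]))},
    coords={"site_code": (("site",), np.array(["a1", "b22"], dtype=object))},
)

fixed = cast_object_strings(ds)
print(ds["site_code"].dtype, "->", fixed["site_code"].dtype)
```

With the dataset from the issue, the result of cast_object_strings(xr.open_zarr('test_sm_zarr')) could then be passed to to_zarr in the same way as in the workaround.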
Output of xr.show_versions()
xarray: 0.12.2
pandas: 0.25.1
numpy: 1.17.1
scipy: 1.3.1
netCDF4: 1.5.1.2
pydap: installed
h5netcdf: None
h5py: None
Nio: None
zarr: 2.3.2
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.3.0
distributed: 2.5.1
matplotlib: 3.1.1
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.2.0
pip: 19.2.3
conda: None
pytest: 5.1.2
IPython: 7.8.0
sphinx: None
Top GitHub Comments
Hi, I also keep running into this issue all the time. Right now, there is no way of round-tripping xr.open_zarr().to_zarr(), also because of https://github.com/pydata/xarray/issues/5219. The only workaround that seems to help is the following:
I’m experiencing the same issue, which seems to be also related to one of my coordinates having object as datatype. Luckily, the workaround proposed by @jsadler2 works in my case, too.