Writing GDAL ZARR _CRS attribute not possible
What is your issue?
Related to https://github.com/pydata/xarray/issues/6374
Writing a Zarr store that is compatible with GDAL conventions using xarray.Dataset.to_zarr requires every data variable to have a _CRS attribute containing the Spatial Reference System (SRS) encoding. This _CRS attribute is itself a dict in which the SRS is encoded under at least one of these keys: wkt, url, projjson.
Because xarray does not allow attribute values to be dictionaries during serialization, it does not seem possible to write GDAL-compatible Zarr stores using xarray.
Example: let's assume we have a Dataset ds like this:
<xarray.Dataset>
Dimensions:  (Y: 180, X: 360)
Coordinates:
  * X        (X) float64 -179.5 -178.5 -177.5 -176.5 ... 176.5 177.5 178.5 179.5
  * Y        (Y) float64 89.5 88.5 87.5 86.5 85.5 ... -86.5 -87.5 -88.5 -89.5
Data variables:
    Band1    (Y, X) uint16 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0
    Band2    (Y, X) uint16 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0
    Band3    (Y, X) uint16 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0
Let's also assume we want to encode the _CRS as wkt like so:
wkt = 'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]'
(Encoding the _CRS in either of the other two formats leads to the same problem.)
Setting the attributes of each data variable:
attributes = {
    "_ARRAY_DIMENSIONS": ['Y', 'X'],
    "_CRS": {"wkt": wkt},
    "AREA_OR_POINT": 'Area',
}
for data_var in ds.data_vars:
    ds[data_var].attrs = attributes
No problem so far; ds.Band1.attrs results in:
{
    "_ARRAY_DIMENSIONS": ["Y", "X"],
    "_CRS": {
        "wkt": 'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]'
    },
    "AREA_OR_POINT": "Area",
}
The problem occurs when writing the dataset using:
ds.to_zarr("test.zarr", consolidated=True)
TypeError: Invalid value for attr '_CRS': {'wkt': 'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]'}.
For serialization to netCDF files, its value must be of one of the following types: str, Number, ndarray, number, list, tuple
Issue Analytics
- Created a year ago
- Comments: 12 (8 by maintainers)
I think the core problem here is that Zarr itself supports arbitrary JSON data structures as attributes, but netCDF does not. The Zarr serialization in Xarray is designed to emulate netCDF, but we could make that optional, for example with a flag to bypass attribute encoding / decoding and pass the Python data directly through to Zarr.
However, my concern would be that the netCDF4 C library would not be able to read those files (nczarr). What happens if you try to open up a GDAL-created Zarr with netCDF4?
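To illustrate why nested attributes are unproblematic on the Zarr side, here is a minimal sketch that uses only the standard library. It assumes the Zarr v2 directory layout, in which each array's attributes live in a JSON file named .zattrs; the path, the roundtrip_zattrs helper, and the shortened WKT string are placeholders for illustration, not part of xarray or zarr.

```python
import json
import tempfile
from pathlib import Path

# Shortened placeholder, not a valid WKT; a real store would carry the full string.
PLACEHOLDER_WKT = 'GEOGCS["WGS 84", ...]'

attrs = {
    "_ARRAY_DIMENSIONS": ["Y", "X"],
    "_CRS": {"wkt": PLACEHOLDER_WKT},  # nested dict, as in the GDAL convention
    "AREA_OR_POINT": "Area",
}


def roundtrip_zattrs(attrs: dict) -> dict:
    """Write attrs to a .zattrs JSON file and read them back (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Assumed Zarr v2 directory layout: <store>/<array>/.zattrs
        zattrs_path = Path(tmp) / "test.zarr" / "Band1" / ".zattrs"
        zattrs_path.parent.mkdir(parents=True)
        zattrs_path.write_text(json.dumps(attrs, indent=4))
        return json.loads(zattrs_path.read_text())


restored = roundtrip_zattrs(attrs)
print(restored["_CRS"])
```

The nested _CRS dict survives the JSON round trip unchanged, so the restriction reported in this issue would come from the netCDF-style validation rather than from the storage format.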
FWIW, the new GeoZarr Spec by @christophenoel does not use the GDAL convention for CRS. Instead, it recommends using CF conventions for encoding CRS. This is more compatible with NetCDF, but won’t be parsed correctly by GDAL.
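For comparison, a CF-style encoding keeps every attribute value flat. The sketch below models the attributes as plain dicts, assuming the CF grid-mapping pattern: the data variable names a grid-mapping variable through its grid_mapping attribute, and that variable carries the CRS as string attributes. The variable name spatial_ref, the is_flat helper, and the shortened WKT are illustrative assumptions, and the type tuple is a simplified stand-in for the types listed in the error message (without NumPy types).

```python
import numbers

# Shortened placeholder, not a valid WKT.
PLACEHOLDER_WKT = 'GEOGCS["WGS 84", ...]'

# Attributes of a hypothetical grid-mapping variable named "spatial_ref".
crs_var_attrs = {
    "grid_mapping_name": "latitude_longitude",
    "crs_wkt": PLACEHOLDER_WKT,
}

# Each data variable only points at the grid-mapping variable by name.
band_attrs = {"grid_mapping": "spatial_ref"}

# GDAL-style attribute from the issue, for comparison.
gdal_attrs = {"_CRS": {"wkt": PLACEHOLDER_WKT}}

# Simplified stand-in for the netCDF-style allowed types.
FLAT_TYPES = (str, numbers.Number, list, tuple)


def is_flat(attrs: dict) -> bool:
    """Return True if every attribute value has a netCDF-style type."""
    return all(isinstance(value, FLAT_TYPES) for value in attrs.values())


print(is_flat(crs_var_attrs), is_flat(band_attrs), is_flat(gdal_attrs))
```

Under these assumptions the CF-style attributes pass the flat-type check, while the GDAL-style _CRS dict does not, which is the compatibility difference described above.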
I am a little discouraged that we have not managed to align better across projects so far (e.g. having this conversation before the GDAL Zarr CRS convention was implemented). 😞 For example, either of these two GDAL PRs:
However, it is not too late! Let’s try to reach for a standard way of encoding CRS in Zarr that can be used across languages and implementations of Zarr.
My own preference would be to try to get GDAL to support the GeoZarr Spec and thus the CF-convention CRS attribute, rather than trying to get Xarray to be able to write the GDAL CRS convention.
I am guilty of sidetracking this issue into the politics of CRS encoding. That discussion is important. But in the meantime, @wankoelias’s original issue reveals a narrower technical issue with Xarray’s Zarr writer: Xarray won’t let you serialize a dictionary attribute to Zarr, even though Zarr has no problem with this. That is a problem we can fix pretty easily.
The _validate_attrs helper function was just borrowed from to_netcdf: https://github.com/pydata/xarray/blob/586992e8d2998751cb97b1cab4d3caa9dca116e0/xarray/backends/api.py#L133-L135
We could refactor this function to be more flexible to account for zarr’s broader range of allowed attribute types (as we have evidently already done for h5netcdf). Or we could just bypass it completely in the to_zarr method. That is the only real decision we need to make here right now.
@wankoelias - you seem to understand the issue pretty well. Would you be game for making a PR? We would be glad to support you along the way.
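As a rough illustration of the first option, the sketch below contrasts a netCDF-style type check with a more permissive check that accepts anything JSON-serializable. Both functions are hypothetical and are not Xarray's actual _validate_attrs; the type tuple is a simplification of the types named in the error message, and NumPy arrays are deliberately left out because the standard json module cannot encode them without extra handling.

```python
import json
import numbers

# Simplified stand-in for the types listed in the TypeError (NumPy types omitted).
NETCDF_STYLE_TYPES = (str, numbers.Number, list, tuple)


def check_attr_netcdf_style(name, value):
    """Hypothetical strict check: reject anything that is not a flat netCDF-style value."""
    if not isinstance(value, NETCDF_STYLE_TYPES):
        raise TypeError(f"Invalid value for attr {name!r}: {value!r}")


def check_attr_json_style(name, value):
    """Hypothetical relaxed check: accept any value the json module can serialize."""
    try:
        json.dumps(value)
    except (TypeError, ValueError) as err:
        raise TypeError(
            f"Invalid value for attr {name!r}: not JSON-serializable"
        ) from err


crs = {"wkt": 'GEOGCS["WGS 84", ...]'}  # shortened placeholder WKT

check_attr_json_style("_CRS", crs)  # passes silently

try:
    check_attr_netcdf_style("_CRS", crs)
except TypeError as err:
    print(f"strict check rejected the dict: {err}")
```

A relaxed check along these lines would let the _CRS dict through while still rejecting values Zarr could not store as JSON. Bypassing validation entirely in to_zarr, the second option, would instead defer any such error to the Zarr library.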