Writing GDAL ZARR _CRS attribute not possible

What is your issue?

Related to https://github.com/pydata/xarray/issues/6374

Writing a ZARR which is compatible with GDAL conventions using xarray.Dataset.to_zarr requires all the data variables to have a _CRS attribute which contains the Spatial Reference System encoding (SRS). This _CRS attribute itself is a dict in which the SRS is encoded in at least one of these keys: wkt, url, projjson

Because attribute values can’t be dictionaries during serialization, it does not seem possible to write GDAL compatible zarrs using xarray.

Example:

Let's assume we have a Dataset ds like this:

<xarray.Dataset>
Dimensions:  (Y: 180, X: 360)
Coordinates:
  * X        (X) float64 -179.5 -178.5 -177.5 -176.5 ... 176.5 177.5 178.5 179.5
  * Y        (Y) float64 89.5 88.5 87.5 86.5 85.5 ... -86.5 -87.5 -88.5 -89.5
Data variables:
    Band1    (Y, X) uint16 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0
    Band2    (Y, X) uint16 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0
    Band3    (Y, X) uint16 0 0 0 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0 0 0

Let's also assume we want to encode the _CRS as WKT, like so:

wkt = 'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]'

(Encoding the _CRS in either of the other two formats leads to the same problem.)

Setting the attributes of each data variable:

attributes = {
    "_ARRAY_DIMENSIONS": ['Y', 'X'],
    "_CRS": {"wkt": wkt},
    "AREA_OR_POINT": 'Area',
}

for data_var in ds.data_vars:
    ds[data_var].attrs = attributes

No problem so far; ds.Band1.attrs returns:

{
    "_ARRAY_DIMENSIONS": ["Y", "X"],
    "_CRS": {
        "wkt": 'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]'
    },
    "AREA_OR_POINT": "Area",
}

The problem occurs when writing the dataset:

ds.to_zarr("test.zarr", consolidated=True)
TypeError: Invalid value for attr '_CRS': {'wkt': 'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]'}. For serialization to netCDF files, its value must be of one of the following types: str, Number, ndarray, number, list, tuple

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 12 (8 by maintainers)

Top GitHub Comments

6 reactions
rabernat commented, Apr 6, 2022

I think the core problem here is that Zarr itself supports arbitrary json data structures as attributes, but netCDF does not. The Zarr serialization in Xarray is designed to emulate netCDF, but we could make that optional, for example, with a flag to bypass attribute encoding / decoding and just pass the python data directly through to Zarr.

However, my concern would be that netCDF4 C library would not be able to read those files (nczarr). What happens if you try to open up a GDAL-created Zarr with netCDF4?

FWIW, the new GeoZarr Spec by @christophenoel does not use the GDAL convention for CRS. Instead, it recommends using CF conventions for encoding CRS. This is more compatible with NetCDF, but won’t be parsed correctly by GDAL.

I am a little discouraged that we have not managed to align better across projects so far (e.g. having this conversation before the GDAL Zarr CRS convention was implemented). 😞 For example, either of these two GDAL PRs:

However, it is not too late! Let’s try to reach for a standard way of encoding CRS in Zarr that can be used across languages and implementations of Zarr.

My own preference would be to try to get GDAL to support the GeoZarr Spec and thus the CF-convention CRS attribute, rather than trying to get Xarray to be able to write the GDAL CRS convention.
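
For comparison, a CF-style encoding keeps every attribute value a plain string, so it passes the type check quoted in the error message of the original issue. The sketch below assumes the common CF pattern of a scalar grid-mapping variable referenced through a grid_mapping attribute; the variable name spatial_ref is an arbitrary choice and the dataset is an illustrative stand-in, neither of which comes from the thread.

```python
import numpy as np
import xarray as xr

wkt = 'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]'

# Illustrative stand-in for the dataset in the issue (one band only).
ds = xr.Dataset(
    {"Band1": (("Y", "X"), np.zeros((180, 360), dtype="uint16"))},
    coords={"X": np.arange(-179.5, 180.0), "Y": np.arange(89.5, -90.0, -1.0)},
)

# Scalar variable that carries the CRS as string attributes only.
# "spatial_ref" is a hypothetical name chosen for this sketch.
ds["spatial_ref"] = xr.DataArray(
    0,
    attrs={"grid_mapping_name": "latitude_longitude", "crs_wkt": wkt},
)

# Each data variable points at the grid-mapping variable by name.
ds["Band1"].attrs["grid_mapping"] = "spatial_ref"
```

As the comment above notes, this form is friendlier to netCDF tooling but is not the _CRS dictionary that GDAL's Zarr convention expects.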

2 reactions
rabernat commented, Apr 15, 2022

I am guilty of sidetracking this issue into the politics of CRS encoding. That discussion is important. But in the meantime, @wankoelias’s original issue reveals a narrower technical issue with Xarray’s Zarr writer: Xarray won’t let you serialize a dictionary attribute to zarr, even though zarr has no problem with this. That is a problem we can fix pretty easily.

The _validate_attrs helper function was just borrowed from to_netcdf:

https://github.com/pydata/xarray/blob/586992e8d2998751cb97b1cab4d3caa9dca116e0/xarray/backends/api.py#L133-L135

We could refactor this function to be more flexible to account for zarr’s broader range of allowed attribute types (as we have evidently already done for h5netcdf). Or we could just bypass it completely in the to_zarr method. That is the only real decision we need to make here right now.

@wankoelias - you seem to understand the issue pretty well. Would you be game for making a PR? We would be glad to support you along the way.
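
The refactoring option mentioned in this comment might look roughly like the following. This is a hypothetical, standard-library-only sketch and not xarray's actual _validate_attrs: the function name validate_attrs, the allow_json flag, and the type tuple are all assumptions made for illustration (ndarray is left out so the sketch needs no third-party imports).

```python
import json
from numbers import Number

# Simplified stand-in for the netCDF-style allowed types named in the error message.
NETCDF_TYPES = (str, Number, list, tuple)


def validate_attrs(attrs, allow_json=False):
    """Hypothetical attribute check.

    With allow_json=False, apply the strict netCDF-style type check.
    With allow_json=True, accept any value that can be encoded as JSON,
    which is the broader rule a Zarr writer could use.
    """
    for name, value in attrs.items():
        if not isinstance(name, str) or not name:
            raise TypeError(f"Invalid attr name: {name!r}")
        if allow_json:
            try:
                json.dumps(value)
            except (TypeError, ValueError) as err:
                raise TypeError(
                    f"Invalid value for attr {name!r}: not JSON-serializable"
                ) from err
        elif not isinstance(value, NETCDF_TYPES):
            raise TypeError(f"Invalid value for attr {name!r}: {value!r}")


attrs = {"_CRS": {"wkt": 'GEOGCS["WGS 84"]'}, "AREA_OR_POINT": "Area"}

# The relaxed check accepts the nested dictionary.
validate_attrs(attrs, allow_json=True)

# The strict check rejects it, mirroring the error in the original issue.
try:
    validate_attrs(attrs)
except TypeError as err:
    print(err)
```

The alternative raised in the comment, bypassing validation entirely in to_zarr, would amount to never calling such a function on the Zarr path.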
