Grib2 scan_grib: Dimensions are Out of Order

I’ve been attempting to follow this kerchunk grib2 example using HRRR data, but I have encountered an interesting issue that prevents me from being able to use the Dataset in my intended manner due to the mis-naming (or mis-ordering) of X and Y dimensions for arrays.

Running the example_combine function from the documentation (copied below) and printing the dataset shows the issue: the arrays are labeled (time, x, y), but the actual ordering of the grib2 data is (time, y, x). The X extent of the domain is 1799 and the Y extent is 1059, yet the dimension labels are swapped (x: 1059, y: 1799). Because of this, operations on the Dataset that rely on the X and Y coordinate names don’t work.

I’ve attempted manually modifying the kerchunk/grib2.py file to re-order occurrences of ["x", "y"], but sadly the solution doesn’t appear to be that simple. Any ideas on how to make sure that the array dimensions are labeled correctly?
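
One possible workaround, sketched below, is to swap the labels in the reference set itself rather than in kerchunk/grib2.py. This assumes the dict returned by example_combine() follows the version 1 reference layout ({"version": 1, "refs": {...}}), that each ".zattrs" entry is a JSON string, and that xarray takes dimension names from the "_ARRAY_DIMENSIONS" attribute. The helper name swap_xy_labels and the demo dict are hypothetical, not part of kerchunk.

```python
import json


def swap_xy_labels(references, a="x", b="y"):
    """Return a copy of a kerchunk-style reference dict with labels a and b swapped.

    Hypothetical helper: only the "_ARRAY_DIMENSIONS" lists stored in
    ".zattrs" entries are rewritten; chunk references are left untouched.
    """
    swap = {a: b, b: a}
    new_refs = {}
    for key, value in references["refs"].items():
        if key.endswith(".zattrs") and isinstance(value, str):
            attrs = json.loads(value)
            dims = attrs.get("_ARRAY_DIMENSIONS")
            if dims is not None:
                attrs["_ARRAY_DIMENSIONS"] = [swap.get(d, d) for d in dims]
                value = json.dumps(attrs)
        new_refs[key] = value
    out = dict(references)
    out["refs"] = new_refs
    return out


# Hand-written stand-in for the output of example_combine() (assumed layout).
demo = {
    "version": 1,
    "refs": {
        "2t/.zattrs": json.dumps({"_ARRAY_DIMENSIONS": ["time", "x", "y"]}),
        "2t/0.0.0": ["s3://example-bucket/example.grib2", 0, 100],
    },
}

fixed = swap_xy_labels(demo)
print(json.loads(fixed["refs"]["2t/.zattrs"])["_ARRAY_DIMENSIONS"])
# ['time', 'y', 'x']
```

Only the labels change here; the stored data and chunk shapes stay as they are, which matches the situation described above where the data is already ordered (time, y, x).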

Script Output

<xarray.Dataset>
Dimensions:            (time: 1, x: 1059, y: 1799, heightAboveGround: 1)
Coordinates:
  * heightAboveGround  (heightAboveGround) int64 2
  * time               (time) datetime64[ns] 2019-01-01T22:00:00
Dimensions without coordinates: x, y
Data variables:
    2d                 (time, x, y) float64 dask.array<chunksize=(1, 1059, 1799), meta=np.ndarray>
    2r                 (time, x, y) float64 dask.array<chunksize=(1, 1059, 1799), meta=np.ndarray>
    2sh                (time, x, y) float64 dask.array<chunksize=(1, 1059, 1799), meta=np.ndarray>
    2t                 (time, x, y) float64 dask.array<chunksize=(1, 1059, 1799), meta=np.ndarray>
    latitude           (x, y) float64 dask.array<chunksize=(1059, 1799), meta=np.ndarray>
    longitude          (x, y) float64 dask.array<chunksize=(1059, 1799), meta=np.ndarray>
    pt                 (time, x, y) float64 dask.array<chunksize=(1, 1059, 1799), meta=np.ndarray>
Attributes:
    centre:             kwbc
    centreDescription:  US National Weather Service - NCEP
    edition:            2
    subCentre:          0

Example Script

import xarray as xr
import fsspec
from kerchunk.grib2 import scan_grib


def example_combine(
    filter={"typeOfLevel": "heightAboveGround", "level": 2}
):  # pragma: no cover
    """Create combined dataset of weather measurements at 2m height

    Ten consecutive timepoints from ten 120MB files on s3.
    Example usage:

    >>> tot = example_combine()
    >>> ds = xr.open_dataset("reference://", engine="zarr", backend_kwargs={
    ...        "consolidated": False,
    ...        "storage_options": {"fo": tot, "remote_options": {"anon": True}}})
    """
    from kerchunk.combine import MultiZarrToZarr, drop

    files = [
        "s3://noaa-hrrr-bdp-pds/hrrr.20190101/conus/hrrr.t22z.wrfsfcf01.grib2",
    ]
    so = {"anon": True, "default_cache_type": "readahead"}

    out = [scan_grib(u, storage_options=so, filter=filter) for u in files]
    out = sum(out, [])
    mzz = MultiZarrToZarr(
        out,
        remote_protocol="s3",
        preprocess=drop(("valid_time", "step")),
        remote_options=so,
        concat_dims=["time", "var"],
        identical_dims=["heightAboveGround", "latitude", "longitude"],
    )
    return mzz.translate()

def main():
    d = example_combine()

    fs = fsspec.filesystem("reference", fo=d, remote_protocol='s3', remote_options={'anon':True})
    m = fs.get_mapper("")
    ds = xr.open_dataset(m, engine="zarr", backend_kwargs=dict(consolidated=False),
                      chunks={'valid_time':1})
    print(ds)

if __name__ == "__main__":
    main()

Issue Analytics

Created 9 months ago
Comments: 14 (8 by maintainers)

Top GitHub Comments

keltonhalbert commented, Dec 12, 2022

I’ll chime in and agree that most (if not all) geospatial grib data I have worked with are in C-style, row-major ordered layout (y, x), (lat, lon), (time, lat, lon), (time, level, lat, lon), etc… I am sure there are specific edge cases, but at least for NOAA/NWS data, this is the norm.
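
As a small illustration of the row-major (y, x) layout described in this comment, the sketch below reshapes a flat buffer into rows of length nx. The helper names are made up for this example, and the 2 x 3 grid is a toy stand-in for the 1059 x 1799 HRRR domain from the issue.

```python
def flat_index(j, i, nx):
    """Offset of grid point (row j, column i) in a row-major buffer with nx columns."""
    return j * nx + i


def reshape_row_major(values, ny, nx):
    """Split a flat sequence into ny rows of nx values each, giving shape (ny, nx)."""
    if len(values) != ny * nx:
        raise ValueError("buffer length does not match ny * nx")
    return [values[j * nx:(j + 1) * nx] for j in range(ny)]


# Extents quoted in the issue: X extent 1799, Y extent 1059.
NX, NY = 1799, 1059

# Toy grid: the x index varies fastest, so the shape is (ny, nx) = (2, 3).
grid = reshape_row_major(list(range(6)), ny=2, nx=3)
print(grid)                   # [[0, 1, 2], [3, 4, 5]]
print(flat_index(1, 0, NX))   # 1799: the second row starts after NX values
```

Under this layout the last axis has length NX, so a correctly labeled HRRR field would have dimensions (y: 1059, x: 1799).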

mpiannucci commented, Dec 13, 2022

That looks right to me!!