
Grib2 scan_grib: Dimensions are Out of Order

See original GitHub issue

I’ve been attempting to follow this kerchunk grib2 example using HRRR data, but I’ve hit an issue that prevents me from using the Dataset as intended: the X and Y dimensions of the arrays are mis-named (or mis-ordered).

Running the example_combine function from the documentation (copied below) and printing the dataset shows the problem: the arrays are labeled (time, x, y), but the actual ordering of the grib2 data is (time, y, x). 1799 is the X extent of the domain and 1059 is the Y extent, yet the dimension labels are swapped. Because of this, operations on the Dataset that refer to the X and Y coordinate names don’t work.

I’ve tried manually modifying the kerchunk/grib2.py file to re-order occurrences of ["x", "y"], but sadly the fix doesn’t appear to be that simple. Any ideas on how to make sure the array dimensions are labeled correctly?
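As a stopgap until the labels come out right at the source, the names can be swapped after opening the dataset. This is a minimal sketch with a toy dataset standing in for the mislabeled HRRR output; xarray's Dataset.rename only relabels dimensions and leaves the underlying data untouched:

```python
import numpy as np
import xarray as xr

# Toy stand-in for the mislabeled output: the data are really
# (time, y, x), but the dims come out named (time, x, y).
data = np.zeros((1, 3, 5))  # shapes stand in for (1, 1059, 1799)
ds = xr.Dataset({"2t": (("time", "x", "y"), data)})

# Swap the two dimension names. Going through a temporary name
# avoids any ambiguity in the rename mapping.
ds = ds.rename({"x": "_tmp"}).rename({"y": "x"}).rename({"_tmp": "y"})

print(ds["2t"].dims)   # ('time', 'y', 'x')
```

This only fixes the labels; if downstream code needs a different axis order in memory, DataArray.transpose can reorder the dims afterwards.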

Script Output

<xarray.Dataset>
Dimensions:            (time: 1, x: 1059, y: 1799, heightAboveGround: 1)
Coordinates:
  * heightAboveGround  (heightAboveGround) int64 2
  * time               (time) datetime64[ns] 2019-01-01T22:00:00
Dimensions without coordinates: x, y
Data variables:
    2d                 (time, x, y) float64 dask.array<chunksize=(1, 1059, 1799), meta=np.ndarray>
    2r                 (time, x, y) float64 dask.array<chunksize=(1, 1059, 1799), meta=np.ndarray>
    2sh                (time, x, y) float64 dask.array<chunksize=(1, 1059, 1799), meta=np.ndarray>
    2t                 (time, x, y) float64 dask.array<chunksize=(1, 1059, 1799), meta=np.ndarray>
    latitude           (x, y) float64 dask.array<chunksize=(1059, 1799), meta=np.ndarray>
    longitude          (x, y) float64 dask.array<chunksize=(1059, 1799), meta=np.ndarray>
    pt                 (time, x, y) float64 dask.array<chunksize=(1, 1059, 1799), meta=np.ndarray>
Attributes:
    centre:             kwbc
    centreDescription:  US National Weather Service - NCEP
    edition:            2
    subCentre:          0
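Independent of the labels, one way to confirm which axis is really north-south is to inspect the 2-D latitude field: on a projected grid like HRRR's, latitude varies strongly along y and only weakly along x rows. A hedged sketch, with a toy latitude array standing in for ds["latitude"].values:

```python
import numpy as np

# Toy Lambert-like latitude field: strong variation down the first axis,
# weak variation across the second (as on the real HRRR grid).
ny, nx = 4, 6
lat = np.linspace(21.0, 53.0, ny)[:, None] + 0.01 * np.arange(nx)

spread_axis0 = np.ptp(lat, axis=0).mean()  # spread along the first axis
spread_axis1 = np.ptp(lat, axis=1).mean()  # spread along the second axis
print(spread_axis0 > spread_axis1)         # True -> the first axis is y
```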

Example Script

import xarray as xr
import fsspec
from kerchunk.grib2 import scan_grib
def example_combine(
    filter={"typeOfLevel": "heightAboveGround", "level": 2}
):
    """Create a combined dataset of weather measurements at 2 m height.

    One timestep from a single ~120 MB HRRR file on s3.
    Example usage:

    >>> tot = example_combine()
    >>> ds = xr.open_dataset("reference://", engine="zarr", backend_kwargs={
    ...        "consolidated": False,
    ...        "storage_options": {"fo": tot, "remote_options": {"anon": True}}})
    """
    from kerchunk.combine import MultiZarrToZarr, drop

    files = [
        "s3://noaa-hrrr-bdp-pds/hrrr.20190101/conus/hrrr.t22z.wrfsfcf01.grib2",
    ]
    so = {"anon": True, "default_cache_type": "readahead"}

    out = [scan_grib(u, storage_options=so, filter=filter) for u in files]
    out = sum(out, [])
    mzz = MultiZarrToZarr(
        out,
        remote_protocol="s3",
        preprocess=drop(("valid_time", "step")),
        remote_options=so,
        concat_dims=["time", "var"],
        identical_dims=["heightAboveGround", "latitude", "longitude"],
    )
    return mzz.translate()

def main():
    d = example_combine()

    fs = fsspec.filesystem("reference", fo=d, remote_protocol="s3", remote_options={"anon": True})
    m = fs.get_mapper("")
    ds = xr.open_dataset(m, engine="zarr", backend_kwargs=dict(consolidated=False),
                         chunks={"valid_time": 1})
    print(ds)

if __name__ == "__main__":
    main()

Issue Analytics

  • State: open
  • Created 9 months ago
  • Comments: 14 (8 by maintainers)

Top GitHub Comments

1 reaction
keltonhalbert commented, Dec 12, 2022

I’ll chime in and agree that most (if not all) geospatial grib data I have worked with is in a C-style, row-major ordered layout: (y, x), (lat, lon), (time, lat, lon), (time, level, lat, lon), etc. I’m sure there are specific edge cases, but at least for NOAA/NWS data this is the norm.
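The row-major convention described above can be sketched with plain NumPy. A GRIB2 payload is typically a flat sequence of values scanning along each row (x varying fastest), then row by row, so it reshapes to (ny, nx); for the HRRR CONUS grid that means (1059, 1799), not (1799, 1059). A toy grid stands in for the real one here:

```python
import numpy as np

# Flat, C-order sequence of values as a GRIB2 message typically stores them:
# x varies fastest within each scan line, then the next row begins.
ny, nx = 3, 5  # stand-ins for the HRRR extents (1059, 1799)
flat = np.arange(ny * nx, dtype=float)

grid = flat.reshape(ny, nx)  # rows are constant-y scan lines -> shape (y, x)
print(grid.shape)            # (3, 5)
print(grid[0])               # first scan line: [0. 1. 2. 3. 4.]
```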

0 reactions
mpiannucci commented, Dec 13, 2022

That looks right to me!!
