How to prevent Zarr from returning NaN for missing chunks?

Is there a way of preventing Zarr from returning NaNs if a chunk is missing?

Background of my question: We’re seeing problems either with copying data to GCS or with GCS failing to reliably serve all chunks of a Zarr store.

In arr below, there are two types of NaN-filled chunks returned by Zarr.

from dask import array as darr
import numpy as np

# Open the Zarr array lazily as a Dask array (one Dask block per Zarr chunk)
arr = darr.from_zarr("gs://pangeo-data/eNATL60-BLBT02X-ssh/sossheig/")

First, there’s a chunk that is completely flagged as missing in the data (the chunk is over land in an ocean dataset) but present on GCS (https://console.cloud.google.com/storage/browser/_details/pangeo-data/eNATL60-BLBT02X-ssh/sossheig/0.0.0). Zarr correctly reports all items as invalid:

np.isnan(arr.blocks[0, 0, 0]).mean().compute()
# -> 1.0

Then, there’s a chunk (https://console.cloud.google.com/storage/browser/_details/pangeo-data/eNATL60-BLBT02X-ssh/sossheig/0.7.3) that is not present on GCS (at the time of writing, I get a “load failed” message and a tracking id from GCS), and Zarr returns all items marked as invalid here as well:

np.isnan(arr.blocks[0, 7, 3]).mean().compute()
# -> 1.0
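The two cases look identical from the array side, so one way to tell them apart is to compare the keys the store actually holds with the keys the array should have. The following is a minimal sketch, assuming the Zarr v2 layout where each chunk is stored under its chunk indices joined by dots (as in 0.7.3 above). The helper names `expected_chunk_keys` and `missing_chunk_keys` and the toy shape are hypothetical and not part of Zarr.

```python
import itertools
import math


def expected_chunk_keys(shape, chunks, separator="."):
    """Yield every chunk key an array with this shape and chunking can have."""
    # Number of chunks along each dimension (the last chunk may be partial).
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    for index in itertools.product(*(range(n) for n in grid)):
        yield separator.join(str(i) for i in index)


def missing_chunk_keys(store_keys, shape, chunks, separator="."):
    """Return the expected chunk keys that are absent from store_keys."""
    present = set(store_keys)
    return [
        key
        for key in expected_chunk_keys(shape, chunks, separator)
        if key not in present
    ]


# Toy example: a (2, 4, 4) array in (1, 2, 2) chunks has a 2 x 2 x 2 chunk grid.
shape, chunks = (2, 4, 4), (1, 2, 2)
store_keys = {
    ".zarray",
    "0.0.0", "0.0.1", "0.1.0", "0.1.1",
    "1.0.0", "1.0.1", "1.1.1",  # "1.1.0" was never uploaded
}
missing = missing_chunk_keys(store_keys, shape, chunks)
print(missing)  # ['1.1.0']
```

A missing key is not always an error, because chunks that were never written are legitimately absent from a store. The check only helps when every chunk is expected to exist, as in the dataset described here.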

How do I make Zarr raise an Exception on the latter?
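For context, Zarr v2 reads chunks from a dict-like store and treats a KeyError for a chunk key as "chunk not written", filling that region with the array's fill value (presumably NaN here). One possible direction, sketched below under assumptions, is to wrap the store so that a missing chunk key raises a different exception while metadata lookups keep their normal KeyError. `StrictChunkStore` and `MissingChunkError` are hypothetical names, the example uses a plain dict instead of GCS, and whether Zarr lets the exception propagate depends on the Zarr version and on how it wraps the store, so this is untested against the dataset above.

```python
import re
from collections.abc import Mapping


class MissingChunkError(RuntimeError):
    """Hypothetical error raised instead of KeyError for absent chunk keys."""


class StrictChunkStore(Mapping):
    """Read-only wrapper that refuses to treat missing chunks as empty."""

    # Chunk keys look like "0.7.3", optionally below a path such as "var/0.7.3".
    _chunk_key = re.compile(r"(^|/)\d+(\.\d+)*$")

    def __init__(self, inner):
        self._inner = inner

    def __getitem__(self, key):
        try:
            return self._inner[key]
        except KeyError:
            if self._chunk_key.search(key):
                raise MissingChunkError(
                    f"chunk {key!r} is missing from the store"
                ) from None
            raise  # metadata keys such as ".zattrs" keep the usual KeyError

    def __contains__(self, key):
        # Membership tests stay side-effect free and never raise.
        return key in self._inner

    def __iter__(self):
        return iter(self._inner)

    def __len__(self):
        return len(self._inner)


# Toy store standing in for the remote bucket.
store = StrictChunkStore({".zarray": b"{}", "0.0.0": b"\x00"})
print(store["0.0.0"])  # the stored bytes for an existing chunk

try:
    store["0.7.3"]
except MissingChunkError as err:
    print(err)
```

The trade-off is that arrays which rely on unwritten chunks being filled in would also fail with this wrapper, so it only suits stores where every chunk is expected to be present.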

cc: @auraoupa related: pangeo-data/pangeo#691

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 18 (13 by maintainers)

Top GitHub Comments

1 reaction
joshmoore commented, Apr 21, 2021

Did you see https://github.com/zarr-developers/zarr-python/pull/489#issuecomment-823656711, @delgadom? Perhaps give that some testing to help drive it forward?

1 reaction
martindurant commented, Apr 1, 2020

