How to prevent Zarr from returning NaN for missing chunks?
Is there a way of preventing Zarr from returning NaNs if a chunk is missing?
Background of my question: We’re seeing problems either with copying data to GCS or with GCS failing to reliably serve all chunks of a Zarr store.
In arr below, there are two types of NaN-filled chunks returned by Zarr.
from dask import array as darr
import numpy as np
arr = darr.from_zarr("gs://pangeo-data/eNATL60-BLBT02X-ssh/sossheig/")
First, there’s a chunk whose values are all flagged as missing in the data (the chunk lies over land in an ocean dataset) but which is present on GCS (https://console.cloud.google.com/storage/browser/_details/pangeo-data/eNATL60-BLBT02X-ssh/sossheig/0.0.0), and Zarr correctly finds all of its items marked as invalid:
np.isnan(arr.blocks[0, 0, 0]).mean().compute()
# -> 1.0
Then, there’s a chunk (https://console.cloud.google.com/storage/browser/_details/pangeo-data/eNATL60-BLBT02X-ssh/sossheig/0.7.3) that is not present (at the time of writing, I get a “load failed” error and a tracking ID from GCS), yet Zarr also returns all of its items marked as invalid:
np.isnan(arr.blocks[0, 7, 3]).mean().compute()
# -> 1.0
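One way to tell these two cases apart, without changing how Zarr reads data, is to compare the chunk keys implied by the array's shape and chunking with the keys the store actually lists. This is a sketch assuming the Zarr v2 key layout (chunk indices joined by "." or, for a nested dimension separator, "/"); the helper names expected_chunk_keys and missing_chunk_keys are hypothetical and not part of Zarr.

```python
import itertools
import math
from typing import Iterable, List, Sequence


def expected_chunk_keys(shape: Sequence[int], chunks: Sequence[int], sep: str = ".") -> List[str]:
    """Every chunk key an N-d Zarr v2 array with this shape and chunking could have (hypothetical helper)."""
    # Number of chunks along each dimension; the last chunk in a dimension may be partial.
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    # Cartesian product of chunk indices, e.g. (0, 7, 3) -> "0.7.3".
    return [sep.join(map(str, idx)) for idx in itertools.product(*(range(n) for n in grid))]


def missing_chunk_keys(
    shape: Sequence[int], chunks: Sequence[int], present_keys: Iterable[str], sep: str = "."
) -> List[str]:
    """Chunk keys implied by the metadata that the store does not list (hypothetical helper)."""
    present = set(present_keys)  # metadata keys such as ".zarray" are simply ignored
    return [k for k in expected_chunk_keys(shape, chunks, sep) if k not in present]


if __name__ == "__main__":
    # A 3x4 array in 2x2 chunks has a 2x2 chunk grid; "1.1" is absent here.
    print(missing_chunk_keys((3, 4), (2, 2), ["0.0", "0.1", "1.0", ".zarray"]))
```

As an untested illustration, present_keys could come from iterating fsspec.get_mapper(url) and shape/chunks from the opened zarr array; listing a large store can be slow. A missing key is not always lost data: depending on how the array was written, Zarr may skip storing chunks that contain only the fill value, so this only flags candidates to investigate.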
How do I make Zarr raise an exception in the latter case?
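A minimal sketch, assuming zarr-python 2.x, which treats a KeyError from the store on a chunk key as "chunk never written" and fills that chunk with fill_value: wrap the store in a mapping that raises a different exception for absent chunk keys while still raising KeyError for absent metadata keys. StrictChunkStore and MissingChunkError are hypothetical names. Depending on the Zarr version, a chunk may be fetched with store[key] directly or only after a `key in store` check, so the sketch raises in both places. It is intended for read-only access and has not been tested against a specific Zarr release.

```python
from collections.abc import MutableMapping


class MissingChunkError(Exception):
    """Hypothetical error for an absent chunk; deliberately NOT a KeyError subclass,
    so Zarr's `except KeyError` fill-value path does not swallow it."""


class StrictChunkStore(MutableMapping):
    """Hypothetical read-only-oriented wrapper around any MutableMapping store."""

    def __init__(self, store):
        self._store = store

    @staticmethod
    def _is_metadata(key):
        # Metadata keys (.zarray, .zattrs, .zgroup, .zmetadata) may legitimately be absent.
        return key.rsplit("/", 1)[-1].startswith(".z")

    def __getitem__(self, key):
        try:
            return self._store[key]
        except KeyError:
            if self._is_metadata(key):
                raise  # keep normal KeyError semantics for metadata probes
            raise MissingChunkError(f"chunk {key!r} not found in store") from None

    def __contains__(self, key):
        present = key in self._store
        if not present and not self._is_metadata(key):
            # Some Zarr versions check membership before reading; fail loudly there too.
            raise MissingChunkError(f"chunk {key!r} not found in store")
        return present

    def __setitem__(self, key, value):
        self._store[key] = value

    def __delitem__(self, key):
        del self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)
```

For example (untested): z = zarr.open_array(StrictChunkStore(fsspec.get_mapper("gs://pangeo-data/eNATL60-BLBT02X-ssh/sossheig/")), mode="r") followed by darr.from_zarr(z). Whether a GCS "load failed" error reaches Zarr as a KeyError at all depends on how the fsspec mapper translates backend errors, so this wrapper only helps when the failure shows up as a missing key. It would also flag chunks that were intentionally never written.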
Did you see https://github.com/zarr-developers/zarr-python/pull/489#issuecomment-823656711, @delgadom? Perhaps give that some testing to help drive it forward?
(This was fixed in https://github.com/intake/filesystem_spec/pull/259)