Problems faced while storing onto Zarr store using ABSStore

import zarr
from azure.storage.blob import BlockBlobService  # legacy Azure SDK client; not used directly below

# `ds` is the xarray.Dataset being written (its creation is not shown in the original post).
store = zarr.ABSStore(
    container='zarrstoreall',
    prefix='zarrstoreall',
    account_name='xxxx',
    account_key='xxxx',
    blob_service_kwargs={'is_emulated': False},
)

# Compress every data variable with Blosc (zstd, level 3).
compressor = zarr.Blosc(cname='zstd', clevel=3)
encoding = {vname: {'compressor': compressor} for vname in ds.data_vars}
ds.to_zarr(store=store, encoding=encoding, consolidated=True)

Problem description

I’m trying to use ABSStore to write a large xarray dataset to a Zarr store backed by Azure Blob Storage (see the code in the previous section). I am currently facing two issues:

  1. I first get some sort of network error when loading “certain” variables into the store: [image]
  2. After some time, I get this error: [image]

Needless to say, I did not face these issues with relatively small xarray datasets.

I appreciate your kind attention.

Version and installation information

Please provide the following:

  • Value of zarr.__version__ = ‘2.3.2’
  • Value of numcodecs.__version__ = ‘0.6.4’
  • Version of Python interpreter = Python 3.7.3
  • Operating system (Linux/Windows/Mac) = Databricks Runtime Version 6.1 (includes Apache Spark 2.4.4, Scala 2.11)
  • How Zarr was installed (e.g., “using pip into virtual environment”, or “using conda”) = !pip install zarr

Also, if you think it might be relevant, please provide the output from pip freeze or conda env export depending on which was used to install Zarr. pip freeze output: adal==1.2.2 asciitree==0.3.3 asn1crypto==0.24.0 azure==4.0.0 azure-applicationinsights==0.1.0 azure-batch==4.1.3 azure-common==1.1.23 azure-cosmosdb-nspkg==2.0.2 azure-cosmosdb-table==1.0.6 azure-datalake-store==0.0.48 azure-eventgrid==1.3.0 azure-graphrbac==0.40.0 azure-keyvault==1.1.0 azure-loganalytics==0.1.0 azure-mgmt==4.0.0 azure-mgmt-advisor==1.0.1 azure-mgmt-applicationinsights==0.1.1 azure-mgmt-authorization==0.50.0 azure-mgmt-batch==5.0.1 azure-mgmt-batchai==2.0.0 azure-mgmt-billing==0.2.0 azure-mgmt-cdn==3.1.0 azure-mgmt-cognitiveservices==3.0.0 azure-mgmt-commerce==1.0.1 azure-mgmt-compute==4.6.2 azure-mgmt-consumption==2.0.0 azure-mgmt-containerinstance==1.5.0 azure-mgmt-containerregistry==2.8.0 azure-mgmt-containerservice==4.4.0 azure-mgmt-cosmosdb==0.4.1 azure-mgmt-datafactory==0.6.0 azure-mgmt-datalake-analytics==0.6.0 azure-mgmt-datalake-nspkg==3.0.1 azure-mgmt-datalake-store==0.5.0 azure-mgmt-datamigration==1.0.0 azure-mgmt-devspaces==0.1.0 azure-mgmt-devtestlabs==2.2.0 azure-mgmt-dns==2.1.0 azure-mgmt-eventgrid==1.0.0 azure-mgmt-eventhub==2.6.0 azure-mgmt-hanaonazure==0.1.1 azure-mgmt-iotcentral==0.1.0 azure-mgmt-iothub==0.5.0 azure-mgmt-iothubprovisioningservices==0.2.0 azure-mgmt-keyvault==1.1.0 azure-mgmt-loganalytics==0.2.0 azure-mgmt-logic==3.0.0 azure-mgmt-machinelearningcompute==0.4.1 azure-mgmt-managementgroups==0.1.0 azure-mgmt-managementpartner==0.1.1 azure-mgmt-maps==0.1.0 azure-mgmt-marketplaceordering==0.1.0 azure-mgmt-media==1.0.0 azure-mgmt-monitor==0.5.2 azure-mgmt-msi==0.2.0 azure-mgmt-network==2.7.0 azure-mgmt-notificationhubs==2.1.0 azure-mgmt-nspkg==3.0.2 azure-mgmt-policyinsights==0.1.0 azure-mgmt-powerbiembedded==2.0.0 azure-mgmt-rdbms==1.9.0 azure-mgmt-recoveryservices==0.3.0 azure-mgmt-recoveryservicesbackup==0.3.0 azure-mgmt-redis==5.0.0 
azure-mgmt-relay==0.1.0 azure-mgmt-reservations==0.2.1 azure-mgmt-resource==2.2.0 azure-mgmt-scheduler==2.0.0 azure-mgmt-search==2.1.0 azure-mgmt-servicebus==0.5.3 azure-mgmt-servicefabric==0.2.0 azure-mgmt-signalr==0.1.1 azure-mgmt-sql==0.9.1 azure-mgmt-storage==2.0.0 azure-mgmt-subscription==0.2.0 azure-mgmt-trafficmanager==0.50.0 azure-mgmt-web==0.35.0 azure-nspkg==3.0.2 azure-servicebus==0.21.1 azure-servicefabric==6.3.0.0 azure-servicemanagement-legacy==0.20.6 azure-storage-blob==1.5.0 azure-storage-common==1.4.2 azure-storage-file==1.4.0 azure-storage-queue==1.4.0 backcall==0.1.0 boto==2.49.0 boto3==1.9.162 botocore==1.12.163 certifi==2019.3.9 cffi==1.12.2 cftime==1.0.4.2 chardet==3.0.4 cryptography==2.6.1 cycler==0.10.0 Cython==0.29.6 dask==2.9.0 decorator==4.4.0 docutils==0.14 fasteners==0.15 fsspec==0.6.1 idna==2.8 ipython==7.4.0 ipython-genutils==0.2.0 isodate==0.6.0 jedi==0.13.3 jmespath==0.9.4 kiwisolver==1.1.0 koalas==0.23.0 locket==0.2.0 matplotlib==3.0.3 monotonic==1.5 msrest==0.6.10 msrestazure==0.6.2 netCDF4==1.5.3 numcodecs==0.6.4 numpy==1.16.2 oauthlib==3.1.0 pandas==0.24.2 parso==0.3.4 partd==1.1.0 patsy==0.5.1 pexpect==4.6.0 pickleshare==0.7.5 prompt-toolkit==2.0.9 psycopg2==2.7.6.1 ptyprocess==0.6.0 pyarrow==0.13.0 pycparser==2.19 pycurl==7.43.0 Pygments==2.3.1 pygobject==3.20.0 PyJWT==1.7.1 pyOpenSSL==19.0.0 pyparsing==2.4.2 PySocks==1.6.8 python-apt==1.1.0b1+ubuntu0.16.4.5 python-dateutil==2.8.0 pytz==2018.9 requests==2.21.0 requests-oauthlib==1.3.0 s3transfer==0.2.1 scikit-learn==0.20.3 scipy==1.2.1 seaborn==0.9.0 six==1.12.0 ssh-import-id==5.5 statsmodels==0.9.0 toolz==0.10.0 traitlets==4.3.2 unattended-upgrades==0.1 urllib3==1.24.1 virtualenv==16.4.1 wcwidth==0.1.7 xarray==0.14.1 zarr==2.3.2

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Comments: 25 (18 by maintainers)

Top GitHub Comments

2 reactions
tjcrone commented, Dec 12, 2019

I believe the first error is actually a warning, and occurs when Zarr looks for metadata files that do not exist. This has been solved in newer versions of the Azure SDK. I would try upgrading azure-storage-blob to v2.1.
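As a rough illustration of that suggestion, the sketch below compares an installed azure-storage-blob version string against the suggested 2.1 minimum. The helper names `parse_version` and `needs_upgrade` are hypothetical, and the comparison only looks at the first three numeric components, so treat it as a quick check rather than a full version parser.

```python
import re


def parse_version(text):
    """Turn a version string such as '1.5.0' or '2.1' into a 3-tuple of ints."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:3]]
    # Pad short versions ('2.1' -> (2, 1, 0)) so tuple comparison behaves.
    return tuple(numbers + [0] * (3 - len(numbers)))


def needs_upgrade(installed, minimum="2.1.0"):
    """Return True when `installed` is older than `minimum` (hypothetical helper)."""
    return parse_version(installed) < parse_version(minimum)


# The pip freeze output in the report lists azure-storage-blob==1.5.0.
print(needs_upgrade("1.5.0"))
print(needs_upgrade("2.1.0"))
```

On the Python 3.7 environment from the report, the installed version string could be read with `pkg_resources.get_distribution("azure-storage-blob").version` and passed to `needs_upgrade`.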

It’s worth noting that while investigating this I learned that there is a major new release of the Azure SDK that looks like it will break ABSStore entirely. We will probably need to figure out how to deal with this soon. It’s not obvious how we are going to deal with two versions of the SDK that are essentially incompatible. I will probably start a new issue to work on this eventually.
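One way to see which SDK generation an environment has is to check which client class `azure.storage.blob` exposes. This is only a sketch, not how Zarr handles the problem: the function name `detect_blob_sdk` and its return labels are made up for illustration, and it relies on the legacy releases providing `BlockBlobService` (as imported in the original code) while the newer major release provides `BlobServiceClient` instead.

```python
import importlib


def detect_blob_sdk():
    """Report which azure-storage-blob API generation is importable, if any."""
    try:
        blob = importlib.import_module("azure.storage.blob")
    except ImportError:
        return None  # azure-storage-blob is not installed
    if hasattr(blob, "BlockBlobService"):
        return "legacy"  # the API the ABSStore code in this issue is written against
    if hasattr(blob, "BlobServiceClient"):
        return "v12"  # newer, incompatible API
    return "unknown"


print(detect_blob_sdk())
```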

1 reaction
shikharsg commented, Mar 26, 2020

Re: getting NaN values

I think I might have found out why this happens, as I ran into this myself.

There is a fill_value attribute in zarr, which zarr uses to fill out missing chunks (see here).

Xarray uses this same attribute as the _FillValue attribute (see here) for decoding using the CF conventions, which is something quite different from filling out missing chunks.

@zarr-developers/core-devs Is this a correct interpretation? If so where should this be fixed? In xarray or in zarr?

@dokooh I fixed this temporarily by giving mask_and_scale=False to xr.open_zarr
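To make the mismatch concrete, here is a minimal pure-Python sketch of CF-style masking. It is not xarray's implementation: `decode_values` is a hypothetical stand-in, and the fill value of 0 is assumed for illustration. It shows how legitimate values that happen to equal the fill value come back as NaN when masking is on, and are left alone when it is off.

```python
import math


def decode_values(stored, fill_value, mask_and_scale=True):
    """Mimic CF masking: entries equal to fill_value become NaN when enabled."""
    if not mask_and_scale:
        return list(stored)
    return [math.nan if v == fill_value else float(v) for v in stored]


stored = [0, 3, 0, 7]  # the zeros here are real data, not missing chunks

decoded = decode_values(stored, fill_value=0)
raw = decode_values(stored, fill_value=0, mask_and_scale=False)

print(decoded)  # zeros are masked to NaN
print(raw)      # values returned unchanged
```

The `mask_and_scale=False` branch corresponds to the workaround above, where the same flag is passed to `xr.open_zarr`.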
