`Dataset.to_zarr` compute=False should allow access to awaitable

What happened?

I have xarray and zarr installed, but not dask, and am trying to call to_zarr in an async routine. I am looking for something I can await. The docs say that a dask.delayed.Delayed is returned. I understand that if I have a dask client open with asynchronous=True, I can await the result.

However, I am not using Dask. Is there some way to get an awaitable from this object without a dask client?

What did you expect to happen?

I should get something back I can await in my async routine.

Minimal Complete Verifiable Example

import xarray as xr
from asyncio import get_event_loop

ds = xr.Dataset(data_vars=dict(x=('x', [1, 2])))
deld = ds.to_zarr("bar.zarr", compute=False)
loop = get_event_loop()
loop.run_until_complete(deld. ...?)
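
One possible workaround, which is not part of the original issue, is to run the blocking write in a worker thread and await the resulting future. The sketch below uses only the standard library. `blocking_write` is a hypothetical stand-in for a call such as `ds.to_zarr("bar.zarr")`, and `write_async` is an illustrative name, not an xarray API.

```python
import asyncio
import time
from functools import partial


def blocking_write(path, delay=0.05):
    """Hypothetical stand-in for a blocking call such as ds.to_zarr(path)."""
    time.sleep(delay)  # simulate slow I/O
    return path


async def write_async(path):
    """Run the blocking write in the default thread pool and await it."""
    loop = asyncio.get_running_loop()
    # run_in_executor returns an awaitable future; None selects the default executor.
    return await loop.run_in_executor(None, partial(blocking_write, path))


async def main():
    # The event loop remains free to run other coroutines during the write.
    return await asyncio.gather(
        write_async("bar.zarr"),
        asyncio.sleep(0.01, result="other work"),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

This does not make the write itself asynchronous. It only keeps the event loop responsive while the write runs in another thread.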

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.8.5 (default, Sep 4 2020, 02:22:02) [Clang 10.0.0 ]
python-bits: 64
OS: Darwin
OS-release: 20.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.4
libnetcdf: 4.7.3

xarray: 2022.3.0
pandas: 1.4.1
numpy: 1.22.3
scipy: 1.6.2
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.11.1
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.10.0
distributed: 2021.10.0
matplotlib: 3.1.3
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.02.0
cupy: None
pint: None
sparse: None
setuptools: 60.7.1
pip: 22.0.3
conda: 4.11.0
pytest: 7.1.1
IPython: 7.31.1
sphinx: None

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5

Top GitHub Comments

shaunc commented, Mar 21, 2022 (1 reaction)

You can return an awaitable explicitly without declaring async. I’m not sure what the best style is – if there is no await then asyncio will issue a warning in debug mode. Perhaps you can mark it … Alternately, you could pass in an event loop to compute=, which would trigger the task being added with create_task() (as well as being returned?)

As for a contribution … I’m already hacking on something but it isn’t pretty. Will need to think more about how to do it the right way.
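
The idea in this comment, a plain function that returns an awaitable when it is handed an event loop, might look roughly like the sketch below. It is purely illustrative: `to_store` and `_write_chunks` are hypothetical names, and xarray's `to_zarr` does not accept an event loop for `compute`.

```python
import asyncio


async def _write_chunks(log):
    """Hypothetical coroutine standing in for an async write."""
    await asyncio.sleep(0)
    log.append("written")
    return "done"


def to_store(log, compute=True):
    """Hypothetical writer, declared as a plain (non-async) function.

    compute=True    -> run the write to completion and return its result
    compute=<loop>  -> schedule the write on that loop and return the Task
    """
    if isinstance(compute, asyncio.AbstractEventLoop):
        # Returning the Task hands the caller an awaitable even though
        # this function is not declared with `async def`.
        return compute.create_task(_write_chunks(log))
    return asyncio.run(_write_chunks(log))


async def main():
    log = []
    task = to_store(log, compute=asyncio.get_running_loop())
    result = await task
    return result, log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because the default `compute=True` path still blocks and returns a plain result, existing synchronous callers would be unaffected under this design.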

max-sixty commented, Mar 20, 2022 (0 reactions)

That makes sense for the fsspec part, but IIUC we’d also have to make the rest of to_zarr async too?

If it could be made to work without a backward-incompatible change, I think we’d be open to the contribution though.
