Consider supporting pathlib.Path within Dask Array's `.to_zarr(...)` method
I'm not sure whether this should be raised in the zarr repo instead, and I've looked through previous issues and couldn't find anything about it, but it feels like the Dask Array .to_zarr(...) method could support pathlib.Path directly, i.e. without having to turn it into a str:
Dask Version: 2021.7.2
>>> import pathlib, dask.array as da, numpy as np
>>> da.from_array(np.array([1])).to_zarr("file.zarr") # works
>>> da.from_array(np.array([1])).to_zarr(pathlib.Path("file.zarr")) # yields:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File ".../lib/python3.8/site-packages/dask/array/core.py", line 2649, in to_zarr
    return to_zarr(self, *args, **kwargs)
  File ".../lib/python3.8/site-packages/dask/array/core.py", line 3381, in to_zarr
    z = zarr.create(
  File ".../lib/python3.8/site-packages/zarr/creation.py", line 136, in create
    init_array(store, shape=shape, chunks=chunks, dtype=dtype, compressor=compressor,
  File ".../lib/python3.8/site-packages/zarr/storage.py", line 352, in init_array
    _init_array_metadata(store, shape=shape, chunks=chunks, dtype=dtype,
  File ".../lib/python3.8/site-packages/zarr/storage.py", line 382, in _init_array_metadata
    elif contains_array(store, path):
  File ".../lib/python3.8/site-packages/zarr/storage.py", line 96, in contains_array
    return key in store
TypeError: argument of type 'PosixPath' is not iterable
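The final frame suggests the Path object reached `key in store` unchanged, where a mapping-like store was expected. Here is a minimal standard-library sketch of that failure mode. The helper name `membership_fails` and the key `".zarray"` are illustrative assumptions, not taken from zarr's code:

```python
import pathlib


def membership_fails(store, key=".zarray"):
    """Return True if evaluating `key in store` raises TypeError.

    Hypothetical helper for illustration only; the default key is an assumption.
    """
    try:
        key in store  # same kind of membership test as the last traceback frame
    except TypeError:
        return True
    return False


# A path object defines neither __contains__ nor __iter__, so `in` raises TypeError.
path_result = membership_fails(pathlib.PurePosixPath("file.zarr"))

# A dict stands in here for a mapping-like store, which supports `in`.
mapping_result = membership_fails({})

print(path_result, mapping_result)
```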
In particular I’ve noticed that Dask DataFrame .to_parquet(...) supports this:
>>> import pathlib, dask.dataframe as dd, pandas as pd
>>> dd.from_pandas(pd.DataFrame({"a": [1]}), npartitions=1).to_parquet(pathlib.Path("file.parquet")) # works
Many thanks for these awesome libs!
Issue Analytics
- State:
- Created 2 years ago
- Comments:9 (6 by maintainers)

That PR is now merged; zarr will accept `pathlib.Path` in its convenience methods and store constructors (although it does so by normalising to `str`) from the next release.
A new issue, or a PR, for `from_zarr` would be very welcome : )
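For anyone on a zarr version without that change, a caller-side workaround is to normalise the path before passing it to `.to_zarr(...)`. Below is a minimal sketch, assuming a hypothetical helper named `as_store_url`. It is not part of Dask or zarr, and it only mirrors the "normalise to `str`" idea described above:

```python
import os
import pathlib


def as_store_url(url):
    """Hypothetical helper: turn os.PathLike objects into their plain path form.

    Anything that is not path-like (a str, a mapping-like store, ...) is
    returned unchanged.
    """
    if isinstance(url, os.PathLike):
        return os.fspath(url)  # pathlib.Path -> str
    return url


target = as_store_url(pathlib.PurePosixPath("file.zarr"))
print(target)

# Assumed usage with Dask (not executed here, requires dask and zarr):
# da.from_array(np.array([1])).to_zarr(as_store_url(pathlib.Path("file.zarr")))
```

Using `os.fspath` rather than `str(...)` means only genuine path-like objects are converted, so an already-constructed store object is passed through as-is.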