Consider supporting pathlib.Path within Dask Arrays `.to_zarr(...)` method

Not sure if this should be raised in the zarr repo instead. I’ve looked through previous issues and couldn’t find anything regarding this. It feels like the Dask Array .to_zarr(...) method could support pathlib.Path directly, i.e. without having to turn it into a str:

Dask Version: 2021.7.2

>>> import pathlib, dask.array as da, numpy as np
>>> da.from_array(np.array([1])).to_zarr("file.zarr")  # works
>>> da.from_array(np.array([1])).to_zarr(pathlib.Path("file.zarr"))  # yields:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File ".../lib/python3.8/site-packages/dask/array/core.py", line 2649, in to_zarr
    return to_zarr(self, *args, **kwargs)
  File ".../lib/python3.8/site-packages/dask/array/core.py", line 3381, in to_zarr
    z = zarr.create(
  File ".../lib/python3.8/site-packages/zarr/creation.py", line 136, in create
    init_array(store, shape=shape, chunks=chunks, dtype=dtype, compressor=compressor,
  File ".../lib/python3.8/site-packages/zarr/storage.py", line 352, in init_array
    _init_array_metadata(store, shape=shape, chunks=chunks, dtype=dtype,
  File ".../lib/python3.8/site-packages/zarr/storage.py", line 382, in _init_array_metadata
    elif contains_array(store, path):
  File ".../lib/python3.8/site-packages/zarr/storage.py", line 96, in contains_array
    return key in store
TypeError: argument of type 'PosixPath' is not iterable
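
The last frame of the traceback suggests why this fails: zarr ends up evaluating `key in store` while `store` is still the raw `PosixPath`, and pathlib paths support neither membership tests nor iteration. The following is a minimal, stdlib-only sketch of that behaviour. The helper `supports_membership` and the key `".zarray"` are illustrative assumptions and not part of Dask or zarr.

```python
import pathlib


def supports_membership(obj, key):
    """Return True if `key in obj` can be evaluated, False if it raises TypeError."""
    try:
        _ = key in obj
    except TypeError:
        return False
    return True


# A pure path stands in for the PosixPath from the traceback (works on any OS).
path_store = pathlib.PurePosixPath("file.zarr")
# A plain dict stands in for a mapping-style store.
dict_store = {}

print(supports_membership(path_store, ".zarray"))
print(supports_membership(dict_store, ".zarray"))
```

The path object fails the membership test while the mapping passes it, which is consistent with the `TypeError` shown above.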

In particular I’ve noticed that Dask DataFrame .to_parquet(...) supports this:

>>> import pathlib, dask.dataframe as dd, pandas as pd
>>> dd.from_pandas(pd.DataFrame({"a": [1]}), npartitions=1).to_parquet(pathlib.Path("file.parquet"))  # works

Many thanks for these awesome libs!

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

2 reactions
clbarnes commented, Aug 17, 2021

It looks like zarr itself doesn’t have full support for pathlib.Path objects yet, but that it is currently in the works (xref zarr-developers/zarr-python#768)

That PR is now merged; zarr will accept pathlib.Path in its convenience methods and store constructors (although it does so by normalising to str) from the next release.
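
Until a zarr release with that change is installed, the same normalisation can be done on the caller's side. Below is a hedged sketch, not zarr's actual implementation: the helper name `to_store_arg` is hypothetical, and it simply converts path-like objects to `str` with `os.fspath` while passing everything else (plain strings, mapping-style stores) through unchanged.

```python
import os
import pathlib


def to_store_arg(store):
    """Hypothetical helper: convert path-like objects to str, pass others through."""
    if isinstance(store, os.PathLike):
        return os.fspath(store)
    return store


# A pure path keeps the example independent of the host operating system.
target = to_store_arg(pathlib.PurePosixPath("data") / "file.zarr")
print(target)
```

With Dask, the result would then be passed on as `arr.to_zarr(to_store_arg(path))`, which is equivalent to the `str(...)` workaround mentioned in the issue. That call is shown for context only and is not exercised by the sketch.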

1 reaction
jrbourbeau commented, Sep 21, 2021

A new issue, or a PR, for from_zarr would be very welcome : )
