Calling AzureBlobFileSystem.cat on a file path adds a "/" to the end of the file path

I was using dask + zarr to store arrays on Azure Blob Storage using this library, but I ran into an issue.

What happens is that:

  1. At some point when loading a zarr file from the storage, an fsspec.mapping.FSMap is created (with .fs being an AzureBlobFileSystem instance).
  2. It calls self.fs.cat(k), where k is a string representing a file path on the blob storage (e.g. my-blob/my-array/.zarray) and self.fs is the AzureBlobFileSystem instance.
  3. AzureBlobFileSystem.cat at some point calls AzureBlobFileSystem._expand_path, which runs this line:

https://github.com/dask/adlfs/blob/3874b3e536fe6b24c824ee096566c8620b623dfa/adlfs/spec.py#L1351

  4. That call returns my-blob/my-array/.zarray/ (note the trailing "/"), and later on the loading crashes with the following error:
ResourceNotFoundError: Operation returned an invalid status 'The specified blob does not exist.'
ErrorCode:BlobNotFound

I think the solution is just to make sure we don’t add a “/” in _expand_path
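
To illustrate the failure mode described above, here is a minimal sketch that uses a plain dict as a stand-in for the blob container, so no Azure calls are made. The helpers `expand_path_buggy`, `expand_path_fixed`, and `cat` are hypothetical and are not the adlfs implementation; they only mimic the reported behavior (a trailing "/" ends up on a file path) and the proposed fix (leave the path alone).

```python
# Stand-in for a blob container: keys are exact blob names (no real Azure calls).
BLOBS = {"my-blob/my-array/.zarray": b'{"zarr_format": 2}'}


def expand_path_buggy(path: str) -> str:
    """Hypothetical: mimics the reported behavior by appending a trailing '/'."""
    return path if path.endswith("/") else path + "/"


def expand_path_fixed(path: str) -> str:
    """Hypothetical: leaves the path as given, so file paths stay file paths."""
    return path


def cat(path: str, expand) -> bytes:
    """Look up the blob under the expanded name; raise if it does not exist."""
    key = expand(path)
    if key not in BLOBS:
        raise FileNotFoundError(f"The specified blob does not exist: {key!r}")
    return BLOBS[key]
```

With `expand_path_fixed`, `cat("my-blob/my-array/.zarray", expand_path_fixed)` finds the blob. With `expand_path_buggy`, the lookup key becomes `my-blob/my-array/.zarray/` and the call raises, which is the same shape of failure as the BlobNotFound error above.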

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

hayesgb commented, Apr 15, 2021

Fixed with #217

TomAugspurger commented, Apr 14, 2021

Thanks! I do think it’s worth having a longer discussion around the ideal behavior here, about how to consistently handle pseudo-directories in these object stores. Maybe that discussion has happened though, I haven’t followed closely (edit: that’s happening in https://github.com/intake/filesystem_spec/issues/562).

I haven’t tested it yet, but something like this might work:

diff --git a/adlfs/spec.py b/adlfs/spec.py
index 298fa7b..b13bcfd 100644
--- a/adlfs/spec.py
+++ b/adlfs/spec.py
@@ -1263,6 +1263,11 @@ class AzureBlobFileSystem(AsyncFileSystem):
 
         async with self.service_client.get_blob_client(container_name, path) as bc:
             exists = await bc.exists()
+
+        if path.endswith("/") and not exists:
+            async with self.service_client.get_blob_client(container_name, path[:-1]) as bc:
+                exists = await bc.exists()
+
         return exists
 
     async def _pipe_file(self, path, value, overwrite=True, **kwargs):

That solves the exists() side. Unfortunately things like .open() will need to be updated as well…
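
The fallback in the diff above can be tried out without Azure credentials. The sketch below is only an illustration under stated assumptions: `FakeBlobClient`, `FakeServiceClient`, and the standalone `blob_exists` function are hypothetical stand-ins, and only the `get_blob_client(container_name, path)` and `exists()` calls mirror what the diff uses.

```python
import asyncio


class FakeBlobClient:
    """Hypothetical stand-in for an async blob client (not the Azure SDK class)."""

    def __init__(self, blobs, container, blob):
        self._blobs = blobs
        self._key = (container, blob)

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        return False  # do not swallow exceptions

    async def exists(self):
        return self._key in self._blobs


class FakeServiceClient:
    """Hypothetical stand-in that hands out FakeBlobClient objects."""

    def __init__(self, blobs):
        self._blobs = set(blobs)

    def get_blob_client(self, container, blob):
        return FakeBlobClient(self._blobs, container, blob)


async def blob_exists(service_client, container_name, path):
    # First try the path exactly as given.
    async with service_client.get_blob_client(container_name, path) as bc:
        exists = await bc.exists()

    # Fallback from the diff: retry without the trailing "/".
    if path.endswith("/") and not exists:
        async with service_client.get_blob_client(container_name, path[:-1]) as bc:
            exists = await bc.exists()

    return exists


client = FakeServiceClient({("my-blob", "my-array/.zarray")})
```

Running `asyncio.run(blob_exists(client, "my-blob", "my-array/.zarray/"))` against this fake store reports the blob as present even though the path carries a trailing "/". As the comment notes, this only covers the existence check; other operations such as `.open()` would need their own handling.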
