Broken `abfs://` on version 2022.7.0


I’m running the following code to access an Azure blob storage container:

import adlfs
fs = adlfs.AzureBlobFileSystem()
fs.ls("abfs://my-container-name")

This works perfectly with fsspec==2022.5.0 and adlfs==2022.7.0. However, with fsspec==2022.7.0 and adlfs==2022.7.0 I get a FileNotFoundError arising from azure.core.exceptions.ResourceNotFoundError: The specified container does not exist. It does work, however, if I run:

fs.ls("az://my-container-name")

Expectation: abfs://... syntax should be supported on Python environments containing fsspec==2022.7.0.
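
Until the `abfs://` form works again, one possible stopgap (a sketch only, not something adlfs or fsspec provides) is to rewrite the scheme to `az://` before calling the filesystem, since the `az://` form is reported above to work. The helper name `abfs_to_az` is hypothetical:

```python
def abfs_to_az(url: str) -> str:
    """Rewrite a leading abfs:// scheme to az://; leave any other URL untouched."""
    prefix = "abfs://"
    if url.startswith(prefix):
        return "az://" + url[len(prefix):]
    return url


# Hypothetical usage with the filesystem object from the snippet above:
#   fs.ls(abfs_to_az("abfs://my-container-name"))
print(abfs_to_az("abfs://my-container-name"))
```

This only touches the string handed to `ls`; it does not address the underlying parsing behavior.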

Environment:

  • Platform: Ubuntu Linux
  • Python: 3.9
  • Credentials provided using AZURE_STORAGE_CONNECTION_STRING environment variable.

Working Python environment (from pip freeze):

adal==1.2.7
adlfs==2022.7.0
aiohttp==3.8.1
aiosignal==1.2.0
async-timeout==4.0.2
attrs==21.4.0
azure-core==1.24.2
azure-datalake-store==0.0.52
azure-identity==1.10.0
azure-storage-blob==12.13.0
certifi @ file:///opt/conda/conda-bld/certifi_1655968806487/work/certifi
cffi==1.15.1
charset-normalizer==2.1.0
cryptography==37.0.4
frozenlist==1.3.0
fsspec==2022.5.0
idna==3.3
isodate==0.6.1
msal==1.18.0
msal-extensions==1.0.0
msrest==0.7.1
multidict==6.0.2
oauthlib==3.2.0
portalocker==2.5.1
pycparser==2.21
PyJWT==2.4.0
python-dateutil==2.8.2
requests==2.28.1
requests-oauthlib==1.3.1
six==1.16.0
treelite==2.0.0
treelite-runtime==2.0.0
typing_extensions==4.3.0
urllib3==1.26.11
yarl==1.7.2

Broken Python environment (identical to the working one except for fsspec==2022.7.0):

adal==1.2.7
adlfs==2022.7.0
aiohttp==3.8.1
aiosignal==1.2.0
async-timeout==4.0.2
attrs==21.4.0
azure-core==1.24.2
azure-datalake-store==0.0.52
azure-identity==1.10.0
azure-storage-blob==12.13.0
certifi @ file:///opt/conda/conda-bld/certifi_1655968806487/work/certifi
cffi==1.15.1
charset-normalizer==2.1.0
cryptography==37.0.4
frozenlist==1.3.0
fsspec==2022.7.0
idna==3.3
isodate==0.6.1
msal==1.18.0
msal-extensions==1.0.0
msrest==0.7.1
multidict==6.0.2
oauthlib==3.2.0
portalocker==2.5.1
pycparser==2.21
PyJWT==2.4.0
python-dateutil==2.8.2
requests==2.28.1
requests-oauthlib==1.3.1
six==1.16.0
treelite==2.0.0
treelite-runtime==2.0.0
typing_extensions==4.3.0
urllib3==1.26.11
yarl==1.7.2
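
To confirm which pins differ between two `pip freeze` outputs, a small comparison along these lines can help. This is a minimal sketch: `diff_pins` is a hypothetical helper, the two lists are short excerpts of the environments above, and lines without `==` (such as the `certifi @ file://...` entry) are skipped.

```python
def diff_pins(working, broken):
    """Return {package: (working_version, broken_version)} for pins that differ."""
    def parse(lines):
        pins = {}
        for line in lines:
            if "==" in line:
                name, version = line.strip().split("==", 1)
                pins[name] = version
        return pins

    w, b = parse(working), parse(broken)
    return {
        name: (w.get(name), b.get(name))
        for name in sorted(set(w) | set(b))
        if w.get(name) != b.get(name)
    }


# Excerpts from the two environments listed above.
working = ["adlfs==2022.7.0", "azure-storage-blob==12.13.0", "fsspec==2022.5.0"]
broken = ["adlfs==2022.7.0", "azure-storage-blob==12.13.0", "fsspec==2022.7.0"]
print(diff_pins(working, broken))  # {'fsspec': ('2022.5.0', '2022.7.0')}
```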

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

pedro-ricardo commented, Sep 29, 2022 (1 reaction)

Hello @martindurant, could the cause of this weird behavior be that fsspec/adlfs implements its own _strip_protocol method in the AzureBlobFileSystem class?

I suspect that AzureBlobFileSystem counts on receiving ops = infer_storage_options(path) with the "host" not yet joined onto the "path" key, since it does that join itself some lines later: ops["path"] = ops["host"] + ops["path"].

But once "adl", "abfs", and "abfss" were added to the protocol list in infer_storage_options, that join already happens internally:

if protocol in ("s3", "s3a", "gcs", "gs", "adl", "abfs", "abfss", "gdrive"):
    options["path"] = options["host"] + options["path"]
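
The suspected double join can be modeled with the standard library alone. This is a simplified sketch, not the actual fsspec or adlfs code: `toy_infer_storage_options` and `toy_strip_protocol` are hypothetical stand-ins, and `JOIN_HOST_PROTOCOLS` simply copies the tuple quoted above.

```python
from urllib.parse import urlsplit

# Copy of the protocol tuple quoted above; the name is hypothetical.
JOIN_HOST_PROTOCOLS = ("s3", "s3a", "gcs", "gs", "adl", "abfs", "abfss", "gdrive")


def toy_infer_storage_options(url):
    """Toy parser: joins host into path for the listed protocols."""
    parts = urlsplit(url)
    options = {"protocol": parts.scheme, "host": parts.netloc, "path": parts.path}
    if options["protocol"] in JOIN_HOST_PROTOCOLS:
        options["path"] = options["host"] + options["path"]
    return options


def toy_strip_protocol(url):
    """Toy filesystem method that prepends the host itself, unaware of the join above."""
    ops = toy_infer_storage_options(url)
    return ops["host"] + ops["path"]


print(toy_strip_protocol("az://my-container-name"))    # my-container-name
print(toy_strip_protocol("abfs://my-container-name"))  # my-container-namemy-container-name
```

In this model, `az` is not in the tuple, so the host is prepended once; `abfs` is in the tuple, so the host is prepended twice and the resulting container name does not exist, which would be consistent with the ResourceNotFoundError reported in the issue.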

AntonyMilneQB commented, Aug 2, 2022 (1 reaction)

@martindurant @toby-coleman @hayesgb Please would it be possible to explain a bit more what has happened here? I don’t know much about the inner workings of fsspec or adlfs, but I think the problem here seems to be not that #988 was wrong but that it was not released in coordination with the required changes on the adlfs side.

Context: @SajidAlamQB and I work on Kedro, which relies heavily on fsspec for handling datasets. Two years ago fsspec’s handling of abfs was raised as a possible bug (https://github.com/fsspec/filesystem_spec/issues/256; https://github.com/fsspec/adlfs/issues/45). From reading those (see https://github.com/fsspec/adlfs/issues/45#issuecomment-608689378), it seems that adding abfs and adl was indeed the correct thing to do, but it needed to be done in coordination with a change to adlfs.

Instead of that change happening, it seems that we on Kedro rolled our own version of fsspec.utils.infer_storage_options, which is the same as fsspec’s version but includes abfs and adl in the list of CLOUD_PROTOCOLS. Since then we have received requests from users to extend this list further (abfss and gdrive). These changes have all worked well for our users (I’m not sure why, given that they apparently should have required a change to adlfs as well), but as per https://github.com/kedro-org/kedro/issues/1632 it is a bit annoying to maintain the list of CLOUD_PROTOCOLS on our side.

Hence, if at all possible, we’d really like to go back to using fsspec’s infer_storage_options. Is there any way of coordinating changes with the other libraries so that we can make the changes that we’d like to CLOUD_PROTOCOLS? I understand this might not be easy to achieve, but if it is possible then that would be much appreciated! 🙏 Otherwise we will need to continue maintaining our own version of infer_storage_options.
