Entry point support for abfss prevents this protocol from being overridden by another package

Hi!

First of all, I’m not sure if this issue really fits here or if it should be opened in the fsspec main repository as I’m not sure of the best way to solve this (if it should actually be solved).

Context

I’m building a Python package that defines two protocols via the fsspec.specs entry point group: adl and abfss. The package depends on adlfs and simply extends AzureDatalakeFileSystem and AzureBlobFileSystem with default values for tenant_id, client_id, client_secret, account_name and store_name that fit our organization.

Problem

When I install my package in a new Python environment, the adl protocol defined by default in fsspec is correctly overridden. However, the abfss protocol (defined both in my package’s entry points and in adlfs’ entry points) points to adlfs’ AzureBlobFileSystem instead.

Expected behaviour

I would like my definition of the abfss protocol to override the one defined in adlfs.

Analyzing the issue

In the fsspec.specs entry points, both implementations are defined, with the adlfs implementation coming last.

from importlib.metadata import entry_points
from pprint import pprint

# entry_points() returns a dict on Python 3.8/3.9;
# on Python 3.10+ use entry_points(group="fsspec.specs") instead.
specs = entry_points().get("fsspec.specs", {})
pprint(specs)

Output:

(EntryPoint(name='abfss', value='mypackage:MyAzureBlobFileSystem', group='fsspec.specs'),
 EntryPoint(name='adl', value='mypackage:MyAzureDatalakeFileSystem', group='fsspec.specs'),
 EntryPoint(name='abfss', value='adlfs.AzureBlobFileSystem', group='fsspec.specs'))

A look at fsspec’s process_entries function shows that these specs are handled in order.

The mypackage implementation is registered first and is then overridden by the adlfs implementation.
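
To make the ordering effect concrete, here is a minimal sketch of "register in order, last one wins". It is a simplified model, not fsspec's actual code: the Spec tuple and the register_in_order helper are hypothetical stand-ins, and the registry is a plain dict.

```python
from typing import NamedTuple


class Spec(NamedTuple):
    """Hypothetical stand-in for importlib.metadata.EntryPoint."""
    name: str
    value: str


# Same order as the entry points printed above.
specs = [
    Spec("abfss", "mypackage:MyAzureBlobFileSystem"),
    Spec("adl", "mypackage:MyAzureDatalakeFileSystem"),
    Spec("abfss", "adlfs.AzureBlobFileSystem"),
]


def register_in_order(specs):
    """Register each spec in turn; a later duplicate replaces an earlier one."""
    registry = {}
    for spec in specs:
        registry[spec.name] = spec.value
    return registry


registry = register_in_order(specs)
print(registry["abfss"])  # adlfs.AzureBlobFileSystem
print(registry["adl"])    # mypackage:MyAzureDatalakeFileSystem
```

Because abfss appears twice and adlfs comes last, the adlfs entry is the one left in the registry, while adl has only one entry and keeps the mypackage value.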

Possible solutions

Personally, I would prefer my implementation to be prioritized. Maybe there is already a way to do this, but I don’t know enough about how entry points work. If you agree and think this is the way to go, here are the possible solutions I can think of:

  • Add abfss to fsspec’s known implementations
  • Change the process_entries function so that it prioritizes the implementation that is highest in the list

Both solutions would have to be handled on fsspec’s side, which is why I’m unsure where to report this issue.
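
The second option can be sketched as a "first one wins" registration pass. This is only an illustration of the idea, not a patch for fsspec: register_first_wins is a hypothetical helper, the specs are plain (name, value) tuples, and the sketch assumes the order of entry points is meaningful, which the issue does not establish.

```python
def register_first_wins(specs):
    """Keep the first value seen for each protocol name; ignore later duplicates."""
    registry = {}
    for name, value in specs:
        # setdefault only stores the value if the name is not registered yet.
        registry.setdefault(name, value)
    return registry


# Same order as the entry points listed in the analysis.
specs = [
    ("abfss", "mypackage:MyAzureBlobFileSystem"),
    ("adl", "mypackage:MyAzureDatalakeFileSystem"),
    ("abfss", "adlfs.AzureBlobFileSystem"),
]

first_wins = register_first_wins(specs)
print(first_wins["abfss"])  # mypackage:MyAzureBlobFileSystem
```

With this policy the mypackage entry for abfss survives because it is listed before the adlfs one.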

Thank you!

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 12 (11 by maintainers)

Top GitHub Comments

1 reaction
martindurant commented, Aug 25, 2022

Lines like the following could appear in your package’s init module:

import fsspec
fsspec.register_implementation("protocol", MyClass)
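
As a slightly fuller sketch of that suggestion, the example below registers a subclass under a protocol name and checks that fsspec resolves the name to it. The protocol name "myproto" and the class MyFileSystem are hypothetical, and MemoryFileSystem is used only as a stand-in so the example runs without adlfs; in the issue's setting the base class would be AzureBlobFileSystem and the name abfss. Passing clobber=True asks fsspec to overwrite any implementation already registered under that name.

```python
import fsspec
from fsspec.implementations.memory import MemoryFileSystem


class MyFileSystem(MemoryFileSystem):
    """Hypothetical subclass; a real one would add organization defaults."""


# clobber=True overwrites an implementation already registered under this name.
fsspec.register_implementation("myproto", MyFileSystem, clobber=True)

# The protocol name now resolves to the subclass.
resolved = fsspec.get_filesystem_class("myproto")
print(resolved is MyFileSystem)  # True
```

Note that a registration like this only takes effect once the module containing the call has been imported.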

0 reactions
martindurant commented, Aug 26, 2022

Unfortunately, in the absence of a config option in fsspec to choose a preferred implementation per protocol, fsspec must pick whichever implementation best meets users’ expectations for a given protocol. At the moment, for “abfss”, this is provided by adlfs.

So, by all means change whatever you need on your system to force fsspec to pick the right implementation for you. In the longer term, a config option is probably the right answer; such config values can be set in a file or in environment variables.
