Entrypoint support for abfss prevents this protocol from being overridden by another package
Hi!
First of all, I’m not sure whether this issue belongs here or in the main fsspec repository, since I don’t know the best way to solve it (or whether it should be solved at all).
Context
I’m building a Python package that defines two protocols via the fsspec.specs entry point: adl and abfss.
The package depends on adlfs and simply extends AzureDatalakeFileSystem and AzureBlobFileSystem with default values for tenant_id, client_id, client_secret, account_name and store_name that fit our organization.
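Extending a filesystem with default values like this usually amounts to a subclass that injects the defaults before delegating to the parent class. The following is a minimal sketch of that pattern, not the package’s actual code. `StandInBlobFileSystem` and `ORG_DEFAULTS` are hypothetical; in the real package the base class would be adlfs’ AzureBlobFileSystem, and the placeholder strings would be real settings.

```python
from typing import Any, Dict


class StandInBlobFileSystem:
    """Hypothetical stand-in for adlfs.AzureBlobFileSystem.

    It only records the arguments it receives, so the sketch runs without adlfs.
    """

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        self.args = args
        self.kwargs = kwargs


# Hypothetical organization-wide defaults (placeholder values, not real credentials).
ORG_DEFAULTS: Dict[str, str] = {
    "tenant_id": "<tenant-id>",
    "client_id": "<client-id>",
    "client_secret": "<client-secret>",
    "account_name": "<account-name>",
}


class MyAzureBlobFileSystem(StandInBlobFileSystem):
    """Inject organization defaults; explicitly passed arguments still win."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        merged = {**ORG_DEFAULTS, **kwargs}  # caller-supplied keys override defaults
        super().__init__(*args, **merged)


fs = MyAzureBlobFileSystem(account_name="otheraccount")
print(fs.kwargs["account_name"])  # otheraccount
print(fs.kwargs["tenant_id"])     # <tenant-id>
```

The same pattern would apply to MyAzureDatalakeFileSystem, with store_name among its defaults. Merging with `{**ORG_DEFAULTS, **kwargs}` keeps the defaults overridable per call.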
Problem
When I install my package in a new Python environment, the adl protocol defined by default in fsspec is correctly overridden. However, the abfss protocol (defined both in my package’s entry points and in adlfs’ entry points) points to adlfs’ AzureBlobFileSystem instead.
Expected behaviour
I would like my definition of the abfss protocol to override the one defined in adlfs.
Analyzing the issue
In the fsspec.specs entry points, both implementations are listed, with the adlfs implementation coming last:
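from importlib.metadata import entry_points
from pprint import pprint
# Python 3.10+: select the group directly (the dict-style .get() was deprecated in 3.10 and removed in 3.12).
# On Python 3.8/3.9, use entry_points().get("fsspec.specs", ()) instead.
specs = tuple(entry_points(group="fsspec.specs"))
pprint(specs)
# Output in the reporter's environment (the adlfs entry comes last):
# (EntryPoint(name='abfss', value='mypackage:MyAzureBlobFileSystem', group='fsspec.specs'),
#  EntryPoint(name='adl', value='mypackage:MyAzureDatalakeFileSystem', group='fsspec.specs'),
#  EntryPoint(name='abfss', value='adlfs.AzureBlobFileSystem', group='fsspec.specs'))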
Looking at fsspec’s process_entries function shows that these specs are handled in order: the mypackage implementation is registered first and is then overridden by the adlfs implementation.
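The ordering effect can be reproduced with a simplified model of that loop. This is a sketch, not fsspec’s actual code: it assumes entries are registered in iteration order and that a later registration under the same name overwrites an earlier one, which is the behavior described above. `Entry` and `process_entries_sketch` are hypothetical names.

```python
from typing import Dict, Iterable, NamedTuple


class Entry(NamedTuple):
    """Hypothetical stand-in for importlib.metadata.EntryPoint (name and value only)."""

    name: str
    value: str


def process_entries_sketch(entries: Iterable[Entry]) -> Dict[str, str]:
    """Register entries in order; a later entry with the same name clobbers
    an earlier one, so the last entry for each protocol wins."""
    registry: Dict[str, str] = {}
    for entry in entries:
        registry[entry.name] = entry.value  # overwrite any earlier registration
    return registry


entries = [
    Entry("abfss", "mypackage:MyAzureBlobFileSystem"),
    Entry("adl", "mypackage:MyAzureDatalakeFileSystem"),
    Entry("abfss", "adlfs.AzureBlobFileSystem"),
]
registry = process_entries_sketch(entries)
print(registry["abfss"])  # adlfs.AzureBlobFileSystem
print(registry["adl"])    # mypackage:MyAzureDatalakeFileSystem
```

With the entry order shown in the listing above, adl resolves to mypackage because nothing later overrides it, while abfss ends up pointing at adlfs.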
Possible solutions
Personally, I would prefer my implementation to take priority. There may already be a way to do this, but I don’t know enough about how entry points work. If you agree that this is the way to go, here are the possible solutions I can think of:
- Add abfss to fsspec’s known implementations.
- Change the process_entries function so that it prioritizes the implementation that appears highest (first) in the list.
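The second option amounts to a first-wins rule instead of a last-wins rule. A minimal sketch, again using a simplified model rather than fsspec’s real loop (`Entry` and `process_entries_first_wins` are hypothetical):

```python
from typing import Dict, Iterable, NamedTuple


class Entry(NamedTuple):
    """Hypothetical stand-in for importlib.metadata.EntryPoint (name and value only)."""

    name: str
    value: str


def process_entries_first_wins(entries: Iterable[Entry]) -> Dict[str, str]:
    """Register entries in order, but keep the first entry seen for each name."""
    registry: Dict[str, str] = {}
    for entry in entries:
        registry.setdefault(entry.name, entry.value)  # ignore later duplicates
    return registry


entries = [
    Entry("abfss", "mypackage:MyAzureBlobFileSystem"),
    Entry("adl", "mypackage:MyAzureDatalakeFileSystem"),
    Entry("abfss", "adlfs.AzureBlobFileSystem"),
]
registry = process_entries_first_wins(entries)
print(registry["abfss"])  # mypackage:MyAzureBlobFileSystem
```

A first-wins rule only helps if the package’s own entry point reliably comes first. As far as I know, importlib.metadata does not guarantee any particular order across installed distributions (it depends on how they are discovered on sys.path), so entry order alone is a fragile way to express priority.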
Both solutions would have to be handled on fsspec’s side, which is why I’m unsure where to report this issue.
Thank you!
Issue Analytics
- Created 2 years ago
- Reactions: 1
- Comments: 12 (11 by maintainers)
Top GitHub Comments
Lines like the following could appear in your package’s init module:
Unfortunately, in the absence of a config option in fsspec to choose a preferred implementation per protocol, fsspec must pick whichever implementation best meets users’ expectations for a given protocol. At the moment, for “abfss”, that is the one provided by adlfs.
So by all means, change whatever you need on your system to force fsspec to pick the right implementation for you. In the longer term, the config approach is probably the right answer, and these config values can be set in a file or through environment variables.
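A per-protocol preference setting of the kind mentioned here is not something this thread shows fsspec providing. The sketch below only illustrates how such a setting could resolve the conflict. `resolve_protocols` and the `preferred` mapping are hypothetical. In practice the mapping might come from a config file or an environment variable.

```python
from typing import Dict, Iterable, List, Mapping, Tuple


def resolve_protocols(
    entries: Iterable[Tuple[str, str]],
    preferred: Mapping[str, str],
) -> Dict[str, str]:
    """Pick one implementation per protocol.

    If `preferred` names an implementation that is actually among the
    candidates for a protocol, use it; otherwise fall back to the last
    candidate seen (the last-wins behavior described in the issue).
    """
    candidates: Dict[str, List[str]] = {}
    for name, value in entries:
        candidates.setdefault(name, []).append(value)

    resolved: Dict[str, str] = {}
    for name, values in candidates.items():
        choice = preferred.get(name)
        resolved[name] = choice if choice in values else values[-1]
    return resolved


entries = [
    ("abfss", "mypackage:MyAzureBlobFileSystem"),
    ("adl", "mypackage:MyAzureDatalakeFileSystem"),
    ("abfss", "adlfs.AzureBlobFileSystem"),
]
preferred = {"abfss": "mypackage:MyAzureBlobFileSystem"}
print(resolve_protocols(entries, preferred)["abfss"])  # mypackage:MyAzureBlobFileSystem
print(resolve_protocols(entries, {})["abfss"])         # adlfs.AzureBlobFileSystem
```

Checking that the preferred value is among the installed candidates keeps a stale setting from pointing at an implementation that is not installed. Without a preference, the behavior is unchanged.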