Add stdlib-compatible shim layer.
@martindurant This is a trial-balloon issue for this idea, mostly to see whether fsspec is the “right” place for this feature and whether I’ve missed an existing implementation. If it is the right place, I have a working proof-of-concept, built on an older version of dask, that could serve as a starting point for a PR.
`fsspec` is a great module for new development or for projects built on the existing `dask` ecosystem, and it enables an amazing S3-is-my-filesystem paradigm. However, the vast majority of projects use Python’s built-in file operations and are tightly coupled to a local filesystem. This causes major development friction when integrating existing third-party libraries into a project, as one almost invariably needs to work out an integration-specific flow between local files and remote storage.
One can work around this problem with FUSE-based mounts, but this complicates deployment and containerization. Alternatively, any of several filesystem abstractions (fsspec, smart_open, pyfilesystem2, et al.) could be used, but updating a third-party component to an alternative filesystem interface is a painful and risky development lift. One either needs to maintain a private fork 😳 or open a massive and risky PR 😬.
I’ve found that a solid majority of filesystem use cases are covered by a relatively small set of operations, all of which are already covered by `fsspec`. By providing strictly compatible shims for a small set of stdlib functions (e.g. `open`, `os.path.exists`, `os.remove`, `glob.glob`, `shutil.rmtree`, etc.) and then swapping them in via import-level changes, one can quickly teach most libraries to interact seamlessly with all the filesystems supported by `fsspec`.
This shim layer would mandate strict adherence to standard library semantics for local file operations, likely by forwarding all local paths directly into the standard library and forwarding non-local paths through `fsspec`-based implementations. The explicit goal would be to enable a majority of basic use cases, deferring to the `fsspec` interfaces for more robust integration and/or specialized use cases. This would turn `fsspec` into a massively useful layer for updating existing systems to cloud-compatible storage, as updating a library to support S3 and GCS would be as simple as:
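try:
    # Proposed shim modules (fsspec.stdlib does not exist yet)
    from fsspec.stdlib import open
    from fsspec.stdlib import os  # exposes os.path, os.remove, ...
    import fsspec.stdlib.shutil as shutil
except ImportError:
    # Fall back to the standard library; the built-in open is used as-is
    import os.path
    import shutil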
Here’s a (mostly) pathlib-compatible wrapper that I’m using. I take it there’s not a ton of interest, so I’ll just throw it out there as a gist in case it’s of use to anyone.
Yes, please, someone wrap this as a PR.