Add a feature to override file extension defining compression
Problem description
Transparent compression/decompression is a killer feature that could be made more broadly applicable if decoupled from the extension of the underlying “file”. Tying application-handling behavior to a “file” extension is not a universally portable idea (for example, on classic UNIX, a “shebang” line encodes the same information), and some decoupling already exists in the current `ignore_ext` flag to `smart_open.open`. This feature would complete the decoupling by allowing the extension to be effectively overridden to force a compression-handling behavior, by supplying an `override_ext` string to `smart_open.open`. At that point, users would be free to name “files” without extensions and still take advantage of this great feature.
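To make the request concrete, here is a minimal sketch of the behavior being asked for, written against the standard library only. The helper `open_with_override` and its `_OPENERS` table are hypothetical names for illustration; they are not part of smart_open, and the real implementation could look quite different.

```python
import bz2
import gzip
import os

# Hypothetical extension-to-opener table; this is NOT smart_open's own registry.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def open_with_override(path, mode="rb", override_ext=None):
    """Open `path`, picking the codec from `override_ext` when it is given.

    Falls back to the extension found in `path`, and to the built-in
    `open` when no known compression extension applies.
    """
    if override_ext is not None:
        ext = override_ext
    else:
        ext = os.path.splitext(path)[1]
    opener = _OPENERS.get(ext.lower(), open)
    return opener(path, mode)
```

With a helper like this, a file named `payload` (no extension) could be written and read as gzip by passing `override_ext=".gz"`, while omitting the argument would treat it as a plain file.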
Steps/code to reproduce the problem
N/A (not a defect), but currently any file path without an extension is ineligible for transparent compression/decompression.
Versions
>>> import platform, sys, smart_open
>>> print(platform.platform())
Darwin-19.6.0-x86_64-i386-64bit
>>> print("Python", sys.version)
Python 3.7.8 (default, Jul 27 2020, 17:21:35)
[Clang 11.0.3 (clang-1103.0.32.59)]
>>> print("smart_open", smart_open.__version__)
smart_open 5.0.0.dev0
Checklist
Before you create the issue, please make sure you have:
- Described the problem clearly
- Provided a minimal reproducible example, including any required data
- Provided the version numbers of the relevant software
I agree that this is the best option, but why does it have to be backwards-incompatible? After the `compression` parameter is introduced, couldn’t the implementation translate `ignore_ext = False` to `compression = from_extension` and `ignore_ext = True` to `compression = none` while emitting a deprecation warning? This would allow the new parameter to be introduced and `ignore_ext` to be deprecated while maintaining backwards compatibility, and without the explainability and interaction downsides of the first approach.

(If both `ignore_ext` and `compression` are passed, then I think raising an exception is the best behavior, but this doesn’t break backwards compatibility.)

Probably within the next month or so.
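
The backwards-compatible translation proposed in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: `resolve_compression` is a hypothetical helper, the string values `"from_extension"` and `"none"` simply mirror the names used in the comment, and `ignore_ext` defaults to `None` here (rather than `False`) so the sketch can tell whether the caller passed it at all.

```python
import warnings

# Assumed sentinel values, named after the ones mentioned in the comment.
FROM_EXTENSION = "from_extension"
NO_COMPRESSION = "none"


def resolve_compression(ignore_ext=None, compression=None):
    """Map the legacy `ignore_ext` flag onto a `compression` value."""
    if ignore_ext is not None and compression is not None:
        # Passing both is ambiguous, so refuse rather than guess.
        raise ValueError("pass either ignore_ext or compression, not both")
    if ignore_ext is not None:
        warnings.warn(
            "ignore_ext is deprecated; use compression instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return NO_COMPRESSION if ignore_ext else FROM_EXTENSION
    # Neither legacy flag nor explicit value: keep extension-based detection.
    return compression if compression is not None else FROM_EXTENSION
```

Existing callers that pass `ignore_ext` would keep working and see a `DeprecationWarning`; only the new, previously impossible combination of both arguments raises.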