Add a feature to override file extension defining compression


Problem description

Transparent compression/decompression is a killer feature that could be made more broadly applicable if it were decoupled from the extension of the underlying “file”. Tying application-handling behavior to a “file” extension is not a universally portable idea (for example, on classic UNIX, a “shebang” line encodes the same information), and some decoupling already exists in the current ignore_ext flag to smart_open.open. This feature would complete the decoupling by letting callers supply an override_ext string to smart_open.open, which would override the extension and force a particular compression-handling behavior. Users would then be free to name “files” without extensions and still take advantage of transparent compression.

Steps/code to reproduce the problem

N/A (not a defect), but currently any file path without an extension is not eligible for transparent compression/decompression.
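To make the request concrete, here is a minimal sketch of extension-based codec selection with an override, written against the standard library only. The helper open_maybe_compressed and the _OPENERS table are hypothetical names used for illustration; they are not smart_open's API or internals, and override_ext is the parameter proposed in this issue, not an existing one.

```python
import bz2
import gzip
import os
import tempfile

# Hypothetical registry mapping an extension to an opener (not smart_open internals).
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def open_maybe_compressed(path, mode="rb", override_ext=None):
    """Open path, choosing a codec from override_ext if given, else from the path's extension."""
    ext = override_ext if override_ext is not None else os.path.splitext(path)[1]
    # Unknown or empty extensions fall back to the builtin open (no compression).
    opener = _OPENERS.get(ext.lower(), open)
    return opener(path, mode)


def demo():
    """Write and read back a gzip stream stored under an extensionless name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "payload")  # no extension on purpose
        with open_maybe_compressed(path, "wb", override_ext=".gz") as fout:
            fout.write(b"hello")
        with open(path, "rb") as fin:
            magic = fin.read(2)  # first two raw bytes on disk
        with open_maybe_compressed(path, "rb", override_ext=".gz") as fin:
            data = fin.read()
        return magic, data
```

Without override_ext, the extensionless path falls through to the plain opener, which is the limitation described above.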

Versions

>>> import platform, sys, smart_open
>>> print(platform.platform())
Darwin-19.6.0-x86_64-i386-64bit
>>> print("Python", sys.version)
Python 3.7.8 (default, Jul 27 2020, 17:21:35) 
[Clang 11.0.3 (clang-1103.0.32.59)]
>>> print("smart_open", smart_open.__version__)
smart_open 5.0.0.dev0

Checklist

Before you create the issue, please make sure you have:

  • Described the problem clearly
  • Provided a minimal reproducible example, including any required data
  • Provided the version numbers of the relevant software

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions
aperiodic commented, Apr 20, 2021
  1. Replace ignore_ext parameter with something that’s more flexible and easier to explain to people. Of course, this would be a backwards-incompatible change, so we’d have to make a new major release for it.

I agree that this is the best option, but why does it have to be backwards-incompatible? After the compression parameter is introduced, couldn’t the implementation translate ignore_ext = False to compression = from_extension and ignore_ext = True to compression = none while emitting a deprecation warning? This would allow the new parameter to be introduced and ignore_ext to be deprecated while maintaining backwards-compatibility, and without the explainability and interaction downsides of the first approach.

(If both ignore_ext and compression are passed, then I think raising an exception is the best behavior, but this doesn’t break backwards-compatibility.)
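The translation described in this comment could look roughly like the sketch below. It is an illustration only: resolve_compression is a hypothetical helper, and the values "from_extension" and "none" are taken from the wording of this comment, so the released API may spell them differently. To detect that both arguments were passed, the sketch assumes ignore_ext defaults to None rather than False.

```python
import warnings

# Placeholder values taken from the comment above; assumptions, not a real API.
FROM_EXTENSION = "from_extension"
NO_COMPRESSION = "none"


def resolve_compression(ignore_ext=None, compression=None):
    """Translate the legacy ignore_ext flag into a compression setting."""
    if ignore_ext is not None and compression is not None:
        # Passing both is ambiguous, so refuse rather than guess.
        raise ValueError("pass either ignore_ext or compression, not both")
    if ignore_ext is not None:
        warnings.warn(
            "ignore_ext is deprecated; use compression instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return NO_COMPRESSION if ignore_ext else FROM_EXTENSION
    # Neither given: keep the existing default of inferring from the extension.
    return compression if compression is not None else FROM_EXTENSION
```

Existing callers that pass ignore_ext keep their behavior and only see a warning, while callers that pass neither argument get the same default as before.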

1 reaction
mpenkov commented, May 7, 2021

Probably within the next month or so.
