Support reading multi-member gzip files or providing access to remaining data

What can’t you do right now?

Gzip supports ‘multi-member’ files, in which complete gzip streams are concatenated one after another. Certain formats, such as WARC, rely on this.
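To make the member structure concrete, here is a minimal sketch. It assumes Node's built-in zlib module purely for illustration (it is not fflate or pako), and the payload strings are made up. It compresses two payloads separately, places the results back to back, and decodes the combined buffer as one gzip stream.

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Two independent gzip members, each with its own header and trailer.
const memberA = gzipSync(Buffer.from("first record\n"));
const memberB = gzipSync(Buffer.from("second record\n"));

// A multi-member file is simply the members placed back to back.
const multiMember = Buffer.concat([memberA, memberB]);

// Node's gunzip keeps decoding across member boundaries, so the output is
// the concatenation of both payloads, with no hint of where each one began.
const decoded = gunzipSync(multiMember).toString("utf8");

// The second member starts right after the first, with the gzip magic bytes.
const secondStartsWithMagic =
  multiMember[memberA.length] === 0x1f &&
  multiMember[memberA.length + 1] === 0x8b;

console.log(JSON.stringify(decoded), secondStartsWithMagic);
```

Decoding everything as one stream loses the member boundaries, which is exactly the information a WARC reader needs.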

An optimal solution

An optimal solution would be for fflate’s Inflate to provide a way to parse multi-member gzip files via an option, plus an additional callback that fires when a new member starts and reports the offset of that member in the stream.

Another option is to expose how much of the input buffer was consumed while reading a gzip member, allowing the developer to manually create a new Inflate object for the remaining data.

(How) is this done by other libraries?

pako provides an avail_in counter that tracks how many bytes have not yet been consumed. One approach I’ve used is something like this: https://github.com/webrecorder/warcio.js/blob/main/src/readers.js#L282 (though this is with an earlier version of pako). The latest version of pako may try to read multi-member gzips as one buffer, though in my tests that doesn’t always work.

A key requirement of my use case is to be able to get the offset of the beginning of each member, and to flush the data buffer at the end of each member.

Ideally, there could be a callback that indicates when a new member has been started and the offset at that new member:

onnewgzipmember: OnNewGzipMemberCallback
OnNewGzipMemberCallback = (offset: number) => void

The ondata callbacks that follow an onnewgzipmember call are assumed to carry data from that gzip member, and ondata always flushes when the member boundary is reached.
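The callback contract described above can be sketched as follows. This is a hedged illustration under stated assumptions, not fflate's API: readGzipMembers, gzipHeaderLength, and the callback parameter names are hypothetical, and the decoding relies on Node's built-in zlib (inflateRawSync with the info option, whose engine.bytesWritten is taken here as the number of input bytes consumed). It works on a whole buffer rather than a stream, and it does not verify the CRC32 in each trailer.

```typescript
import { gzipSync, inflateRawSync } from "node:zlib";

type OnNewGzipMemberCallback = (offset: number) => void;
type OnDataCallback = (data: Buffer, final: boolean) => void;
// Shape returned by the sync zlib calls when `info: true` is passed.
type InflateInfo = { buffer: Buffer; engine: { bytesWritten: number } };

// Length of one member's header, following the field layout in RFC 1952.
function gzipHeaderLength(buf: Buffer, start: number): number {
  if (buf[start] !== 0x1f || buf[start + 1] !== 0x8b) {
    throw new Error(`no gzip magic bytes at offset ${start}`);
  }
  const flags = buf[start + 3];
  let pos = start + 10; // fixed part: magic, method, flags, mtime, xfl, os
  if (flags & 0x04) pos += 2 + buf.readUInt16LE(pos); // FEXTRA
  if (flags & 0x08) pos = buf.indexOf(0, pos) + 1; // FNAME, NUL-terminated
  if (flags & 0x10) pos = buf.indexOf(0, pos) + 1; // FCOMMENT, NUL-terminated
  if (flags & 0x02) pos += 2; // FHCRC
  return pos - start;
}

// Hypothetical reader: reports each member's offset, then its data.
function readGzipMembers(
  buf: Buffer,
  onNewGzipMember: OnNewGzipMemberCallback,
  onData: OnDataCallback
): void {
  let offset = 0;
  while (offset < buf.length) {
    onNewGzipMember(offset);
    const bodyStart = offset + gzipHeaderLength(buf, offset);
    // Raw inflate stops at the end of this member's deflate stream.
    const result = inflateRawSync(buf.subarray(bodyStart), {
      info: true,
    }) as unknown as InflateInfo;
    onData(result.buffer, true); // one flush per member boundary
    // Skip the 8-byte trailer (CRC32 + ISIZE) to reach the next member.
    offset = bodyStart + result.engine.bytesWritten + 8;
  }
}

// Usage with two made-up records.
const first = gzipSync(Buffer.from("first record\n"));
const second = gzipSync(Buffer.from("second record\n"));
const offsets: number[] = [];
const records: string[] = [];
readGzipMembers(
  Buffer.concat([first, second]),
  (offset) => offsets.push(offset),
  (data) => records.push(data.toString("utf8"))
);
console.log(offsets, records);
```

A streaming implementation would also need to buffer input when a header or trailer straddles two chunks, which this sketch sidesteps by requiring the full buffer up front.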

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 2
  • Comments:7 (3 by maintainers)

Top GitHub Comments

3 reactions
101arrowz commented, May 13, 2022

I had forgotten about this issue but what you’ve proposed does seem compelling. I’ll look into the bgzip spec and try to implement it for the next release.

1 reaction
101arrowz commented, Nov 16, 2021

I’ll preface by saying that this is a very niche use case, so even if it is easy to implement I’ll have to weigh the bundle size costs to see if it’s worth adding. That being said, this might be possible to add to the streaming API, i.e. fflate.Gunzip. I’ll let you know if it seems feasible when I can.
