Invalid gzip data (fflate gzipped the file) and corruption on decompression (Deno)

I am seeing a few issues with Gzip/Gunzip on Deno.

The first is that compressing and then decompressing a modest-sized text file gives a corrupt, much smaller decompressed file. The second is that compressing a modest-sized binary file produces an invalid gzip file, and attempting to decompress it throws an "invalid gzip data" exception.

I wrote the following test to demonstrate the problem:

import * as fflate from 'https://cdn.skypack.dev/fflate';

async function zpipe(reader: Deno.Reader, stream: any) {
  let total = 0;
  async function push(p: Uint8Array, isLast?: boolean) {
    console.log('push', p.byteLength);
    debugger;
    await stream.push(p, isLast);
    total += p.byteLength;
  }
  let prevBlock;
  for await (const block of Deno.iter(reader)) {
    if (prevBlock) await push(prevBlock);
    prevBlock = block;
  }
  if (prevBlock) await push(prevBlock, true);
  console.log(`pushed ${total} bytes`);
}

function zip(from: string, to: string, options = {}) {
  let total = 0;
  return new Promise<void>(async (resolve, reject) => {
    const hFrom = await Deno.open(from, { read: true });
    const hTo = await Deno.open(to, { write: true, create: true, truncate: true });
    const zipper: any = new fflate.Gzip({ level: 9 }, async (chunk: Uint8Array, isLast: boolean) => {
      console.log('zip write chunk', chunk.byteLength);
      await hTo.write(chunk);
      total += chunk.byteLength;
      if (isLast) {
        console.log(`zip close dest file, ${total} bytes`);
        hTo.close();
        resolve();
      }
    });
    await zpipe(hFrom, zipper);
    console.log('zip close source file');
    hFrom.close();
  });
}

function unzip(from: string, to: string) {
  let total = 0;
  return new Promise<void>(async (resolve, reject) => {
    const hFrom = await Deno.open(from, { read: true });
    const hTo = await Deno.open(to, { write: true, create: true, truncate: true });
    const unzipper: any = new fflate.Gunzip();
    unzipper.ondata = async (chunk: Uint8Array, isLast: boolean) => {
      console.log('unzip write chunk', chunk.byteLength);
      await hTo.write(chunk);
      total += chunk.length;
      if (isLast) {
        console.log(`unzip close dest file, ${total} bytes`);
        hTo.close();
        resolve();
      }
    };
    await zpipe(hFrom, unzipper);
    console.log('unzip close source file');
    hFrom.close();
  });
}

const fn = Deno.args[0];
await zip(fn, `${fn}.gz`);
await unzip(`${fn}.gz`, `${fn}.unzipped`);

As a test, I downloaded fflate.js and compressed and decompressed that using the code above:

deno run --allow-all gzip.ts fflate.js

The resulting file sizes are:

-a----       20/03/2021     15:43          54748 fflate.js
-a----       21/03/2021     00:57          14322 fflate.js.gz
-a----       21/03/2021     00:57          16384 fflate.js.unzipped

For the binary file test, I generated a 32 KB file of random bytes using dd:

dd if=/dev/random of=LARGE_FILE ibs=1k count=32

Then compress it with the above code:

deno run --allow-all gzip.ts LARGE_FILE

This throws an error on the unzip:

error: Uncaught (in promise) invalid gzip data

The file utility also reports a strange original size for the compressed file:

LARGE_DATA.gz:        gzip compressed data, last modified: Sun Mar 21 01:01:11 2021, max compression, from Unix, original size modulo 2^32 100822718
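
For context, the "original size modulo 2^32" value printed above is read from the gzip trailer: per RFC 1952, the last four bytes of a gzip member hold the uncompressed length as a little-endian 32-bit integer (ISIZE). A minimal sketch of reading it follows; gzipIsize is a hypothetical helper name, not part of fflate or Deno, and it assumes the whole file is already in memory as a single gzip member.

```typescript
// Hypothetical helper: returns the ISIZE trailer field of a gzip member,
// i.e. the uncompressed length modulo 2^32 (RFC 1952).
function gzipIsize(gz: Uint8Array): number {
  // A gzip member has at least a 10-byte header and an 8-byte trailer.
  if (gz.length < 18) throw new Error("too short to be a gzip member");
  // ISIZE is the final four bytes, stored little-endian.
  const view = new DataView(gz.buffer, gz.byteOffset + gz.length - 4, 4);
  return view.getUint32(0, true);
}
```

For a correctly compressed 32 KB input this value would be expected to be 32768, so a much larger number suggests the compressor was fed different data than the file contained.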

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

101arrowz commented, Mar 21, 2021 (1 reaction)

I tried this and you are right: it fails to work properly on Deno. However, running a similar script on Node works just fine:

// cmp.js
const { Gzip } = require('fflate')
const { createWriteStream, createReadStream } = require('fs');

const gz = new Gzip((dat, final) => {outs.write(dat), final && outs.end()});

const ins = createReadStream(process.argv[2]);
const outs = createWriteStream(process.argv[2] + '.gz');

let lastChk;
ins.on('data', dat => {
  if (lastChk) gz.push(lastChk)
  lastChk = dat;
})
ins.on('end', () => gz.push(lastChk, true));

$ node cmp.js buf
$ cat buf.gz | gunzip > out
$ diff buf out

$

So I did a bit of digging, and it turns out that Deno reuses the same buffer each time you read a chunk! That means prevBlock refers to the same memory as block, so by the time prevBlock is pushed it already holds the next chunk's data: you skip the first chunk entirely and duplicate the last chunk. To fix it, you need to either copy the buffer after each iteration (yuck!) or use the new Uint8Array(0) trick (much faster), that is, push each block as soon as it is read and end the stream by pushing an empty new Uint8Array(0) as the final chunk. Hope that helps.
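
The two fixes mentioned above can be sketched without Deno or fflate. Everything named here (ReusingReader, Push, pipeWithCopy, pipeWithEmptyFinal, collect) is hypothetical and only imitates a reader that hands out views over one shared buffer; it is not the actual Deno.iter or fflate code. The second variant also assumes the sink consumes each chunk before the next read happens.

```typescript
// Hypothetical stand-in for a reader that reuses one internal buffer
// between reads, the behaviour the comment above attributes to Deno.
class ReusingReader {
  private data: Uint8Array;
  private buf: Uint8Array;
  private off = 0;
  constructor(data: Uint8Array, size: number) {
    this.data = data;
    this.buf = new Uint8Array(size);
  }
  // Returns a view over the shared buffer, or null at end of input.
  read(): Uint8Array | null {
    if (this.off >= this.data.length) return null;
    const n = Math.min(this.buf.length, this.data.length - this.off);
    this.buf.set(this.data.subarray(this.off, this.off + n));
    this.off += n;
    return this.buf.subarray(0, n); // a view, not a copy
  }
}

// Same push(chunk, isLast) shape as the streams used in the issue.
type Push = (chunk: Uint8Array, isLast: boolean) => void;

// Fix 1: keep the delayed push, but copy each block so that the
// previous block survives the next read.
function pipeWithCopy(reader: ReusingReader, push: Push): void {
  let prev: Uint8Array | null = null;
  let block: Uint8Array | null;
  while ((block = reader.read()) !== null) {
    if (prev !== null) push(prev, false);
    prev = block.slice(); // slice() copies; subarray() would not
  }
  push(prev !== null ? prev : new Uint8Array(0), true);
}

// Fix 2: push every block immediately, then finish the stream with an
// empty chunk flagged as last. No copy is needed as long as the sink
// consumes each chunk before the next read overwrites the buffer.
function pipeWithEmptyFinal(reader: ReusingReader, push: Push): void {
  let block: Uint8Array | null;
  while ((block = reader.read()) !== null) push(block, false);
  push(new Uint8Array(0), true);
}

// Runs a pipe over data and gathers every pushed byte, copying on arrival.
function collect(
  pipe: (reader: ReusingReader, push: Push) => void,
  data: Uint8Array,
  size: number,
): number[] {
  const out: number[] = [];
  pipe(new ReusingReader(data, size), (chunk) => {
    for (let i = 0; i < chunk.length; i++) out.push(chunk[i]);
  });
  return out;
}
```

In the reproduction script, the equivalent change would be either prevBlock = block.slice() inside the loop, or dropping prevBlock and pushing an empty final chunk after the loop.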

101arrowz commented, Mar 21, 2021 (1 reaction)

By the way, I noticed you were using fflate untyped, but you can easily add TypeScript support like this:

// @deno-types="https://cdn.skypack.dev/fflate@0.6.8/lib/index.d.ts"
import * as fflate from 'https://cdn.skypack.dev/fflate@0.6.8?min';

// Types are now supported, no need to cast or use `any`