.gz doesn't return correct file size for content above 2^32 B = 4 GB

  • 1.0.0-beta13
  • Linux 64bit

Split out from #629: apparently the gzip file format cannot accurately report the uncompressed size of files above 4GB (2^32 bytes); it returns the size modulo 2^32 instead.
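To illustrate where that value comes from: the gzip trailer ends with a 4-byte little-endian field (ISIZE in RFC 1952) holding the uncompressed size modulo 2^32. The following is a minimal sketch, not Etcher's code; the helper name `gzip_reported_size` and the small demo payload are assumptions made for illustration, and the sketch assumes a single-member archive.

```python
import gzip
import os
import struct
import tempfile


def gzip_reported_size(path):
    """Return the ISIZE trailer field: uncompressed size modulo 2**32.

    Hypothetical helper; assumes a single-member gzip file.
    """
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)                    # ISIZE is the last 4 bytes
        (isize,) = struct.unpack("<I", f.read(4))  # little-endian unsigned 32-bit
    return isize


# Demonstration on a small temporary archive (far below the 4 GiB limit).
payload = b"x" * 100_000
with tempfile.NamedTemporaryFile(suffix=".gz", delete=False) as tmp:
    demo_path = tmp.name
with gzip.open(demo_path, "wb") as gz:
    gz.write(payload)

reported = gzip_reported_size(demo_path)
os.remove(demo_path)
```

For a payload this small the field matches the real size; once the content exceeds 2^32 bytes, the same read returns only the remainder.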

On the command line, people recommend something like zcat file.gz | wc -c or gzip -dc file | wc -c, which give the correct value, though the file then gets decompressed twice (once to measure it, once to write it). We might have to do that for gzip in the end, though, since files over 4GB are likely common for Etcher’s use case.
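The same idea as `gzip -dc file | wc -c` can be sketched in Python by streaming the archive and counting bytes instead of trusting the trailer. This is only an illustration under stated assumptions: the function name `gzip_actual_size` and the in-memory demo archive are hypothetical, and Etcher itself is not written in Python.

```python
import gzip
import io


def gzip_actual_size(fileobj, chunk_size=1024 * 1024):
    """Count uncompressed bytes by streaming, like `gzip -dc file | wc -c`."""
    total = 0
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        while True:
            chunk = gz.read(chunk_size)  # decompress at most chunk_size bytes
            if not chunk:
                break
            total += len(chunk)
    return total


# Demonstration on an in-memory archive with a small, made-up payload.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(b"etcher" * 50_000)
buf.seek(0)
counted = gzip_actual_size(buf)
```

Memory use stays bounded by `chunk_size`, but the cost is a full decompression pass before any writing starts.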

In the worst case, this might let images start being burned onto cards that are too small; it may also affect the progress bar.

From testing with a 4100MiB image (just over 4096MiB), the .gz version indeed lets you select a 512MB SD card, while the same file’s .xz archive does not. For the progress bar, the MB/s reading seems to be affected (it shows a very low speed, e.g. 0.01MB/s), but the progress percentage is not (it shows correctly during the burning process), so it’s not too bad.
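The arithmetic behind that test result can be checked directly. This is a small sketch that assumes the size check relied on the wrapped 32-bit value and that "512MB" means decimal megabytes; the variable names are illustrative only.

```python
MIB = 2**20

image_size = 4100 * MIB              # actual uncompressed size of the test image
reported_size = image_size % 2**32   # what a 32-bit size field can hold
card_size = 512 * 10**6              # assumption: "512MB" read as decimal megabytes

# The wrapped value is tiny, so a naive "does it fit?" check passes.
fits_by_mistake = reported_size <= card_size
really_fits = image_size <= card_size
```

Here the wrapped size is 4 MiB, which explains why the small card was offered even though the real image is far larger.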

Issue Analytics

  • State: open
  • Created 7 years ago
  • Comments: 42 (38 by maintainers)

Top GitHub Comments

1 reaction
tipuraneo commented, Aug 5, 2019

I know. So you could say gzip is not the best choice for files > 4 GB? I prefer xz over gzip.

1 reaction
alexandrosm commented, Mar 4, 2017

Beyond all this though, we should have a heuristic that basically says this:

  • gzip compresses images within a certain range (e.g. 1.5x to 3x)
  • if an image claims to be far outside that range (e.g. it says it’s 300MB but the archive is 2.5GB), we should assume it’s actually 4.3GB instead. Essentially, we should add 2^32 bytes to the estimated size again and again, until the compression ratio gets within a realistic range.

I think an algorithm like this, used only for gzip files (maybe bzip too?), should fix the vast majority of the cases. We should still fail well when we’re wrong, but we should try hard to be right 😃
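The heuristic described above could be sketched roughly as follows. This is one possible reading, not Etcher's implementation: the function name `estimate_uncompressed_size` is hypothetical, the 1.5x to 3x bounds are the example values from the comment, and the second return value is an added assumption so that callers can still "fail well" when the guess lands outside the expected range.

```python
WRAP = 2**32  # the gzip size field wraps around at this many bytes


def estimate_uncompressed_size(compressed_size, reported_size,
                               min_ratio=1.5, max_ratio=3.0):
    """Add 2**32 to the reported size until the ratio reaches min_ratio.

    Returns (estimate, plausible); plausible is False when even the first
    estimate at or above min_ratio overshoots max_ratio.
    """
    if compressed_size <= 0:
        raise ValueError("compressed_size must be positive")
    estimate = reported_size
    while estimate / compressed_size < min_ratio:
        estimate += WRAP
    plausible = estimate / compressed_size <= max_ratio
    return estimate, plausible


# Example from the comment: the archive is 2.5 GB but claims 300 MB of content.
estimate, plausible = estimate_uncompressed_size(
    compressed_size=2_500_000_000,
    reported_size=300_000_000,
)
```

In this example a single wrap is added, giving 300 MB + 2^32 bytes, which falls inside the 1.5x to 3x range. A file that compresses unusually well or badly would defeat the guess, which is why the result is only an estimate.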

On Fri, Mar 3, 2017 at 3:51 PM, Juan Cruz Viotti notifications@github.com wrote:

Yeah, I don’t know. Maybe it also depends on the image itself? I’m a compression noob, so I have no clue apart from what I saw on my experiments.
