Compressed resources


Idea: describe resource compression explicitly. The format description could then ignore compression, so that, for example, a tabular data package could include compressed CSV and not just plain CSV.

Why? In much data management, especially with larger datasets, compression is important for economies of storage and transmission. At the moment, the specification does not explicitly support data file compression. This also means that profiles like tabular data package or geo data package require resources to be uncompressed.

Proposal

Introduce a compression property on the resource:

...
path: "mydata.csv.gz"
format: "csv"
compression: "gz" # | bz2 | lzo | zip
...
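
As a rough illustration of what the proposed property would let tooling do, here is a minimal Python sketch of a reader that picks a decompressor from the resource's `compression` value. The `open_resource` helper and the `OPENERS` table are hypothetical names invented for this example and are not part of any specification or library; only `gz` and `bz2` are handled, because those are the codecs in the Python standard library that wrap a single stream.

```python
import bz2
import gzip

# Hypothetical mapping from the proposed `compression` values to stdlib openers.
# lzo is omitted (it needs a third-party package) and zip is omitted because
# it is an archive format rather than a single compressed stream.
OPENERS = {
    "gz": gzip.open,
    "bz2": bz2.open,
}


def open_resource(resource, encoding="utf-8"):
    """Open a resource's path as text, decompressing if `compression` is set."""
    compression = resource.get("compression")
    if compression is None:
        # No compression declared: read the file as-is.
        return open(resource["path"], "rt", encoding=encoding, newline="")
    try:
        opener = OPENERS[compression]
    except KeyError:
        raise ValueError(f"unsupported compression: {compression!r}")
    # gzip.open and bz2.open both accept text mode with an encoding.
    return opener(resource["path"], "rt", encoding=encoding, newline="")
```

With a helper like this, a CSV reader would receive the same text stream whether the file on disk is `mydata.csv` or `mydata.csv.gz`, which is the sense in which compression can be "ignored" when describing the format.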

Question: what compression formats do we support?

  • Do we support zip? It is very common; however, quite a bit of tooling (e.g. AWS Redshift) does not support it.
  • What about tar + gzip?
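
One reason zip and tar + gzip may need separate treatment is that they are archive formats that can hold several files, whereas gzip and bz2 wrap a single stream. The Python sketch below only illustrates that difference with the standard library; the file names and sample data are made up for the example.

```python
import gzip
import io
import zipfile

csv_bytes = b"a,b\n1,2\n"

# gzip: one compressed stream in, one payload out.
gz_blob = gzip.compress(csv_bytes)
roundtrip = gzip.decompress(gz_blob)

# zip: an archive, so a reader has to choose a member by name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("mydata.csv", csv_bytes)
    zf.writestr("readme.txt", b"notes")

with zipfile.ZipFile(buf) as zf:
    members = zf.namelist()
    payload = zf.read("mydata.csv")
```

A descriptor for a zipped resource would therefore need some way to say which member holds the data, which a bare `compression: "zip"` value does not express.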

Research:

  • Standard Linux: zip, gzip, bzip2
  • Standard Mac install: zip, gzip
    • lzop available via brew
  • Python: stdlib: zip, gzip, bz2; external: lzop
  • Node: stdlib: gzip; external: zip, bz2, lzop
  • AWS Redshift supports: gzip, bz2, lzop
  • Google BigQuery indicates support for at least gzip

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Reactions: 4
  • Comments: 21 (19 by maintainers)

Top GitHub Comments

2 reactions
zaneselvans commented, Sep 5, 2018

This would be a great feature, especially in the context of publishing data on a potentially paid service like datahub.io – we have lots of data that zips down by a factor of 10-20x. With compressed resources we could host our entire current collection of US utility data within the 50GB account tier.

1 reaction
michaelamadi commented, May 16, 2019

@rufuspollock @roll I’ve added a pattern for this. Here’s the PR: https://github.com/frictionlessdata/specs/pull/629
