Compressed resources
Idea: describe resource compression. This would allow compression to be "ignored" when describing the format, e.g. allowing a tabular data package to include compressed CSV, not just plain CSV.
Why? In data management, especially with larger datasets, compression is important for economies of storage and transmission. At the moment, data file compression is not explicitly supported by the specification. This also means that profiles like tabular data package or geo data package require that resources be uncompressed.
Proposal

Introduce a compression property on a resource:

```yaml
...
path: mydata.csv.gz
format: csv
compression: gz  # | bz2 | lzo | zip
...
```
Question: which compression formats do we support?

- Do we support zip? It is very common, but quite a bit of tooling, e.g. AWS Redshift, does not support it.
- What about tar + gzip?
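To make the proposal concrete, a consuming tool could dispatch on the compression property before parsing the format. A minimal sketch in Python, covering only the stdlib-supported codecs; the function name, the OPENERS table, and the descriptor handling are illustrative assumptions, not part of the spec:

```python
import bz2
import csv
import gzip
import io

# Hypothetical mapping from the proposed "compression" values to stream
# openers. "zip" and "lzo" would need zipfile / external lzo bindings.
OPENERS = {
    "gz": gzip.open,
    "bz2": bz2.open,
}

def open_resource(descriptor):
    """Open a resource's data stream, decompressing per the descriptor.

    Falls back to plain open() when no compression is declared.
    """
    compression = descriptor.get("compression")
    opener = OPENERS.get(compression, open) if compression else open
    return io.TextIOWrapper(opener(descriptor["path"], "rb"), encoding="utf-8")
```

With a descriptor like the one above, `open_resource({"path": "mydata.csv.gz", "format": "csv", "compression": "gz"})` yields a text stream that can be fed straight to a CSV parser, so the format handling stays independent of the compression.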
Research:
- Standard Linux: zip, gzip, bzip2
- Standard Mac install: zip, gzip
- lzop available via brew
- Python: stdlib: zip, gzip, bz2; external: lzop
- Node: stdlib: gzip; external: zip, bz2, lzop
- AWS Redshift supports: gzip, bz2, lzop
- Google BigQuery indicates support for at least gzip
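If the compression property were optional, a tool could fall back to inferring it from the file extension. A hedged sketch of that fallback; the helper name and extension mapping are assumptions based on the formats surveyed above, not spec requirements:

```python
# Illustrative mapping from file extensions to the proposed "compression"
# values. Covers the candidates discussed in this issue.
EXTENSION_TO_COMPRESSION = {
    ".gz": "gz",
    ".bz2": "bz2",
    ".lzo": "lzo",
    ".zip": "zip",
}

def infer_compression(path):
    """Guess the compression codec from a path's extension, else None."""
    for ext, name in EXTENSION_TO_COMPRESSION.items():
        if path.endswith(ext):
            return name
    return None  # treated as uncompressed
```

For example, `infer_compression("mydata.csv.gz")` returns `"gz"`, while `infer_compression("mydata.csv")` returns `None`.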
Issue Analytics

- State:
- Created: 7 years ago
- Reactions: 4
- Comments: 21 (19 by maintainers)
Top GitHub Comments
This would be a great feature, especially in the context of publishing data on a potentially paid service like datahub.io – we have lots of data that zips down by a factor of 10-20x. With compressed resources we could host our entire current collection of US utility data within the 50GB account tier.
@rufuspollock @roll I’ve added a pattern for this. Here’s the PR: https://github.com/frictionlessdata/specs/pull/629