Avoiding metadata bloat caused by many long URLs
I woke up this morning wondering whether it would be possible to allow a variable to be defined in the referencefile spec. I’m worried about long S3 (or other) URLs bloating the metadata.
If we could do something like:
prefix001='first/superlongurl/that/keeps/going/on/and/on/for/ever/to/some/dir'
prefix002='second/superlongurl/that/keeps/going/on/and/on/for/ever/to/some/dir'
"key1": {
["s3://$prefix001/data001.nc", 10000, 100]
}
"key2": {
["s3://$prefix001/data001.nc", 10100, 100]
}
"key3": {
["s3://$prefix002/data001.nc", 10000, 100]
}
"key4": {
["s3://$prefix002/data001.nc", 10100, 100]
}
then we could make the bloat a lot smaller.
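To make the idea concrete, here is a minimal Python sketch of how a reader might expand such prefixes when loading the references. It is only an illustration under my own assumptions: the `prefixes` table, the `refs` layout and the `expand_refs` helper are hypothetical names, not part of any existing spec or library, and the `$name` substitution is done with the standard library's `string.Template`.

```python
import json
from string import Template

# Hypothetical prefix table, using the two prefixes from the example above.
prefixes = {
    "prefix001": "first/superlongurl/that/keeps/going/on/and/on/for/ever/to/some/dir",
    "prefix002": "second/superlongurl/that/keeps/going/on/and/on/for/ever/to/some/dir",
}

# Compact references (assumed layout): key -> [url template, offset, length].
refs = {
    "key1": ["s3://$prefix001/data001.nc", 10000, 100],
    "key2": ["s3://$prefix001/data001.nc", 10100, 100],
    "key3": ["s3://$prefix002/data001.nc", 10000, 100],
    "key4": ["s3://$prefix002/data001.nc", 10100, 100],
}


def expand_refs(refs, prefixes):
    """Return a copy of refs with $name placeholders in each URL filled in."""
    expanded = {}
    for key, (url, offset, length) in refs.items():
        # substitute() raises KeyError if a placeholder has no matching prefix.
        expanded[key] = [Template(url).substitute(prefixes), offset, length]
    return expanded


expanded = expand_refs(refs, prefixes)

# Compare the serialized size of the compact form against the expanded form.
compact_size = len(json.dumps({"prefixes": prefixes, "refs": refs}))
expanded_size = len(json.dumps(expanded))
print(expanded["key1"][0])
print(compact_size, expanded_size)
```

With only four keys the saving is small, since each prefix still has to be stored once; the gain should grow with the number of references that share a prefix.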
Issue Analytics
- State:
- Created 3 years ago
- Comments:16 (12 by maintainers)
Top GitHub Comments
To be sure, I have a bias against arrow, in that the installation in Python is enormous, and in many zarr/xarray or other array-based workloads it would only be doing this one job.
So exactly, zarr could be the target. I suspect it’s much smaller in code than arrow and/or parquet, but less well known, of course. You can also store arrow in zarr: https://github.com/zarr-developers/zarr-python/issues/515
What do you mean? Arrow is not a storage format.
In the context of this repo’s use case, zarr might be a nice option (it does have a JS implementation).