Avoiding metadata bloat caused by many long URLs

I woke up this morning wondering whether it would be possible to allow a variable to be defined in the reference file spec. I’m worried about long S3 (or other) URLs bloating the metadata.

If we could do something like:

prefix001='first/superlongurl/that/keeps/going/on/and/on/for/ever/to/some/dir'
prefix002='second/superlongurl/that/keeps/going/on/and/on/for/ever/to/some/dir'

"key1": ["s3://$prefix001/data001.nc", 10000, 100],
"key2": ["s3://$prefix001/data001.nc", 10100, 100],
"key3": ["s3://$prefix002/data001.nc", 10000, 100],
"key4": ["s3://$prefix002/data001.nc", 10100, 100]

we could make the metadata a lot smaller.
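To make the idea concrete, here is a minimal Python sketch of how a reader might expand such prefix variables when loading the references. It is only an illustration of the proposal, not the spec's or any library's actual behavior: the `prefixes` table, the `expand_references` helper, and the `[url, offset, length]` entry layout are assumptions taken from the example above, and the `$name` substitution is done with the standard library's `string.Template`.

```python
from string import Template

# Hypothetical prefix table, mirroring prefix001/prefix002 from the proposal.
prefixes = {
    "prefix001": "first/superlongurl/that/keeps/going/on/and/on/for/ever/to/some/dir",
    "prefix002": "second/superlongurl/that/keeps/going/on/and/on/for/ever/to/some/dir",
}

# Compact references: each entry is [url_template, offset, length].
references = {
    "key1": ["s3://$prefix001/data001.nc", 10000, 100],
    "key2": ["s3://$prefix001/data001.nc", 10100, 100],
    "key3": ["s3://$prefix002/data001.nc", 10000, 100],
    "key4": ["s3://$prefix002/data001.nc", 10100, 100],
}


def expand_references(refs, prefixes):
    """Return a new dict with $name placeholders in each URL substituted.

    Raises KeyError if a URL uses a prefix that is not defined.
    """
    expanded = {}
    for key, (url, offset, length) in refs.items():
        expanded[key] = [Template(url).substitute(prefixes), offset, length]
    return expanded


expanded = expand_references(references, prefixes)
print(expanded["key1"][0])
```

With this layout each long prefix is stored once, so the saving grows with the number of keys that share a prefix.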

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 16 (12 by maintainers)

Top GitHub Comments

1 reaction
martindurant commented, Dec 17, 2020

To be sure, I have a bias against arrow, in that the installation in python is enormous, and in many zarr/xarray or other array-based workloads it would only be doing this one job.

Context: I work on the js implementation of zarr.

So exactly, zarr could be the target, I suspect it’s much smaller in code than arrow and/or parquet - but less well known, of course. You can also store arrow in zarr https://github.com/zarr-developers/zarr-python/issues/515

1 reaction
martindurant commented, Dec 17, 2020

I’m curious if Arrow would be of interest/suitable binary format.

What do you mean? Arrow is not a storage format.

In the context of the use case of this repo, zarr would maybe be a nice option (it does have a JS implementation).
