Currently, we have the following tables in a Terracotta database:

  • keys: Contains defined key names and their description
  • datasets: Maps key values to physical raster path
  • metadata: Maps key values to raster metadata (such as min and max value, bounds, footprint, …)

An alternative model could be to save raster metadata on the rasters themselves. In that case, it would be much less likely for the raster metadata to go out of date. Having the metadata in a database only makes sense if we want to search it, which we currently don’t allow.

Doing this would even allow us to decouple the database from Terracotta entirely. We could then have an external database that the frontend can query for valid rasters, and request them from Terracotta by filename. This gives users flexibility to have a searchable catalogue outside of TC, and we could recover the current behavior by running directory listings on the raster folder.

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 1
  • Comments: 16

Top GitHub Comments

dionhaefner commented, Apr 1, 2020 (1 reaction)

Simplest mode of usage:

(only recommended for data exploration)

$ terracotta serve -r myrasters/{date}/{tile}/{band}.tif

In this case

  • /keys just gives [date, tile, band] with empty descriptions
  • /datasets runs glob with the given pattern and extracts / filters keys
  • /metadata opens the GTiff and reads its tags; if there are no tags, metadata is computed sloppily based on an overview
  • Metadata access triggers a warning about missing tags and possibly missing cloud-optimization
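
A minimal sketch of how the glob-and-extract step for /datasets could work, assuming keys are taken from a path pattern like the one passed to `-r`. The function names (`pattern_to_glob`, `pattern_to_regex`, `extract_keys`) are hypothetical and are not part of Terracotta's API:

```python
import re
from string import Formatter


def pattern_to_glob(pattern: str) -> str:
    """Replace every {key} placeholder with '*' so the pattern can be globbed."""
    return "".join(
        literal + ("*" if field is not None else "")
        for literal, field, _spec, _conv in Formatter().parse(pattern)
    )


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Build a regex with one named group per {key} placeholder."""
    parts = []
    for literal, field, _spec, _conv in Formatter().parse(pattern):
        parts.append(re.escape(literal))
        if field is not None:
            # Assumption: a key value never spans a path separator.
            parts.append(f"(?P<{field}>[^/]+)")
    return re.compile("".join(parts) + r"\Z")


def extract_keys(paths, pattern, **filters):
    """Return the key dict of every path that matches the pattern and filters."""
    regex = pattern_to_regex(pattern)
    datasets = []
    for path in paths:
        match = regex.match(path)
        if match is None:
            continue  # not a raster that follows the naming scheme
        keys = match.groupdict()
        if all(keys.get(name) == value for name, value in filters.items()):
            datasets.append(keys)
    return datasets
```

In this sketch, the list of paths would come from something like `glob.glob(pattern_to_glob(pattern))` for a local folder, or from a bucket listing for remote storage.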

Slightly more advanced:

(recommended usage)

# This optimizes the files and dumps them into the S3 bucket when done
$ terracotta prepare-rasters myrasters/**/*.tif -o s3://myrasters
$ export TC_KEY_DESC="s3://key_desc.json"
$ terracotta serve -r s3://myrasters/{date}/{tile}/{band}.tif
  • /keys reads descriptions from the given JSON file
  • /datasets retrieves the file list of the entire bucket and filters it according to the query parameters
  • /metadata reads from GTiff tags

Advanced:

$ terracotta prepare-rasters myrasters/**/*.tif -o s3://myrasters
$ terracotta ingest s3://myrasters/{date}/{tile}/{band}.tif -o s3://myrasters/tc.sqlite
$ export TC_KEY_DESC="s3://key_desc.json"
$ terracotta serve -d s3://myrasters/tc.sqlite
  • /datasets now runs efficient queries on the SQLite database
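
To illustrate why a database lookup can beat listing and filtering every file, here is a rough sketch using Python's built-in `sqlite3` module. The table layout (a `datasets` table with one column per key plus a `path` column) and the helper names are assumptions made for this example; they are not the schema that `terracotta ingest` actually writes:

```python
import sqlite3

# Hypothetical key names, matching the path pattern used in this thread.
KEYS = ("date", "tile", "band")


def create_index(conn, rows):
    """Create a key-to-path table; the primary key doubles as a lookup index."""
    conn.execute(
        "CREATE TABLE datasets ("
        "date TEXT, tile TEXT, band TEXT, path TEXT, "
        "PRIMARY KEY (date, tile, band))"
    )
    conn.executemany("INSERT INTO datasets VALUES (?, ?, ?, ?)", rows)


def query_datasets(conn, **where):
    """Return (date, tile, band, path) rows whose keys equal the given values."""
    unknown = set(where) - set(KEYS)
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    sql = "SELECT date, tile, band, path FROM datasets"
    if where:
        # Column names come from the KEYS whitelist; values are bound parameters.
        sql += " WHERE " + " AND ".join(f"{key} = ?" for key in where)
    sql += " ORDER BY date, tile, band"
    return conn.execute(sql, tuple(where.values())).fetchall()
```

Only the key values are passed as bound parameters, and column names are checked against a fixed list, so a query string from a web request cannot inject SQL in this sketch.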

Custom paths to rasters:

(keys are not coupled to file paths)

Same as before, but use the Python API to create the SQLite database

External database:

Option 1a

$ terracotta serve -r s3://myrasters/{date}/{tile}/{band}.tif --external
  • /keys and /datasets are disabled (TC is just used for serving metadata and tiles)
  • User supplies an external discovery API that maps searchable parameters to Terracotta’s keys (or just one ID key)
  • Discovery API could be a Terracotta plugin / related project

Option 1b

$ terracotta serve --external
  • /keys and /datasets are disabled (TC is just used for serving metadata and tiles)

  • User supplies an external discovery API that maps searchable parameters to file paths

  • All raster queries require full paths:

    $ curl example.com/metadata?path=s3://myrasters/20180101/25XEL/B05.tif
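
    One practical detail if full paths travel in the query string: the path should be percent-encoded by the client. A small sketch, assuming the proposed `path` parameter and a hypothetical `https://example.com` host:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def metadata_url(base: str, raster_path: str) -> str:
    """Build a /metadata request URL with the raster path safely encoded."""
    return f"{base}/metadata?{urlencode({'path': raster_path})}"


def parse_path(url: str) -> str:
    """Recover the raster path from a request URL (what a server would do)."""
    return parse_qs(urlsplit(url).query)["path"][0]
```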
    

Option 2a

$ terracotta serve -r s3://myrasters/{date}/{tile}/{band}.tif --external mysql://example.com:123456
  • Enforce some sort of database structure

  • Ingestion and creation could go through Terracotta

  • /keys returns searchable fields

  • /datasets does equality checks on searchable fields and returns keys used for raster retrieval:

    $ curl example.com/datasets?sensor=S2
    [{date: 20120101, band:B04, tile:25XEL}]  # note: no sensor here!
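
    The separation between searchable fields and retrieval keys might look like the following sketch. The in-memory `CATALOGUE` and the `search` function are hypothetical stand-ins for the external database:

```python
# Keys that Terracotta would need to locate a raster (from the path pattern).
RETRIEVAL_KEYS = ("date", "tile", "band")

# Hypothetical catalogue records: retrieval keys plus extra searchable fields.
CATALOGUE = [
    {"sensor": "S2", "date": "20120101", "tile": "25XEL", "band": "B04"},
    {"sensor": "L8", "date": "20120102", "tile": "25XEL", "band": "B04"},
]


def search(catalogue, **criteria):
    """Equality-match on any field, but return only the retrieval keys."""
    return [
        {key: record[key] for key in RETRIEVAL_KEYS}
        for record in catalogue
        if all(record.get(field) == value for field, value in criteria.items())
    ]
```

    Searching by `sensor` here returns dictionaries without a `sensor` entry, which mirrors the "no sensor here" note in the example response.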
    

Option 2b

$ terracotta serve -r s3://myrasters/{date}/{tile}/{band}.tif --external mysql://example.com:123456
  • Enforce some sort of database structure

  • Ingestion and creation could go through Terracotta

  • /keys returns searchable fields

  • /datasets supports SQL SELECT syntax:

    $ curl example.com/datasets?where="sensor == 'S2' AND tile == '25XEL' AND cloudcover<=95"
    [{date: 20120101, band:B04, tile:25XEL}] 
    

dionhaefner commented, Apr 1, 2020 (1 reaction)

Well, the data has to live somewhere? S3 was just an example; it could be any filesystem.
