Ditch databases?

Currently, we have the following tables in a Terracotta database:

keys: Contains defined key names and their description
datasets: Maps key values to physical raster path
metadata: Maps key values to raster metadata (such as min and max value, bounds, footprint, …)

An alternative model could be to save raster metadata on the rasters themselves. In that case, it would be much less likely for the raster metadata to go out of date. Having the metadata in a database only makes sense if we want to search it, which we currently don’t allow.

Doing this would even allow us to decouple the database from Terracotta entirely. We could then have an external database that the frontend can query for valid rasters, and request them from Terracotta by filename. This gives users flexibility to have a searchable catalogue outside of TC, and we could recover the current behavior by running directory listings on the raster folder.

Issue Analytics

Created 3 years ago
Reactions: 1
Comments: 16

Top GitHub Comments

dionhaefner commented, Apr 1, 2020 (1 reaction)

Simplest mode of usage:

(only recommended for data exploration)

$ terracotta serve -r myrasters/{date}/{tile}/{band}.tif

In this case

/keys just returns [date, tile, band] with empty descriptions
/datasets runs glob with the given pattern, then extracts and filters keys
/metadata opens the GTiff and reads its tags; if there are no tags, metadata is computed sloppily based on an overview
Metadata access triggers a warning about missing tags and possibly missing cloud optimization
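
To make the pattern-based /datasets idea concrete, here is a minimal Python sketch of how keys could be extracted from file paths and filtered. It is not Terracotta's implementation; the helper names `pattern_to_regex` and `match_datasets` are made up for illustration, and the path list stands in for whatever a glob or bucket listing would return.

```python
import re
from string import Formatter


def pattern_to_regex(pattern):
    """Turn 'myrasters/{date}/{tile}/{band}.tif' into (key names, compiled regex)."""
    keys = []
    parts = []
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
    for literal, field, _spec, _conversion in Formatter().parse(pattern):
        parts.append(re.escape(literal))
        if field is not None:
            keys.append(field)
            # Assumption: a key value never contains a path separator
            parts.append(f"(?P<{field}>[^/]+)")
    return keys, re.compile("".join(parts))


def match_datasets(pattern, paths, **filters):
    """Extract key values from paths that fit the pattern, keeping only those matching filters."""
    keys, regex = pattern_to_regex(pattern)
    found = []
    for path in paths:
        match = regex.fullmatch(path)
        if match is None:
            continue  # file does not follow the naming pattern
        values = match.groupdict()
        if all(values.get(key) == wanted for key, wanted in filters.items()):
            found.append(values)
    return keys, found


# Hypothetical listing, e.g. the result of a glob over the raster folder
listing = [
    "myrasters/20180101/25XEL/B04.tif",
    "myrasters/20180101/25XEL/B05.tif",
    "myrasters/README.txt",
]
keys, found = match_datasets("myrasters/{date}/{tile}/{band}.tif", listing, band="B05")
print(keys)
print(found)
```

Because the function takes a plain list of paths, the same logic would apply to a local directory listing or to the file list of a bucket.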

Slightly more advanced:

(recommended usage)

# This optimizes the files and dumps them into the S3 bucket when done
$ terracotta prepare-rasters myrasters/**/*.tif -o s3://myrasters
$ export TC_KEY_DESC="s3://key_desc.json"
$ terracotta serve -r s3://myrasters/{date}/{tile}/{band}.tif

/keys reads descriptions from given JSON file
/datasets retrieves the file list of the entire bucket and filters it according to the query parameters
/metadata reads from GTiff tags

Advanced:

$ terracotta prepare-rasters myrasters/**/*.tif -o s3://myrasters
$ terracotta ingest s3://myrasters/{date}/{tile}/{band}.tif -o s3://myrasters/tc.sqlite
$ export TC_KEY_DESC="s3://key_desc.json"
$ terracotta serve -d s3://myrasters/tc.sqlite

/datasets now runs efficient queries on the SQLite database
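
As a rough illustration of what "efficient queries on the SQLite database" could look like, the following sketch builds an in-memory table that maps key values to raster paths and filters it by key. The table layout and the function names `build_index` and `query_datasets` are assumptions made for this example, not the schema Terracotta actually writes.

```python
import sqlite3

KEYS = ("date", "tile", "band")  # key names taken from the path pattern above


def build_index(rows):
    """Create an in-memory table mapping (date, tile, band) to a raster path."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE datasets ("
        "date TEXT, tile TEXT, band TEXT, path TEXT, "
        "PRIMARY KEY (date, tile, band))"
    )
    conn.executemany("INSERT INTO datasets VALUES (?, ?, ?, ?)", rows)
    return conn


def query_datasets(conn, **where):
    """Return (date, tile, band, path) rows whose keys equal the given values."""
    unknown = set(where) - set(KEYS)
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    sql = "SELECT date, tile, band, path FROM datasets"
    if where:
        # Column names come from the KEYS whitelist; values are bound as parameters
        sql += " WHERE " + " AND ".join(f"{key} = ?" for key in where)
    sql += " ORDER BY date, tile, band"
    return conn.execute(sql, tuple(where.values())).fetchall()


index = build_index([
    ("20180101", "25XEL", "B04", "s3://myrasters/20180101/25XEL/B04.tif"),
    ("20180101", "25XEL", "B05", "s3://myrasters/20180101/25XEL/B05.tif"),
])
print(query_datasets(index, band="B05"))
```

The lookup no longer needs a listing of the whole bucket, which is the main gain over the pattern-only mode.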

Custom paths to rasters:

(keys are not coupled to file paths)

Same as before, but use the Python API to create the SQLite database

External database:

Option 1a

$ terracotta serve -r s3://myrasters/{date}/{tile}/{band}.tif --external

/keys and /datasets are disabled (TC is just used for serving metadata and tiles)
User supplies an external discovery API that maps searchable parameters to Terracotta’s keys (or just one ID key)
Discovery API could be a Terracotta plugin / related project

Option 1b

$ terracotta serve --external

/keys and /datasets are disabled (TC is just used for serving metadata and tiles)
User supplies an external discovery API that maps searchable parameters to file paths

All raster queries require full paths:

$ curl example.com/metadata?path=s3://myrasters/20180101/25XEL/B05.tif
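
A full path contains characters such as `:` and `/` that a client would normally percent-encode when placing them in a query string. The sketch below shows one way a client could build such a request URL and how a server could recover the path; the base URL and the helper names are placeholders, and the `path` parameter is the one proposed above, not an existing Terracotta endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def metadata_url(base_url, raster_path):
    """Build a /metadata request URL with the raster path percent-encoded."""
    return f"{base_url}/metadata?{urlencode({'path': raster_path})}"


def path_from_url(url):
    """Recover the original raster path from the query string."""
    return parse_qs(urlsplit(url).query)["path"][0]


raster = "s3://myrasters/20180101/25XEL/B05.tif"
url = metadata_url("https://example.com", raster)
print(url)
print(path_from_url(url))
```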

Option 2a

$ terracotta serve -r s3://myrasters/{date}/{tile}/{band}.tif --external mysql://example.com:123456

Enforce some sort of database structure
Ingestion and creation could go through Terracotta
/keys returns searchable fields

/datasets does equality checks on searchable fields and returns keys used for raster retrieval:

$ curl example.com/datasets?sensor=S2
[{date: 20120101, band:B04, tile:25XEL}]  # note: no sensor here!
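
The behaviour in Option 2a, searching on a field such as sensor while returning only the keys needed for raster retrieval, could be sketched as follows. SQLite stands in for the external database here, and the table name `catalogue`, its columns, and the function names are assumptions for illustration only.

```python
import sqlite3

RASTER_KEYS = ("date", "tile", "band")      # keys used to retrieve a raster
SEARCHABLE = ("sensor",) + RASTER_KEYS      # hypothetical set of searchable fields


def make_catalogue(rows):
    """In-memory stand-in for the external database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE catalogue (sensor TEXT, date TEXT, tile TEXT, band TEXT)"
    )
    conn.executemany("INSERT INTO catalogue VALUES (?, ?, ?, ?)", rows)
    return conn


def search_datasets(conn, **query):
    """Equality search on searchable fields; only raster keys are returned."""
    for field in query:
        if field not in SEARCHABLE:
            raise ValueError(f"{field!r} is not a searchable field")
    sql = "SELECT date, tile, band FROM catalogue"
    if query:
        sql += " WHERE " + " AND ".join(f"{field} = ?" for field in query)
    rows = conn.execute(sql, tuple(query.values()))
    return [dict(zip(RASTER_KEYS, row)) for row in rows]


catalogue = make_catalogue([
    ("S2", "20120101", "25XEL", "B04"),
    ("L8", "20120102", "25XEL", "B04"),
])
print(search_datasets(catalogue, sensor="S2"))
```

As in the curl example, the sensor value is used for filtering but does not appear in the result.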

Option 2b

$ terracotta serve -r s3://myrasters/{date}/{tile}/{band}.tif --external mysql://example.com:123456

Enforce some sort of database structure
Ingestion and creation could go through Terracotta
/keys returns searchable fields

/datasets supports SQL SELECT syntax:

$ curl example.com/datasets?where="sensor == 'S2' AND tile == '25XEL' AND cloudcover<=95"
[{date: 20120101, band:B04, tile:25XEL}]

dionhaefner commented, Apr 1, 2020 (1 reaction)

Well, the data has to live somewhere? S3 was just an example, could be any filesystem.
