[SIP-28] Proposal for Geocoding
Motivation
A Superset user wants to use address-based data to generate charts such as the deck.gl Scatterplot. To do so, they first need to convert the addresses using an external source so that the data has the required latitude/longitude columns.
Many other BI tools can convert addresses automatically.
Proposed Change
We want to implement a feature using the GeoPy package with the Mapbox Geocoding API that converts addresses to latitude/longitude and saves those values as additional columns, or overwrites selected columns, in the same table.
To make the API calls, we plan to use the same API key that is already used for the background maps (the Mapbox API key).
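As a rough illustration of the conversion step, the sketch below geocodes a list of rows and writes the result into latitude/longitude columns. It assumes GeoPy's `MapBox` geocoder class; the helper names (`mapbox_geocoder`, `geocode_rows`), the row-as-dict representation, and the default column names `lat` and `lon` are hypothetical and not part of the proposal.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical type: maps an address string to (latitude, longitude),
# or None when the address cannot be resolved.
Geocoder = Callable[[str], Optional[Tuple[float, float]]]


def mapbox_geocoder(api_key: str) -> Geocoder:
    """Build a geocoder backed by GeoPy's MapBox class (needs network access)."""
    # Imported lazily so the rest of the sketch works without geopy installed.
    from geopy.geocoders import MapBox

    client = MapBox(api_key=api_key)

    def lookup(address: str) -> Optional[Tuple[float, float]]:
        location = client.geocode(address)
        if location is None:
            return None
        return (location.latitude, location.longitude)

    return lookup


def geocode_rows(
    rows: List[Dict],
    address_columns: List[str],
    geocode: Geocoder,
    lat_column: str = "lat",
    lon_column: str = "lon",
) -> List[Dict]:
    """Return copies of the rows with latitude/longitude columns filled in.

    Columns that already exist under the same names are overwritten.
    """
    result = []
    for row in rows:
        # Join the non-empty address parts into one query string.
        address = " ".join(str(row[c]) for c in address_columns if row.get(c))
        coords = geocode(address) if address else None
        lat, lon = coords if coords else (None, None)
        result.append({**row, lat_column: lat, lon_column: lon})
    return result
```

Passing the geocoder in as a plain function keeps the table logic independent of the provider, so it can be exercised with a stub instead of real Mapbox calls.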
The feature will be available under the menu “Sources” > “Geocode Addresses” and will be implemented asynchronously. However, only one geocoding job can be in progress at a time. There are multiple reasons for this decision:
- Most geocoding APIs limit the number of requests per second
- Most geocoding APIs limit the number of requests that can be made over a certain time period
- Depending on the amount of data in the table, the process can take a very long time
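The two constraints above (a single job at a time, and a cap on requests per second) could be sketched roughly as follows. `SingleJobGuard` and `throttled` are hypothetical helpers written for this illustration; GeoPy also ships its own `RateLimiter` in `geopy.extra.rate_limiter`, which may be a better fit in practice.

```python
import threading
import time


class SingleJobGuard:
    """Allow at most one geocoding job at a time (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()

    def try_start(self) -> bool:
        # Non-blocking acquire: returns False if a job is already running.
        return self._lock.acquire(blocking=False)

    def finish(self) -> None:
        self._lock.release()


def throttled(items, max_per_second, sleep=time.sleep, clock=time.monotonic):
    """Yield items no faster than max_per_second."""
    min_interval = 1.0 / max_per_second
    last = None
    for item in items:
        now = clock()
        if last is not None and now - last < min_interval:
            # Wait out the remainder of the minimum interval.
            sleep(min_interval - (now - last))
        last = clock()
        yield item
```

A job would call `try_start()` before doing any work, iterate over `throttled(addresses, max_per_second=...)`, and call `finish()` in a `finally` block so that a failed job does not block later ones.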
If a geocoding job is in progress and the user navigates to the “Geocode Addresses” URL, they will see a progress bar and will be able to cancel the process. If no job is running, the geocoding form will be shown.
The user can decide what happens if anything goes wrong (e.g. call limit reached, connection issues) or the process is interrupted: they can choose to save the already converted data or discard it.
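One possible shape for the progress, cancel, and save-or-discard behaviour is sketched below. All names here (`GeocodingJob`, `GeocodingCancelled`, `run_job`, `keep_partial_on_failure`) are assumptions made for the example, and treating every exception as a failure is a simplification.

```python
import threading
from dataclasses import dataclass, field


class GeocodingCancelled(Exception):
    """Raised inside the job when the user cancels it."""


@dataclass
class GeocodingJob:
    total: int
    done: int = 0
    cancel_event: threading.Event = field(default_factory=threading.Event)

    @property
    def progress_percent(self) -> int:
        # Integer percentage suitable for a progress bar.
        return 100 if self.total == 0 else int(100 * self.done / self.total)


def run_job(job, addresses, geocode, keep_partial_on_failure):
    """Geocode addresses one by one, honouring cancellation.

    On an error or a cancel, return the rows converted so far if
    keep_partial_on_failure is True, otherwise discard them.
    """
    results = {}
    try:
        for address in addresses:
            if job.cancel_event.is_set():
                raise GeocodingCancelled()
            results[address] = geocode(address)
            job.done += 1
    except Exception:
        return results if keep_partial_on_failure else {}
    return results
```

The cancel request only has to call `job.cancel_event.set()`; the worker notices it before the next API call, so at most one request is still in flight when the job stops.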
New or Changed Public Interfaces
- There will be a new form for the Geocoding in React
- There will be a new REST API that geocodes the address-based data in a specific table and adds or overwrites columns
- There will be a new REST API that informs the caller whether a geocoding job is already in progress (a boolean, plus an integer representing the progress in %)
- There will be a new REST API with which a user can interrupt a running geocoding job
- There will be a new REST API that returns the list of columns for a selected table
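To make the proposed interfaces more concrete, here is a minimal sketch of the status response and a possible set of routes. The proposal does not specify paths, HTTP methods, or field names, so everything below (`ROUTES`, `progress_payload`, `is_in_progress`, `progress`) is purely illustrative.

```python
import json

# Hypothetical route table for the four proposed endpoints.
# Paths and methods are illustrative only.
ROUTES = {
    ("POST", "/geocoding/geocode"): "geocode a table, adding or overwriting columns",
    ("GET", "/geocoding/progress"): "report whether a job is running and its progress",
    ("POST", "/geocoding/interrupt"): "interrupt the running job",
    ("GET", "/geocoding/columns"): "list the columns of a selected table",
}


def progress_payload(is_in_progress: bool, done: int, total: int) -> str:
    """Serialize the status response: a boolean plus an integer percentage."""
    percent = int(100 * done / total) if total else 0
    return json.dumps({"is_in_progress": is_in_progress, "progress": percent})
```

The React form could poll the progress endpoint and switch between the progress bar and the geocoding form based on the boolean.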
New dependencies
- We do not need a new dependency, because Superset is already using GeoPy
Migration Plan and Compatibility
Documentation describing the usage of this new feature will likely need to be added once this is merged into master.
Rejected Alternatives
- We thought about on-the-fly geocoding and accepting address data in certain charts like the deck.gl Scatterplot, but rejected the idea since geocoding itself is an expensive operation and should only be done once on a specific dataset.
Issue Analytics
- Created 4 years ago
- Comments: 14 (8 by maintainers)
Top GitHub Comments
@Sascha-Gschwind adding support for this in the regular viz UI flow is still some ways off, but geocoding functionality was recently added to the backend as a chart data request post-processing operation: https://github.com/apache/incubator-superset/pull/9661 . Currently only geohash decoding/encoding and parsing of geodesic point strings is supported, but we could potentially add geocoding of addresses to the flow once this feature matures.
@rusackas We don’t plan to carry on with the proposal since we all moved on to different companies etc. where we are working. You can definitely close it out.