[SIP-11] Proposal for deprecating the native Druid NoSQL connector

Motivation

Superset currently supports two engine connectors for querying datasources: SQLAlchemy and the Druid REST API. The latter was the initial use case for Superset, i.e., a UI for visualizing Druid datasources.

Since version 0.10.0 Druid has included a built-in SQL server, which has a SQLAlchemy binding provided by the pydruid library (courtesy of @betodealmeida and @mistercrunch). The proposed change is therefore to deprecate the REST API interface in favor of having a single interface (SQLAlchemy) to all engines. Note that all future engines (there has been mention of adding support for Elasticsearch) would require a SQLAlchemy dialect.
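As a rough illustration of the SQLAlchemy route, the sketch below builds a connection URI for pydruid's `druid` dialect and passes it to SQLAlchemy. The helper names `druid_sqlalchemy_uri` and `run_query`, the host, and the port are assumptions made for this example. Only the URI builder runs without a Druid broker; `run_query` additionally needs `sqlalchemy` and `pydruid` installed.

```python
from urllib.parse import quote


def druid_sqlalchemy_uri(host, port=8082, path="/druid/v2/sql/", https=False):
    """Build a SQLAlchemy URI for pydruid's `druid` dialect (hypothetical helper)."""
    scheme = "druid+https" if https else "druid"
    return f"{scheme}://{host}:{port}{quote(path)}"


def run_query(uri, sql):
    """Run one SQL statement against Druid and return all rows.

    Requires `sqlalchemy` and `pydruid` plus a reachable Druid broker,
    so the imports are kept local to this function.
    """
    from sqlalchemy import create_engine, text

    engine = create_engine(uri)
    with engine.connect() as conn:
        return conn.execute(text(sql)).fetchall()


# Host and port are placeholders for a local broker.
uri = druid_sqlalchemy_uri("localhost")
print(uri)
```

With a broker running, something like `run_query(uri, "SELECT COUNT(*) FROM some_datasource")` would go through the same SQLAlchemy path as any other engine; `some_datasource` is a placeholder name.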

There is a not insignificant amount of overhead in supporting both connectors, including:

Code

From a code perspective, each connector needs to define similar views and models. The Druid connector alone comprises around 2,000 lines of code. There is additional frontend logic which needs to construct filters, metrics, etc. for both the Druid REST API and SQLAlchemy. Note there are 74 files (including documentation) which reference Druid in the repo.

Models

In addition to code overhead each connector defines its own models and database tables:

Druid:

  • clusters
  • datasources
  • columns
  • metrics

SQLAlchemy:

  • dbs
  • tables
  • table_columns
  • sql_metric

This split complicates the logic: the slices table does not have a SQLAlchemy relationship to a “datasource” table, because the datasource type determines the association. The result is denormalized tables with potentially incorrect values. For example, the slices table contains the datasource_name column for the FAB CRUD views, but this may not accurately reflect the underlying datasource name.
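The type-dependent association can be sketched in plain Python. Everything below is a simplified stand-in rather than Superset's actual models: the class names, the `"druid"` and `"table"` type values, and the in-memory dictionaries are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class DruidDatasource:  # stand-in for a row in the Druid "datasources" table
    id: int
    datasource_name: str


@dataclass
class SqlaTable:  # stand-in for a row in the SQLAlchemy "tables" table
    id: int
    table_name: str


@dataclass
class Slice:
    datasource_id: int
    datasource_type: str  # assumed values: "druid" or "table"
    datasource_name: str  # denormalized copy that can go stale


# In-memory stand-ins for the two datasource tables.
DRUID_DATASOURCES = {1: DruidDatasource(1, "events")}
SQLA_TABLES = {1: SqlaTable(1, "orders")}


def resolve_datasource(slc):
    """Pick the table to look in based on the slice's datasource type."""
    if slc.datasource_type == "druid":
        return DRUID_DATASOURCES[slc.datasource_id]
    if slc.datasource_type == "table":
        return SQLA_TABLES[slc.datasource_id]
    raise ValueError(f"unknown datasource type: {slc.datasource_type}")


# The same id resolves to different rows depending on the type, and the
# denormalized name on the slice need not match the resolved datasource.
stale = Slice(datasource_id=1, datasource_type="table", datasource_name="orders_old")
resolved = resolve_datasource(stale)
print(resolved.table_name, "vs", stale.datasource_name)
```

Because the lookup branches on a string instead of following a foreign key, the database cannot enforce that the slice's copy of the name stays in sync.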

Proposed Change

The proposed change would be to deprecate all the Druid REST logic from the codebase. This significantly simplifies and streamlines a number of facets of Superset by ensuring that all engines connect via a SQLAlchemy dialect.

Currently there is support for syncing/refreshing Druid datasources associated with the REST API connector, which I suspect is leveraged by a number of organizations. SIP-7 discusses “refreshing” of Superset datasources.

Note this would be a breaking change for any organization using a Druid version earlier than 0.10.0. Also, there may be some instances of post-aggregate Druid functions which are not supported in Druid SQL.

New or Changed Public Interfaces

There would be no new or changed public interfaces.

New dependencies

There would be no new dependencies.

Migration Plan and Compatibility

A non-trivial database migration would be required including:

  • All records in the Druid tables listed above would need to be migrated to the SQLAlchemy equivalent table.
  • Existing slices would need to be updated to reference the new SQLAlchemy representation of the Druid datasource.
  • Re-normalize the slices table.
  • Update chart data to remove the obsolete table__ or druid__ prefixes.
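The first two steps above could look roughly like the following, shown against an in-memory SQLite database. The schemas are heavily simplified assumptions (the real tables have many more columns, and columns and metrics would need the same treatment), and `migrate_druid_datasources` is a hypothetical name, not an existing Superset migration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed schemas: one Druid table, one SQLAlchemy table, and slices.
conn.executescript("""
    CREATE TABLE datasources (id INTEGER PRIMARY KEY, datasource_name TEXT);
    CREATE TABLE tables (id INTEGER PRIMARY KEY, table_name TEXT);
    CREATE TABLE slices (
        id INTEGER PRIMARY KEY,
        datasource_id INTEGER,
        datasource_type TEXT
    );
    INSERT INTO datasources VALUES (1, 'events');
    INSERT INTO tables VALUES (1, 'orders');
    INSERT INTO slices VALUES (10, 1, 'druid'), (11, 1, 'table');
""")


def migrate_druid_datasources(conn):
    """Copy Druid datasources into `tables` and repoint their slices."""
    rows = conn.execute("SELECT id, datasource_name FROM datasources").fetchall()
    for druid_id, name in rows:
        # Create the SQLAlchemy-side record and remember its new id.
        cur = conn.execute("INSERT INTO tables (table_name) VALUES (?)", (name,))
        new_id = cur.lastrowid
        # Repoint only the slices that referenced this Druid datasource.
        conn.execute(
            "UPDATE slices SET datasource_id = ?, datasource_type = 'table' "
            "WHERE datasource_id = ? AND datasource_type = 'druid'",
            (new_id, druid_id),
        )
    conn.execute("DELETE FROM datasources")
    conn.commit()


migrate_druid_datasources(conn)
slices = conn.execute(
    "SELECT id, datasource_id, datasource_type FROM slices ORDER BY id"
).fetchall()
tables = conn.execute("SELECT id, table_name FROM tables ORDER BY id").fetchall()
print(slices)
print(tables)
```

Changing the type in the same statement that assigns the new id keeps a migrated slice from being matched again if a later Druid id happens to equal the newly assigned table id.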

Rejected Alternatives

None.

to: @betodealmeida @graceguo-supercat @kristw @michellethomas @mistercrunch @timifasubaa

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 20
  • Comments: 17 (10 by maintainers)

Top GitHub Comments

5 reactions
datametrics commented, Oct 12, 2018

I use the REST API of Druid heavily. It makes it easy to discover datasources as well as to implement Druid clients in other languages. The REST format is platform interchangeable and there is no need to implement any further SQL parser / converter logic. One can just throw some model classes together and serialise them to JSON. From my point of view it would be great if Superset continued with REST support, or at least left open the opportunity for connector injection.

2 reactions
kkalyan commented, Oct 8, 2018

How are Druid dimension extractions/lookups, filtered aggregations and JavaScript post-aggregators handled with Druid SQL? Druid users use them heavily. Some of these are not native to SQL, so it would be good to support them.

