Allow for union of query data sources

Description

There are two concepts of union in Druid. The first is UNION ALL in SQL, which concatenates the results of two or more SQL queries and exists only within Druid SQL. The second is the union data source within a native Druid query, where Druid runs a query over the raw data of two or more data sources as if they were one. This feature request concerns the second kind of union.
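For reference, a native query over a union data source might look like the sketch below. The data source names `usage_2011` and `usage_2012` are hypothetical placeholders, and the dimensions and aggregator are borrowed from the proposal further down. It assumes both data sources share the same schema.

```json
{
  "queryType": "groupBy",
  "dataSource": {
    "type": "union",
    "dataSources": ["usage_2011", "usage_2012"]
  },
  "granularity": "day",
  "dimensions": ["country", "device"],
  "aggregations": [
    { "type": "longSum", "name": "total_usage", "fieldName": "user_count" }
  ],
  "intervals": ["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000"]
}
```

Note that the members of the union here are plain data source names, not queries, which is the limitation this request is about.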

It is also possible to use a query as a data source, so that the results of one query become the input to another. Let us refer to this kind of data source as a query data source. It is used for nested groupBys and is currently supported only for groupBy queries.
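As an illustration, a nested groupBy over a query data source could be written roughly as follows. The data source name `usage` and the output name `peak_device_usage` are hypothetical; the inner query aggregates per country and device, and the outer query aggregates the inner results per country.

```json
{
  "queryType": "groupBy",
  "dataSource": {
    "type": "query",
    "query": {
      "queryType": "groupBy",
      "dataSource": "usage",
      "granularity": "day",
      "dimensions": ["country", "device"],
      "aggregations": [
        { "type": "longSum", "name": "total_usage", "fieldName": "user_count" }
      ],
      "intervals": ["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000"]
    }
  },
  "granularity": "all",
  "dimensions": ["country"],
  "aggregations": [
    { "type": "longMax", "name": "peak_device_usage", "fieldName": "total_usage" }
  ],
  "intervals": ["2012-01-01T00:00:00.000/2012-01-03T00:00:00.000"]
}
```

The outer aggregator reads `total_usage`, a column that exists only in the inner query's output.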

This feature request is for the ability to union two or more query data sources. This effectively combines items 2 and 3 (union and query data sources) on this page: Datasources.

As a suggestion for implementation, a query could be of the form:

{
  "queryType": "groupBy",
  "dataSource": {
    "type": "union",
    "dataSources": [
      {
        "type": "query",
        "query": {
          "queryType": "groupBy",
          ...
        }
      },
      {
        "type": "query",
        "query": {
          "queryType": "groupBy",
          ...
        }
      }
    ]
  },
  "granularity": "day",
  "dimensions": ["country", "device"],
  "limitSpec": { "type": "default", "limit": 5000, "columns": ["country", "data_transfer"] },
  "filter": {
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "carrier", "value": "AT&T" },
      { "type": "or", 
        "fields": [
          { "type": "selector", "dimension": "make", "value": "Apple" },
          { "type": "selector", "dimension": "make", "value": "Samsung" }
        ]
      }
    ]
  },
  "aggregations": [
    { "type": "longSum", "name": "total_usage", "fieldName": "user_count" },
    { "type": "doubleSum", "name": "data_transfer", "fieldName": "data_transfer" }
  ],
  "postAggregations": [
    { "type": "arithmetic",
      "name": "avg_usage",
      "fn": "/",
      "fields": [
        { "type": "fieldAccess", "fieldName": "data_transfer" },
        { "type": "fieldAccess", "fieldName": "total_usage" }
      ]
    }
  ],
  "intervals": [ "2012-01-01T00:00:00.000/2012-01-03T00:00:00.000" ],
  "having": {
    "type": "greaterThan",
    "aggregation": "total_usage",
    "value": 100
  }
}

Motivation

It is currently possible to union two or more data sources, and it is also possible to use a query as a data source. However, it is not possible to union two or more query data sources, so supporting that is a logical next step.

Specifically, this allows for aggregations and post aggregations on two distinct result sets that share the same features.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 11
  • Comments: 5

Top GitHub Comments

beljun commented, Sep 4, 2019 (6 reactions)

This would be a very useful feature in certain use cases. +1

leonsimple commented, Mar 18, 2021 (0 reactions)

Can it be achieved by other native queries?
