Missing DefaultDimensionSpec in Druid query when querying with multiple Lookup dimensions.

When we query Maha with multiple lookup dimensions in the select fields, some of the fields come back as null in the results. The underlying Druid query that Maha issued did not include these fields as default dimension specs.

To elaborate, the Maha query we issued has the following format.

Maha Query

{
  "cube": "test_cube",
  "rowsPerPage": 1000,
  "selectFields": [
    {
      "field": "Adunit ID"
    },
    {
      "field": "Adunit Name"  //Lookup based on Adunit ID
    },
    {
      "field": "Adgroup ID"
    },
    {
      "field": "Adgroup Name" //Lookup based on Adgroup ID
    },
    {
      "field": "Adserver Requests"
    }
  ],
  "filterExpressions": [
    {
      "field": "Publisher ID",
      "operator": "=",
      "value": "xxxxxxxxxAAAAAABBBBBBBBBB"
    },
    {
      "field": "Day",
      "operator": "Between",
      "from": "2020-09-17",
      "to": "2020-09-17"
    }
  ]
}

Output

{
  "header": {
    "cube": "test_cube",
    "fields": [
      {
        "fieldName": "Adunit ID",
        "fieldType": "DIM"
      },
      {
        "fieldName": "Adunit Name",
        "fieldType": "DIM"
      },
      {
        "fieldName": "Adgroup ID",
        "fieldType": "DIM"
      },
      {
        "fieldName": "Adgroup Name",
        "fieldType": "DIM"
      },
      {
        "fieldName": "Adserver Requests",
        "fieldType": "FACT"
      }
    ],
    "maxRows": 1000,
    "debug": {}
  },
  "rows": [
    [
      "dddddddddddAdunitI1dddddddddddddd",
      "dddddddddddAdunitI1 Name dddddddddddddd",
      null,   // Missing adgroup_id
      "dddddddddddAdgroup1 Name dddddddddddddd",
      0
    ],
    [
      "dddddddddddAdunitId2ddddddddddddd",
      "dddddddddddAdunitId2 Name ddddddddddddd",
      null,   // Missing adgroup_id
      "dddddddddddAdgroup2 Name dddddddddddddd",
      1
    ]
  ],
  "curators": {}
}

Druid Query Created by Maha

{
  "queryType": "groupBy",
  "dataSource": {
    "type": "table",
    "name": "test_cube"
  },
  "intervals": {
    "type": "intervals",
    "intervals": [
      "2020-09-17T00:00:00.000Z/2020-09-18T00:00:00.000Z"
    ]
  },
  "virtualColumns": [],
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "or",
        "fields": [
          {
            "type": "selector",
            "dimension": "__time",
            "value": "2020-09-17",
            "extractionFn": {
              "type": "timeFormat",
              "format": "YYYY-MM-dd",
              "timeZone": "UTC",
              "granularity": {
                "type": "none"
              },
              "asMillis": false
            }
          }
        ]
      },
      {
        "type": "selector",
        "dimension": "pubId",
        "value": "xxxxxxxxxAAAAAABBBBBBBBBB"
      }
    ]
  },
  "granularity": {
    "type": "all"
  },
  "dimensions": [     // No adgroup_id added here. 
    {
      "type": "default",
      "dimension": "adunitId",
      "outputName": "Adunit ID",
      "outputType": "STRING"
    },
    {
      "type": "extraction",
      "dimension": "adgroupId",
      "outputName": "Adgroup Name",
      "outputType": "STRING",
      "extractionFn": {
        "type": "registeredLookup",
        "lookup": "adgroup_names",
        "retainMissingValue": false,
        "replaceMissingValueWith": "UNKNOWN"
      }
    },
    {
      "type": "extraction",
      "dimension": "adunitId",
      "outputName": "Adunit Name",
      "outputType": "STRING",
      "extractionFn": {
        "type": "registeredLookup",
        "lookup": "adunit_names",
        "retainMissingValue": false,
        "replaceMissingValueWith": "UNKNOWN"
      }
    }
  ],
  "aggregations": [
    {
      "type": "longSum",
      "name": "Adserver Requests",
      "fieldName": "adserverRequests"
    }
  ],
  "postAggregations": [],
  "limitSpec": {
    "type": "default",
    "columns": [],
    "limit": 10000000
  },
  "context": {
    "groupByStrategy": "v2",
    "applyLimitPushDown": "false",
    "implyUser": "internal_user",
    "priority": 10,
    "userId": "internal_user",
    "uncoveredIntervalsLimit": 1,
    "groupByIsSingleThreaded": true,
    "timeout": 900000,
    "queryId": "9292389f-2a7f-4e12-a39a-6f727097ab92"
  },
  "descending": false
}
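
The gap can be checked mechanically by comparing the requested field names with the output names the generated query actually produces. The following is a minimal Python sketch and is not part of Maha: the helper name `missing_output_names` is hypothetical, the query is abbreviated to the `dimensions` and `aggregations` entries shown above, and it assumes every dimension is given as a spec object rather than a bare string.

```python
def missing_output_names(requested_fields, druid_query):
    """Return requested fields that no dimension or aggregation in the query outputs."""
    produced = {d["outputName"] for d in druid_query.get("dimensions", [])}
    produced |= {a["name"] for a in druid_query.get("aggregations", [])}
    produced |= {p["name"] for p in druid_query.get("postAggregations", [])}
    return [f for f in requested_fields if f not in produced]


requested = ["Adunit ID", "Adunit Name", "Adgroup ID", "Adgroup Name", "Adserver Requests"]

# Abbreviated copy of the generated groupBy query (extractionFn details omitted).
druid_query = {
    "dimensions": [
        {"type": "default", "dimension": "adunitId", "outputName": "Adunit ID"},
        {"type": "extraction", "dimension": "adgroupId", "outputName": "Adgroup Name"},
        {"type": "extraction", "dimension": "adunitId", "outputName": "Adunit Name"},
    ],
    "aggregations": [
        {"type": "longSum", "name": "Adserver Requests", "fieldName": "adserverRequests"}
    ],
    "postAggregations": [],
}

missing = missing_output_names(requested, druid_query)
print(missing)  # ['Adgroup ID']
```

For the query above, only Adgroup ID is reported, which matches the null column in the result rows.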

Upon debugging further, I came across the variable factRequestCols (a Set of Strings) at https://github.com/yahoo/maha/blob/master/core/src/main/scala/com/yahoo/maha/core/query/druid/DruidQueryGenerator.scala#L353. It is passed to the method at https://github.com/yahoo/maha/blob/master/core/src/main/scala/com/yahoo/maha/core/query/druid/DruidQueryGenerator.scala#L381, where the getDimensions method creates the Druid query's dimension specs from the factRequestCols it receives.

I am not sure I understand the logic behind how the factRequestCols set is created here, but the adgroup_id dimension is not included in the resulting set, and as a result it is not added to the Druid query's dimension specs either.

I locally overrode the code to pass queryContext.factBestCandidate.requestCols to the getDimensions method at https://github.com/yahoo/maha/blob/master/core/src/main/scala/com/yahoo/maha/core/query/druid/DruidQueryGenerator.scala#L381, and that fixed the issue: the dimension spec is now populated in the Druid query, and the adgroup_id values appear in the resulting output.
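
The effect of the column set on the generated specs can be illustrated with a toy model. This is a Python sketch for illustration only and is not Maha's Scala code: `build_dimension_specs` and the `COLUMNS` table are hypothetical stand-ins for getDimensions and the cube definition, and the contents of the narrower set are inferred from the observed query rather than from the real factRequestCols logic.

```python
# Hypothetical column table: alias -> (Druid column, lookup name or None).
COLUMNS = {
    "Adunit ID": ("adunitId", None),
    "Adunit Name": ("adunitId", "adunit_names"),
    "Adgroup ID": ("adgroupId", None),
    "Adgroup Name": ("adgroupId", "adgroup_names"),
}


def build_dimension_specs(request_cols):
    """Emit one dimension spec per alias in request_cols; aliases not in the set get none."""
    specs = []
    for alias in sorted(request_cols):
        if alias not in COLUMNS:
            continue  # metrics such as "Adserver Requests" are not dimensions
        column, lookup = COLUMNS[alias]
        if lookup is None:
            specs.append({"type": "default", "dimension": column, "outputName": alias})
        else:
            specs.append({
                "type": "extraction",
                "dimension": column,
                "outputName": alias,
                "extractionFn": {"type": "registeredLookup", "lookup": lookup},
            })
    return specs


# Set that reproduces the observed query: "Adgroup ID" is absent.
narrow_cols = {"Adunit ID", "Adunit Name", "Adgroup Name", "Adserver Requests"}
# Full set of requested columns, as in the local override.
full_cols = narrow_cols | {"Adgroup ID"}

narrow_outputs = [s["outputName"] for s in build_dimension_specs(narrow_cols)]
full_outputs = [s["outputName"] for s in build_dimension_specs(full_cols)]
print(narrow_outputs)  # ['Adgroup Name', 'Adunit ID', 'Adunit Name']
print(full_outputs)    # ['Adgroup ID', 'Adgroup Name', 'Adunit ID', 'Adunit Name']
```

In this model, a spec is only emitted for an alias that is present in the set passed in, so any column dropped from the set disappears from the query even though it was requested.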

Output after the change

{
  "header": {
    "cube": "platform_performance_cube",
    "fields": [
      {
        "fieldName": "Adunit ID",
        "fieldType": "DIM"
      },
      {
        "fieldName": "Adunit Name",
        "fieldType": "DIM"
      },
      {
        "fieldName": "Adgroup ID",
        "fieldType": "DIM"
      },
      {
        "fieldName": "Adgroup Name",
        "fieldType": "DIM"
      },
      {
        "fieldName": "Adserver Requests",
        "fieldType": "FACT"
      }
    ],
    "maxRows": 1000,
    "debug": {}
  },
  "rows": [
    [
      "dddddddddddAdunitI1dddddddddddddd",
      "dddddddddddAdunitI1 Name dddddddddddddd",
      "dddddddddddAdgroup1dddddddddddddd",
      "dddddddddddAdgroup1 Name dddddddddddddd",
      0
    ],
    [
      "dddddddddddAdunitId2ddddddddddddd",
      "dddddddddddAdunitId2 Name ddddddddddddd",
      "dddddddddddAdgroup2dddddddddddddd",
      "dddddddddddAdgroup2 Name dddddddddddddd",
      1
    ]
  ],
  "curators": {}
}

Could you please help me understand how the factRequestCols set is created, and let me know whether that logic, or anything else, needs to change so that the missing dimensions are included?

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 10

Top GitHub Comments

upendrareddy commented, Sep 24, 2020 (1 reaction)

Got it. Thank you!

patelh commented, Sep 22, 2020 (0 reactions)

@upendrareddy Unless you have a dim-driven use case (e.g. entity management in the UI, such as a Campaign Management view), they should all be fact-driven, so I don't see any issues with doing it for all queries.
