Flatten JSON to multiple rows
Description
If I have an object like:

```json
{
  "host": "myhost",
  "metrics": [
    { "name": "metric1", "value": 100 },
    { "name": "metric2", "value": 200 }
  ]
}
```

flatten it to 2 rows like:

```
host: "myhost", metric: "metric1", value: 100
host: "myhost", metric: "metric2", value: 200
```
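The desired transformation can be sketched in a few lines of Python. This is only an illustration of the requested behavior, not Druid code; the function name `flatten_metrics` and the hard-coded `"metrics"` key are assumptions for the example:

```python
# Illustrative sketch: expand the nested "metrics" array into one flat
# row per entry, copying every other top-level field into each row.
def flatten_metrics(obj):
    shared = {k: v for k, v in obj.items() if k != "metrics"}
    return [
        {**shared, "metric": m["name"], "value": m["value"]}
        for m in obj["metrics"]
    ]

rows = flatten_metrics({
    "host": "myhost",
    "metrics": [
        {"name": "metric1", "value": 100},
        {"name": "metric2", "value": 200},
    ],
})
# rows now holds two flat records, both carrying host="myhost"
```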
Motivation
Benefits of this approach:
- Ability to transfer multiple rows of data in a single atomic unit
Currently, to achieve this you need to use Kafka transactional topics, which add even more overhead. With the proposed approach, a single piece of JSON, if transferred successfully, results in "all or nothing" for multiple rows.
- Less data transfer
If a piece of data like this is received today, it needs to be converted to multiple rows, transferred to Kafka, then read as multiple separate rows by Druid. This incurs memory/CPU overhead processing the redundant columns repeated across those rows. With the approach defined here, the shared columns are defined once, resulting in an overall smaller payload.
- Decreased complexity
Currently you need code in a transformer to translate the JSON above into multiple rows. All of that code would no longer be required.
Issue Analytics
- State:
- Created 4 years ago
- Reactions: 5
- Comments: 8 (3 by maintainers)
That depends on how generic the implementation of `InputRowParser` is.

Anyway, I agree it might not be desirable to write a custom extension depending on your situation, but I just wanted to let you (and others watching this ticket) know what options exist as of now.
+1. Allowing hosts to report (mini-)tables would be awesome for my use case of reporting periodic per-process/service stats, where many dimension values are static (basically host attributes, for slice-and-dice):
A compact repr could be something like:
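One compact representation along these lines might separate the static host attributes from a small table of per-metric rows (an illustrative sketch only; the `columns`/`rows` field names are assumptions, not from the original comment):

```json
{
  "host": "myhost",
  "columns": ["metric", "value"],
  "rows": [
    ["metric1", 100],
    ["metric2", 200]
  ]
}
```

The shared dimensions appear once, and each inner array expands to one ingested row.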
Having support for a compact representation would avoid the complexity of rolling out compression along the pipeline. Any further recommendations for tackling the above?
Thanks!