flatten json to multiple rows


Description

If I have an object like:

{
  host: "myhost",
  metrics: [
    { name: "metric1", value: 100 },
    { name: "metric2", value: 200 }
  ]
}

it should be flattened into 2 rows like:

host: "myhost", metric: "metric1", value: 100
host: "myhost", metric: "metric2", value: 200
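As an illustration of the requested behavior only, and not of anything Druid provides, here is a minimal Python sketch of the flattening. The function name `flatten_metrics` and the `list_key` parameter are hypothetical; the field names `host`, `metrics`, `name`, and `value` come from the example above.

```python
from typing import Any, Dict, Iterator


def flatten_metrics(event: Dict[str, Any], list_key: str = "metrics") -> Iterator[Dict[str, Any]]:
    """Yield one flat row per entry in event[list_key], copying the shared fields."""
    # Everything except the nested list is shared by all output rows.
    shared = {k: v for k, v in event.items() if k != list_key}
    for entry in event.get(list_key, []):
        row = dict(shared)
        # Rename "name" to "metric" to match the row layout shown above.
        row["metric"] = entry["name"]
        row["value"] = entry["value"]
        yield row


event = {
    "host": "myhost",
    "metrics": [
        {"name": "metric1", "value": 100},
        {"name": "metric2", "value": 200},
    ],
}
rows = list(flatten_metrics(event))
```

Here `rows` holds two dictionaries, one per metric, each carrying the shared `host` value.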

Motivation

Benefits of this approach:

  1. Ability to transfer multiple rows of data in a single atomic unit

Currently, to achieve this you need to use Kafka transactional topics, which add even more overhead. With the approach described here, a single piece of JSON that is transferred successfully gives "all or nothing" behavior for multiple rows.

  2. Less data transfer

If a piece of data like this is received today, it needs to be converted to multiple rows, transferred to Kafka, and then read as multiple separate rows by Druid. This results in memory/CPU overhead from processing all the redundant columns of data across those rows. With the approach defined here, they are defined just once, resulting in a smaller overall payload.

  3. Decreased complexity

Currently you need code in a transformer to translate the JSON above into multiple rows. All of that code would no longer be required.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 5
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
himanshug commented, Aug 30, 2019

"update it each time you update or add data sources"

That depends on how generic the implementation of InputRowParser is.

But anyway, I agree that writing a custom extension might not be desirable depending on your situation; I just wanted to let you (and others who watch this ticket) know what options exist as of now.

0 reactions
itsikhen commented, Jul 24, 2022

+1. Allowing hosts to report (mini-)tables would be awesome for my use-case of reporting periodic per-process/service stats, where many dimension values are static (basically host attributes, for slice-n-dice):

{"service": "0", ... many dims with similar values ..., <metrics>}
{"service": "1", ... many dims with similar values ..., <metrics>}
..
{"service": "49", ... many dims with similar values ..., <metrics>}

A compact repr could be something like:

{
  "dim1": "dim1_common_val",
  ..
  "dim20": "dim20_common_val",
  [
    {"service": "0", <metrics>},
    ..
    {"service": "49", <metrics>}
  ]
}

Having the support for the compact repr would avoid the complexity of rolling-out compression along the pipeline. Any further recommendations to tackle the above?

Thanks!
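For readers who need a workaround before ingestion, the compact representation in the comment above could be expanded with a small pre-processing step. This is a hedged Python sketch, not a Druid feature: the function name `expand_compact` and the `rows` key are assumptions, since the array in the comment has no key, and the metric field `cpu` is made up for the example.

```python
from typing import Any, Dict, List


def expand_compact(payload: Dict[str, Any], rows_key: str = "rows") -> List[Dict[str, Any]]:
    """Merge the common dimensions into every per-service row."""
    # Dimensions stated once at the top level apply to all rows.
    common = {k: v for k, v in payload.items() if k != rows_key}
    # A key present in both places takes its value from the row.
    return [{**common, **row} for row in payload.get(rows_key, [])]


compact = {
    "dim1": "dim1_common_val",
    "dim20": "dim20_common_val",
    "rows": [  # hypothetical key name for the unnamed array
        {"service": "0", "cpu": 0.5},
        {"service": "49", "cpu": 0.1},
    ],
}
expanded = expand_compact(compact)
```

Each element of `expanded` is a full row with the common dimensions repeated, which is the shape the row-per-service form in the comment has.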
