Flatten JSON to multiple rows
Description
If I have an object like:

```json
{
  "host": "myhost",
  "metrics": [
    { "name": "metric1", "value": 100 },
    { "name": "metric2", "value": 200 }
  ]
}
```

flatten it to 2 rows like:

```
host: "myhost", metric: "metric1", value: 100
host: "myhost", metric: "metric2", value: 200
```
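The desired transformation can be sketched in a few lines of Python. This is only an illustration of the requested behavior, not Druid code; the function name `flatten_metrics` and the hard-coded `"metrics"` key are assumptions for the example:

```python
# Illustrative sketch: expand the nested "metrics" array into one flat
# row per entry, copying every other top-level field into each row.
def flatten_metrics(obj):
    shared = {k: v for k, v in obj.items() if k != "metrics"}
    return [
        {**shared, "metric": m["name"], "value": m["value"]}
        for m in obj["metrics"]
    ]

rows = flatten_metrics({
    "host": "myhost",
    "metrics": [
        {"name": "metric1", "value": 100},
        {"name": "metric2", "value": 200},
    ],
})
# rows now holds two flat records, both carrying host="myhost"
```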
Motivation
Benefits of this approach:
- Ability to transfer multiple rows of data in a single atomic unit
Currently, to achieve this you need to use Kafka transactional topics, which add even more overhead. With the proposed approach, a single piece of JSON, if transferred successfully, results in "all or nothing" for multiple rows.
- Less data transfer
If a piece of data like this is received today, it needs to be converted to multiple rows, transferred to Kafka, then read as multiple separate rows by Druid. This incurs memory/CPU overhead processing the redundant columns repeated across those rows. With the approach defined here, the shared columns are defined once, resulting in an overall smaller payload.
- Decreased complexity
Currently you need code in a transformer to translate the JSON above into multiple rows. All of that code would no longer be required.
Issue Analytics
- State:
- Created 4 years ago
- Reactions: 5
- Comments: 8 (3 by maintainers)
That depends on how generic the implementation of `InputRowParser` is.

Anyway, I agree it might not be desirable to write a custom extension depending on your situation, but I just wanted to let you (and others watching this ticket) know what options exist as of now.
+1. Allowing hosts to report (mini-)tables would be awesome for my use case of reporting periodic per-process/service stats, where many dimension values are static (basically host attributes, for slice-and-dice):
A compact repr could be something like:
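One compact representation along these lines might separate the static host attributes from a small table of per-metric rows (an illustrative sketch only; the `columns`/`rows` field names are assumptions, not from the original comment):

```json
{
  "host": "myhost",
  "columns": ["metric", "value"],
  "rows": [
    ["metric1", 100],
    ["metric2", 200]
  ]
}
```

The shared dimensions appear once, and each inner array expands to one ingested row.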
Having support for a compact representation would avoid the complexity of rolling out compression along the pipeline. Any further recommendations for tackling the above?
Thanks!