Don't silently fix invalid JSON with duplicate keys
Checklist
- I’ve searched for similar feature requests.
What enhancement would you like to see?
When a server returns invalid JSON, the HTTPie formatter should either leave it untouched or fail loudly. As it stands, if the JSON contains duplicate keys, HTTPie will quietly remove the duplicates during formatting.
What problem does it solve?
When debugging broken services, it’s important that I can trust HTTPie to faithfully show me what the server returned, so that I can focus on figuring out what’s going wrong with my services instead of spending that time thinking about HTTPie.
From my point of view it’s helpful for HTTPie to format and color text. I’d rather it didn’t sort keys by default, though I know I can reconfigure that. But it’s a problem if HTTPie removes anything other than insignificant whitespace. If HTTPie is to be a general-purpose tool, it’s important that it doesn’t mask problems.
Provide any additional information, screenshots, or code examples below:
The example below is my motivation for this feature. This server is returning invalid JSON with duplicate keys in the “dps” object. But I could not tell it was doing this, because HTTPie had silently “corrected” the output during formatting. You can see that if I turn off formatting and only colorize the output, I see the true output.
annettewilson@ANNEWILS02M:~/scratch/undercity$ http GET 'http://opentsdb.skymetrics.prod.skyscanner.local:4242/api/query' m=='zimsum:prod.undercity.search_cycle.meter.count_delta{}{dataCenter=literal_or(EU_WEST_1),endpoint=literal_or(search),from_bot=literal_or(false)}' start==1630064720 end==1630064727
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 232
Content-Type: application/json;charset=utf-8
Date: Wed, 01 Sep 2021 09:07:45 GMT
Vary: Accept-Encoding
X-OpenTSDB-Query-Complexity: 318
[
    {
        "aggregateTags": [
            "ip",
            "traffic_source"
        ],
        "dps": {
            "1630064726": 6.0
        },
        "metric": "prod.undercity.search_cycle.meter.count_delta",
        "tags": {
            "dataCenter": "EU_WEST_1",
            "endpoint": "search",
            "from_bot": "false"
        }
    }
]
annettewilson@ANNEWILS02M:~/scratch/undercity$ http GET 'http://opentsdb.skymetrics.prod.skyscanner.local:4242/api/query' m=='zimsum:prod.undercity.search_cycle.meter.count_delta{}{dataCenter=literal_or(EU_WEST_1),endpoint=literal_or(search),from_bot=literal_or(false)}' start==1630064720 end==1630064727 --pretty colors
HTTP/1.1 200 OK
Content-Type: application/json;charset=utf-8
Date: Wed, 01 Sep 2021 09:07:59 GMT
Vary: Accept-Encoding
X-OpenTSDB-Query-Complexity: 318
Content-Length: 232
Connection: keep-alive
[{"metric":"prod.undercity.search_cycle.meter.count_delta","tags":{"endpoint":"search","dataCenter":"EU_WEST_1","from_bot":"false"},"aggregateTags":["ip","traffic_source"],"dps":{"1630064726":5.0,"1630064726":3.0,"1630064726":6.0}}]
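The collapsing matches the default behavior of Python’s json module, which HTTPie relies on: json.loads keeps only the last value for a repeated key. A minimal sketch, using the duplicated "dps" object from the raw response above:

```python
import json

# Raw "dps" object from the unformatted response, with three
# occurrences of the same key.
payload = '{"dps": {"1630064726": 5.0, "1630064726": 3.0, "1630064726": 6.0}}'

# Default parsing silently keeps only the last occurrence,
# which is exactly what HTTPie's formatter then re-serializes.
data = json.loads(payload)
print(data)  # {'dps': {'1630064726': 6.0}}
```

So any formatter built on a plain round-trip through json.loads / json.dumps will drop the duplicates before the user ever sees them.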
Issue Analytics
- Created 2 years ago
- Comments: 6 (5 by maintainers)
Top GitHub Comments
Interesting that JSON doesn’t prohibit duplicate keys. It makes sense, though: it’s valid on the language syntax level, and semantically it could be explained as following the JS object notation behavior, where the latest occurrence overwrites any earlier ones.
In any case, we shouldn’t alter the data for display.
object_pairs_hook=multidict.MultiDict

might be a good start. Cc @BoboTiG

Thanks for the thorough description. I agree that the current behavior is not acceptable for the formatted-output use case. It’s the default behavior of the underlying library we use.
We should customize the JSON loader/serializer we use for formatting to allow repeated keys. Ideally, we’d also extend the JSON lexer to mark repeated keys as errors, but that might be too big of a project for such a rare scenario.
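As a sketch of the object_pairs_hook idea: json.loads passes every (key, value) pair to the hook in document order, so a hook that simply keeps the pairs preserves duplicates. Here a plain list of pairs stands in for the suggested multidict.MultiDict, to stay stdlib-only:

```python
import json

payload = '{"a": 1, "a": 2}'

# object_pairs_hook receives the full ordered list of (key, value)
# pairs for each object, before any deduplication happens, so
# duplicate keys survive intact.
pairs = json.loads(payload, object_pairs_hook=lambda p: p)
print(pairs)  # [('a', 1), ('a', 2)]
```

A serializer aware of this pair representation could then emit the duplicates back out, keeping the formatted output faithful to what the server sent.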