json_normalize should supply empty columns if record_path is not present

Code Sample, a copy-pastable example if possible

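import json
import pandas as pd
with open('filename') as f:
    records = json.load(f)  # list of dicts; json_normalize does not accept a DataFrame
pd.json_normalize(records, record_path=['data'],
                  meta=['id', 'desc'], errors='ignore')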

Problem description

Suppose a JSON file contains a record that has nothing to normalize under data but does have non-empty id and desc values. The function should be able to normalize that record to empty columns while keeping id and desc in the final result.

I am not sure whether this is what “ignore” means, but I don’t think excluding such records from the result is a good choice. I would recommend supplying empty columns if not all columns are missing, for both record_path and meta.
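
One possible workaround in the meantime is to pad the input before normalizing. The sketch below is plain Python and makes no claim about pandas internals. `pad_missing_records` is a hypothetical helper name, and it assumes each record is a dict and that the record path is a single top-level key.

```python
from copy import copy


def pad_missing_records(records, key):
    """Return copies of `records` where a missing or empty `key` holds [{}].

    Hypothetical helper: a single empty dict under `key` gives the
    normalizer one placeholder row to attach the metadata to.
    """
    padded = []
    for record in records:
        record = copy(record)  # shallow copy so the caller's dicts stay untouched
        if not record.get(key):
            record[key] = [{}]
        padded.append(record)
    return padded


records = [
    {"desc": "Read", "id": "000e", "data": [{"index": 1, "type": "PM"}]},
    {"desc": "CPU", "id": "e030"},  # no "data" key at all
]
padded = pad_missing_records(records, "data")
```

Passing `padded` to `json_normalize(padded, record_path=['data'], meta=['id', 'desc'])` should then keep the second record as a row whose `index` and `type` are NaN, though this is worth verifying against the pandas version in use.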

Issue Analytics

Created 5 years ago
Comments: 9 (3 by maintainers)

Top GitHub Comments

2 reactions

averhagen commented, Apr 29, 2021

@WillAyd “I don’t think this is desired behavior”

I disagree. This function is often used on a JSON HTTP response, and HTTP responses are often dynamic and omit data.

A simple way to please everyone would be to add a parameter to the function, “fail_on_missing_record”, which defaults to True. If set to False, the normalizing function could create rows that contain only metadata for JSON objects where the record_path is missing.
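
To make the proposal concrete, here is a minimal pure-Python sketch of the semantics such a flag might have. It is not pandas code: `normalize_with_meta` is a hypothetical function, `fail_on_missing_record` is only the parameter name suggested above and does not exist in pandas, and the sketch assumes a single top-level record key.

```python
def normalize_with_meta(records, record_key, meta, fail_on_missing_record=True):
    """Flatten `records` into rows, one per item under `record_key`.

    Hypothetical illustration of the proposed flag, not the pandas API.
    """
    rows = []
    for record in records:
        meta_values = {name: record.get(name) for name in meta}
        if record_key not in record:
            if fail_on_missing_record:
                raise KeyError(record_key)
            # Proposed behavior: keep a row that carries only the metadata.
            rows.append(dict(meta_values))
            continue
        for item in record[record_key]:
            rows.append({**item, **meta_values})
    return rows


response = [
    {"id": "000e", "desc": "Read", "data": [{"index": 1, "type": "PM"}]},
    {"id": "e030", "desc": "CPU"},  # dynamic response that omits "data"
]
rows = normalize_with_meta(response, "data", ["id", "desc"],
                           fail_on_missing_record=False)
```

Building a DataFrame from `rows` with `pd.DataFrame(rows)` would leave the record columns empty (NaN) for the second entry. With the default of `True`, the same input raises `KeyError`.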

1 reaction

wujiayikelly commented, Jul 9, 2018

df = [
    {'desc': 'Read', 'id': '000e', 'data': [{"index": 1, "type": "PM"}]},
    {'desc': 'CPU', 'id': 'e030'},
]
output1 = json_normalize(df, record_path=['data'], meta=['id', 'desc'])
output2 = json_normalize(df)

|   | index | type | id   | desc |
|---|-------|------|------|------|
| 0 | 1     | PM   | 000e | Read |
| 1 | nan   | nan  | e030 | CPU  |

Although ‘data’ doesn’t exist for the second record, it does exist in the first. Obviously, they come from the same source; the key is simply missing for the second one.

Hopefully, this time it is clearer.

@WillAyd