
json_normalize should supply empty columns if the record_path is not present


Code Sample, a copy-pastable example if possible

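import json
from pandas import json_normalize
# json_normalize takes a dict or list of dicts, not a DataFrame (assumes 'filename' holds a JSON array)
with open('filename') as f:
    records = json.load(f)
json_normalize(records, record_path=['data'], meta=['id', 'desc'], errors='ignore')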

Problem description

Suppose that some record in the JSON file has nothing to normalize under data but does have non-empty id and desc values. The function should be able to normalize such a record into empty columns while keeping id and desc in the final result.

I am not sure whether this is what “ignore” is meant to do, but I don’t think excluding such records from the result is a good choice. I would recommend supplying empty columns when not all columns are missing, for both record_path and meta.
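A minimal sketch of the current behaviour, assuming pandas 1.0 or later (where json_normalize is also available as pd.json_normalize) and two illustrative records. A record with no data key raises KeyError even with errors='ignore', because that option only covers keys listed in meta. A record whose data is an empty list is accepted, but it contributes zero rows, so its id and desc disappear from the result.

```python
import pandas as pd

records = [
    {"desc": "Read", "id": "000e", "data": [{"index": 1, "type": "PM"}]},
    {"desc": "CPU", "id": "e030"},  # no "data" key at all
]

# 1) Missing record_path key: KeyError, even with errors="ignore",
#    because errors= only applies to keys listed in meta.
try:
    pd.json_normalize(records, record_path=["data"], meta=["id", "desc"],
                      errors="ignore")
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)

# 2) Empty list under "data": accepted, but the record yields zero rows,
#    so its id and desc vanish from the result.
records_empty = [records[0], {"desc": "CPU", "id": "e030", "data": []}]
out = pd.json_normalize(records_empty, record_path=["data"], meta=["id", "desc"])
print(out)
```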

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

2 reactions
averhagen commented, Apr 29, 2021

@WillAyd “I don’t think this is desired behavior”

I disagree. This function is often used on JSON HTTP responses, and HTTP responses are often dynamic and omit data.

A simple way to please everyone would be to add a parameter to the function, “fail_on_missing_record”, which defaults to True. If set to False, the normalizing function could create rows that contain only the meta data for JSON objects where the record_path is missing.
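The proposed flag is not part of pandas. One way to emulate it on top of pd.json_normalize is sketched below with a hypothetical wrapper, json_normalize_lenient, assuming a single-level record_path. With fail_on_missing_record=False, objects whose record list is missing, null, or empty receive one empty placeholder record, so each still yields a row with NaN record columns plus its meta values.

```python
import pandas as pd


def json_normalize_lenient(data, record_path, meta=None,
                           fail_on_missing_record=True, **kwargs):
    """Hypothetical wrapper sketching the proposed fail_on_missing_record flag."""
    if fail_on_missing_record:
        # Current pandas behaviour: a missing record_path key raises KeyError.
        return pd.json_normalize(data, record_path=record_path, meta=meta, **kwargs)

    path = [record_path] if isinstance(record_path, str) else list(record_path)
    if len(path) != 1:
        raise NotImplementedError("this sketch only handles a single-level record_path")
    key = path[0]

    patched = []
    for obj in data:
        recs = obj.get(key)
        if recs is None or (isinstance(recs, list) and len(recs) == 0):
            # One empty placeholder record -> one row of NaN record columns,
            # which still carries this object's meta values.
            obj = {**obj, key: [{}]}
        patched.append(obj)
    return pd.json_normalize(patched, record_path=path, meta=meta, **kwargs)


records = [
    {"desc": "Read", "id": "000e", "data": [{"index": 1, "type": "PM"}]},
    {"desc": "CPU", "id": "e030"},  # no "data" key
]
print(json_normalize_lenient(records, ["data"], meta=["id", "desc"],
                             fail_on_missing_record=False))
```

With the default fail_on_missing_record=True the wrapper simply defers to pd.json_normalize, so the existing KeyError is preserved for callers who rely on it.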

1 reaction
wujiayikelly commented, Jul 9, 2018

from pandas import json_normalize
df = [{'desc': 'Read', 'id': '000e', 'data': [{"index": 1, "type": "PM"}]},
      {'desc': 'CPU', 'id': 'e030'}]
# raises KeyError: the second record has no 'data' key
output1 = json_normalize(df, record_path=['data'], meta=['id', 'desc'])
output2 = json_normalize(df)

|   | index | type | id   | desc |
|---|-------|------|------|------|
| 0 | 1     | PM   | 000e | Read |
| 1 | nan   | nan  | e030 | CPU  |

Although ‘data’ doesn’t exist in the second record, it does exist in the first one. Obviously, both records come from the same source; the key is simply missing from the second one.
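Until something like that exists, one possible workaround (a sketch, not an official pandas recipe) builds the table above without record_path: load the records into a DataFrame, explode the data column so each entry gets its own row (a missing key stays NaN, and explode turns an empty list into NaN as well), then normalize the exploded values and reattach id and desc. It assumes pandas 1.1 or later (for explode's ignore_index) and at least one record containing the data key; variable names are illustrative.

```python
import pandas as pd

records = [
    {"desc": "Read", "id": "000e", "data": [{"index": 1, "type": "PM"}]},
    {"desc": "CPU", "id": "e030"},  # "data" key is missing
]

# One column per top-level key; the missing "data" becomes NaN.
base = pd.DataFrame(records)

# One row per element of "data"; NaN (and empty lists) stay as a single NaN row.
exploded = base.explode("data", ignore_index=True)

# json_normalize needs dicts, so replace NaN placeholders with empty dicts.
data_dicts = [d if isinstance(d, dict) else {} for d in exploded["data"]]
flat = pd.json_normalize(data_dicts)

# Record columns first, then the meta columns, aligned on the shared RangeIndex.
result = pd.concat([flat, exploded[["id", "desc"]]], axis=1)
print(result)
```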

Hopefully, this time it is clearer.

@WillAyd
