json_normalize should supply empty columns if record_path is not present
Code Sample, a copy-pastable example if possible
df = pd.read_json('filename')
json_normalize(df, record_path = ['data'], \
meta = ['id', 'desc'], errors = 'ignore')
Problem description
Suppose that some record in the JSON file has nothing to normalize under data but does have non-empty id and desc information. The function should be able to normalize that record to empty columns while keeping id and desc in the final result.
I am not sure whether this is what “ignore” means, but I don’t think excluding such records from the results is a good choice. I would recommend supplying empty columns as long as not all columns are missing, for both record_path and meta.
Issue Analytics
- State:
- Created 5 years ago
- Comments:9 (3 by maintainers)
Top GitHub Comments
@WillAyd “I don’t think this is desired behavior”
I disagree. This function is often used on JSON HTTP responses, and such responses are frequently dynamic and omit data.
A simple way to please everyone would be to add a parameter to the function, “fail_on_missing_record”, which defaults to True. If set to False, the normalizing function could create rows that contain only metadata for JSON objects where the record_path is missing.
df = [{'desc': 'Read', 'id': '000e', 'data': [{"index": 1, "type": "PM"}]},
      {'desc': 'CPU', 'id': 'e030'}]
output1 = json_normalize(df, record_path=['data'], meta=['id', 'desc'])
output2 = json_normalize(df)
|   | index | type | id | desc |
| --- | --- | --- | --- | --- |
| 0 | 1 | PM | 000e | Read |
| 1 | nan | nan | e030 | CPU |
Although ‘data’ doesn’t exist in the second record, it does exist in the first. Obviously, both come from the same source; the key is simply missing for the second one.
Hopefully, this time it is clearer.
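Until something like this is supported directly, one possible workaround is to patch the records before normalizing them. The sketch below is only an illustration: `normalize_keep_meta` is a hypothetical helper, not part of pandas, and it assumes the record path is a single top-level key and that `pd.json_normalize` (pandas 1.0 or later) is available. It replaces a missing or empty record list with a single empty record, so the meta values still produce one row.

```python
import pandas as pd


def normalize_keep_meta(records, record_key, meta):
    """Hypothetical helper (not a pandas API): keep meta-only rows.

    Records whose `record_key` is missing, None, or an empty list get a
    single empty record, so json_normalize still emits one row for them.
    """
    patched = []
    for rec in records:
        rows = rec.get(record_key) or [{}]  # placeholder for missing data
        patched.append({**rec, record_key: rows})
    return pd.json_normalize(patched, record_path=[record_key], meta=meta)


# Same sample data as in the comment above.
data = [
    {"desc": "Read", "id": "000e", "data": [{"index": 1, "type": "PM"}]},
    {"desc": "CPU", "id": "e030"},  # no 'data' key at all
]

result = normalize_keep_meta(data, "data", ["id", "desc"])
print(result)
```

With the sample data, the second row keeps its id and desc values while the index and type columns are left as NaN, which matches the table shown above. Note that the placeholder also turns an intentionally empty list into a meta-only row, which may or may not be what a given caller wants.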
@WillAyd