PERF: json_normalize

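I haven’t looked much at the implementation, but I’m guessing that simpler cases like this one could be optimized: here json_normalize takes roughly 20 s, versus under 1 s for the plain DataFrame constructor on the same data.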

In [63]: data = [
    ...:     {'name': 'Name',
    ...:      'value': 1.0,
    ...:      'value2': 2.0,
    ...:      'nested': {'a': 'aa', 'b': 'bb'}}] * 1000000

In [64]: %timeit pd.DataFrame(data)
1 loop, best of 3: 847 ms per loop

In [65]: %timeit pd.io.json.json_normalize(data)
1 loop, best of 3: 20 s per loop
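
For a simple case like the one above, one possible workaround is to flatten each record in plain Python and pass the result to `pd.DataFrame`. The following is only a minimal sketch under assumptions: `flatten_record` is a hypothetical helper (not part of pandas), keys are assumed to be strings, nesting is assumed to consist of dictionaries only (no lists of records), and the list is shortened to 1,000 rows to keep it quick. It has not been benchmarked against the timings above.

```python
def flatten_record(record, prefix="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`.

    Hypothetical helper for illustration; it does not handle lists of
    records or any of the other options json_normalize supports.
    """
    flat = {}
    for key, value in record.items():
        name = prefix + sep + key if prefix else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the dotted prefix.
            flat.update(flatten_record(value, name, sep))
        else:
            flat[name] = value
    return flat


# Same record shape as in the report, with fewer rows.
data = [
    {'name': 'Name',
     'value': 1.0,
     'value2': 2.0,
     'nested': {'a': 'aa', 'b': 'bb'}}] * 1000

flat = [flatten_record(record) for record in data]
print(flat[0])
```

Passing `flat` to `pd.DataFrame` should then give dotted column names such as `nested.a` and `nested.b`, which is the naming json_normalize uses by default for nested keys.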

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: None
html5lib: 0.999999999
httplib2: 0.9.2
apiclient: 1.5.3
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: 0.2.1

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

2 reactions
wesm commented, Mar 12, 2017

Making json_normalize convert lists of dictionaries faster seems perfectly reasonable. I intend to use RapidJSON (https://github.com/miloyip/nativejson-benchmark) to create a faster native JSON->DataFrame reader, since we can circumvent Python objects altogether that way. This can happen well before pandas2 ships by using Arrow tables as an intermediary en route to pandas.

1 reaction
eewallace commented, Oct 15, 2019

Not sure if this is still on anyone’s radar, but I’ve been dealing with a performance issue at least partly caused by json_normalize. From some profiling, it seems like the biggest problem in my case is the use of deepcopy. For common, relatively simple cases of just dictionaries/lists of string and numeric literals, deepcopy seems like a lot of unnecessary overhead. Even if it’s needed for some use cases, calling it recursively (when it already does its own recursive copy) is surely not optimal.
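
To get a rough feel for the deepcopy overhead on records like these, something along the following lines could be timed. This is only a sketch under assumptions: it uses nothing from pandas, `copy_deep` and `copy_one_level` are hypothetical names, and the cheaper copy is only equivalent when the leaf values are immutable (strings and numbers) and nesting is a single level of dictionaries. Actual timings will vary by machine, so none are quoted here.

```python
import copy
import timeit

record = {'name': 'Name',
          'value': 1.0,
          'value2': 2.0,
          'nested': {'a': 'aa', 'b': 'bb'}}


def copy_deep():
    # General-purpose recursive copy from the standard library.
    return copy.deepcopy(record)


def copy_one_level():
    # Copies the top-level dict and any directly nested dicts.
    # Safe only because the leaves here are immutable strings/floats.
    return {key: dict(value) if isinstance(value, dict) else value
            for key, value in record.items()}


deep_seconds = timeit.timeit(copy_deep, number=10000)
one_level_seconds = timeit.timeit(copy_one_level, number=10000)

print("deepcopy:       {:.4f} s".format(deep_seconds))
print("one-level copy: {:.4f} s".format(one_level_seconds))
```

Both functions return a copy that compares equal to `record` and does not share the nested dictionary with it, so for this shape of data the cheaper copy gives the same isolation.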
