PERF: json_normalize

I haven’t looked much at the implementation, but I’m guessing simpler cases like this could be optimized.

In [63]: data = [
    ...:     {'name': 'Name',
    ...:      'value': 1.0,
    ...:      'value2': 2.0,
    ...:      'nested': {'a': 'aa', 'b': 'bb'}}] * 1000000

In [64]: %timeit pd.DataFrame(data)
1 loop, best of 3: 847 ms per loop

In [65]: %timeit pd.io.json.json_normalize(data)
1 loop, best of 3: 20 s per loop
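
For records shaped like the ones above (flat values plus nested dicts), one possible workaround is to flatten each record in plain Python and hand the result to `pd.DataFrame`. The sketch below is only an illustration and is not how pandas implements `json_normalize`. The helper name `flatten_record` is hypothetical, the `.` separator is chosen to match the default column naming of `json_normalize`, and lists are left untouched. No timings are claimed for it.

```python
def flatten_record(record, sep=".", prefix=""):
    """Flatten nested dicts into one level, joining keys with `sep`.

    Hypothetical helper for illustration; only dict values are expanded,
    everything else (including lists) is kept as-is.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the joined key as prefix.
            flat.update(flatten_record(value, sep=sep, prefix=name))
        else:
            flat[name] = value
    return flat


# Same record shape as in the issue, with a small repeat count.
data = [
    {"name": "Name",
     "value": 1.0,
     "value2": 2.0,
     "nested": {"a": "aa", "b": "bb"}}] * 3

flat_rows = [flatten_record(row) for row in data]
# With pandas installed, pd.DataFrame(flat_rows) would build the frame
# with columns name, value, value2, nested.a, nested.b.
```

Note that an empty nested dict contributes no columns in this sketch, so it is only suitable when that behavior is acceptable.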

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: None
html5lib: 0.999999999
httplib2: 0.9.2
apiclient: 1.5.3
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: 0.2.1

Issue Analytics

Created 7 years ago
Comments: 6 (4 by maintainers)

Top GitHub Comments

wesm commented, Mar 12, 2017

Converting lists of dictionaries faster in json_normalize seems perfectly reasonable. I intend to use RapidJSON (https://github.com/miloyip/nativejson-benchmark) to create a faster native JSON->DataFrame reader, since we can circumvent Python objects altogether that way. This can happen well before pandas2 ships by using Arrow tables as an intermediary en route to pandas.

eewallace commented, Oct 15, 2019

Not sure if this is still on anyone’s radar, but I’ve been dealing with a performance issue at least partly caused by json_normalize. From some profiling, it seems like the biggest problem for my case is the use of deepcopy. For common relatively simple cases of just dictionaries/lists of string and numeric literals, deepcopy seems like a lot of unnecessary overhead. Even if it’s needed for some use cases, calling it recursively (when it is doing its own recursive copy) is surely not optimal.
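
To check the deepcopy point on your own data, a rough comparison like the one below may help. It is a sketch under stated assumptions: `copy_json_like` and `compare` are hypothetical names, the copier handles only dicts, lists, and immutable literals (the "common relatively simple cases" described above), and it is not what pandas does internally. Actual timings will vary by machine and Python version, so none are quoted here.

```python
import copy
import timeit


def copy_json_like(obj):
    """Copy a structure made only of dicts, lists, and immutable literals.

    Hypothetical helper: unlike copy.deepcopy it keeps no memo table and
    does no type dispatch beyond dict and list, so it is not safe for
    cyclic data or arbitrary objects.
    """
    if isinstance(obj, dict):
        return {key: copy_json_like(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [copy_json_like(value) for value in obj]
    # str, int, float, bool, and None are immutable and can be shared.
    return obj


record = {"name": "Name",
          "value": 1.0,
          "value2": 2.0,
          "nested": {"a": "aa", "b": "bb"}}


def compare(number=10000):
    """Return (deepcopy_seconds, plain_copy_seconds) for `number` copies."""
    deep = timeit.timeit(lambda: copy.deepcopy(record), number=number)
    plain = timeit.timeit(lambda: copy_json_like(record), number=number)
    return deep, plain
```

Calling `compare()` and printing the two numbers gives a quick sense of how much of the cost comes from the copying itself for a given record shape.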
