PERF: json_normalize
I haven’t looked much at the implementation, but I’m guessing simpler cases like this could be optimized.
In [63]: data = [
...: {'name': 'Name',
...: 'value': 1.0,
...: 'value2': 2.0,
...: 'nested': {'a': 'aa', 'b': 'bb'}}] * 1000000
In [64]: %timeit pd.DataFrame(data)
1 loop, best of 3: 847 ms per loop
In [65]: %timeit pd.io.json.json_normalize(data)
1 loop, best of 3: 20 s per loop
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.19.2
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.3.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: None
html5lib: 0.999999999
httplib2: 0.9.2
apiclient: 1.5.3
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: 0.2.1
Issue Analytics
- State:
- Created 7 years ago
- Comments:6 (4 by maintainers)
Top GitHub Comments
Converting lists of dictionaries faster in json_normalize seems perfectly reasonable. I intend to use RapidJSON (https://github.com/miloyip/nativejson-benchmark) to create a faster native JSON->DataFrame reader, since we can circumvent Python objects altogether that way. This can happen well before pandas2 ships by using Arrow tables as an intermediary en route to pandas.

Not sure if this is still on anyone’s radar, but I’ve been dealing with a performance issue at least partly caused by json_normalize. From some profiling, it seems like the biggest problem in my case is the use of deepcopy. For the common, relatively simple case of dictionaries/lists of string and numeric literals, deepcopy seems like a lot of unnecessary overhead. Even if it’s needed for some use cases, calling it recursively (when it already does its own recursive copy) is surely not optimal.
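To illustrate the deepcopy point, here is a rough sketch of flattening records without any deepcopy, by building a fresh dict per record instead. This is not how pandas implements json_normalize; flatten_record is a hypothetical helper written for this comment, the "." separator is an assumption, and non-dict values (including lists) are passed through by reference rather than copied. The timing at the end only compares the helper against a bare copy.deepcopy of each record to give a feel for the copy overhead on its own; numbers will vary by machine.

```python
import copy
import timeit


def flatten_record(record, sep=".", prefix=""):
    """Flatten nested dicts into a single-level dict.

    Hypothetical helper, not part of pandas. A new dict is built for the
    result, so the input is never mutated and no deepcopy is needed.
    Non-dict values (including lists) are shared by reference.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the joined key as prefix.
            flat.update(flatten_record(value, sep=sep, prefix=name))
        else:
            flat[name] = value
    return flat


# Same record shape as in the issue, with fewer rows to keep this quick.
data = [
    {"name": "Name",
     "value": 1.0,
     "value2": 2.0,
     "nested": {"a": "aa", "b": "bb"}}
] * 10_000

flat_rows = [flatten_record(row) for row in data]
print(flat_rows[0])

# Rough comparison: flattening without deepcopy vs. just deep-copying rows.
t_flat = timeit.timeit(lambda: [flatten_record(r) for r in data], number=3)
t_copy = timeit.timeit(lambda: [copy.deepcopy(r) for r in data], number=3)
print(f"flatten without deepcopy: {t_flat:.3f}s")
print(f"deepcopy of each record:  {t_copy:.3f}s")
```

The resulting list of flat dicts can be passed straight to pd.DataFrame, which is the fast path timed at the top of the issue. Whether this is safe depends on the caller not needing isolation from later mutation of nested lists in the input.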