BUG: Integer column index breaks json roundtrip with orient=table
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
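import pandas as pd

col1 = [1.0, 2.0, 3.5, 6.75]
col2 = [2.1, 3.1, 4.1, 5.1]
# Integer column labels (1 and 2) trigger the problem
df = pd.DataFrame({1: col1, 2: col2}, index=[110, 112, 113, 121])
df.index.name = 'ID'
# Serialize with the table schema, then read the result back
s = df.to_json(orient='table')
new = pd.read_json(s, orient='table')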
Issue Description
The new dataframe will become
       1   2
ID
110  NaN NaN
112  NaN NaN
113  NaN NaN
121  NaN NaN
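One plausible explanation, not confirmed in this report, is that JSON object keys are always strings, so the integer labels 1 and 2 come back as '1' and '2' in the data records and no longer match integer field names. The sketch below uses only the standard library json module to show that key conversion. The record dict is a made-up single row, not the actual payload pandas produces.

```python
import json

# Hypothetical single row with integer column labels, as in the example above.
record = {"ID": 110, 1: 1.0, 2: 2.1}

# JSON has no integer keys, so json.dumps writes them as strings.
decoded = json.loads(json.dumps(record))

print(sorted(decoded))   # every key is now a str
print(1 in decoded)      # the integer label is no longer a key
print("1" in decoded)    # its string form is
```

If the reader looks up columns by the original integer labels, every lookup misses, which would be consistent with the all-NaN result.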
Expected Behavior
The expected dataframe would look like this:
        1    2
ID
110  1.00  2.1
112  2.00  3.1
113  3.50  4.1
121  6.75  5.1
Changing to strings instead of integers in the column index will give the expected result:
col1 = [1.0, 2.0, 3.5, 6.75]
col2 = [2.1, 3.1, 4.1, 5.1]
df = pd.DataFrame({'1': col1, '2': col2}, index=[110, 112, 113, 121])
df.index.name = 'ID'
s = df.to_json(orient='table')
new = pd.read_json(s, orient='table')
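Building on the string-label observation above, here is a minimal sketch of a workaround that keeps the integer labels in the caller's dataframe. The helper name roundtrip_table is hypothetical and not part of pandas. It assumes the labels stay unique after str() and that column order survives the round trip.

```python
import io

import pandas as pd


def roundtrip_table(df: pd.DataFrame) -> pd.DataFrame:
    """Round-trip df through orient='table' JSON, keeping its column labels."""
    original_columns = df.columns
    # Write string labels, which round-trip correctly per the example above.
    payload = df.rename(columns=str).to_json(orient="table")
    # read_json accepts file-like objects, so wrap the JSON text in StringIO.
    restored = pd.read_json(io.StringIO(payload), orient="table")
    # Put the original labels back (assumes column order is unchanged).
    restored.columns = original_columns
    return restored


df = pd.DataFrame(
    {1: [1.0, 2.0, 3.5, 6.75], 2: [2.1, 3.1, 4.1, 5.1]},
    index=[110, 112, 113, 121],
)
df.index.name = "ID"
restored = roundtrip_table(df)
print(restored)
```

This only sidesteps the problem on the caller's side. JSON written this way carries string field names, so other readers of the file will see '1' and '2' rather than 1 and 2.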
Installed Versions
This crashed in my environment with the error "assert '_distutils' in core.__file__, core.__file__", raised from "lib/python3.9/site-packages/_distutils_hack/__init__.py", line 59, in ensure_local_distutils.
Issue Analytics
- State:
- Created 2 years ago
- Comments: 9 (5 by maintainers)
Top GitHub Comments
@coatless I discussed a potential fix in https://github.com/pandas-dev/pandas/issues/46392#issuecomment-1242696492 but got no response as you can see 😕
@jmg-duarte did you end up solving the issue?
If not, @mroeschke could you suggest a way for @jmg-duarte to diff between summary runs? It’s not ideal that the JSON being generated is invalid.