BUG: pandas to_json with orient "table" returns wrong schema & data string

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd

test = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=[1, 2, 3])
s = test.to_json(orient="table")
print(s)
# wrong string:
# '{"schema":{"fields":[{"name":"index","type":"integer"},{"name":1,"type":"integer"},{"name":2,"type":"integer"},{"name":3,"type":"integer"}],"primaryKey":["index"],"pandas_version":"0.20.0"},"data":[{"index":0,"1":1,"2":2,"3":3},{"index":1,"1":4,"2":5,"3":6}]}'
pd.read_json(s, orient="table")
# ValueError: Cannot convert non-finite values (NA or inf) to integer

Problem description

When the original column labels are integers, the schema dict returns the correct names (unquoted integers), but the data dict writes the same labels as strings (quoted integers). As a result, a DataFrame loaded from this JSON is either full of empty (NaN) values or the load fails with an exception. I don't know which case triggers which: this minimal example raises an exception, while my original dataset with MultiIndexes (described on Stack Overflow) returned an empty DataFrame.
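The mismatch is consistent with a general JSON constraint: object keys must be strings, while values may be numbers. The sketch below uses only the standard-library `json` module as a stand-in (it is not pandas' own serializer, and the dicts are illustrative) to show an integer label surviving as a value but being quoted as a key.

```python
import json

# An integer column label used as a *value* (as in the schema "fields" list)
# is still an integer after serialization.
schema_part = json.dumps({"name": 1, "type": "integer"})

# The same label used as an object *key* (as in each "data" record)
# is coerced to a string, because JSON object keys must be strings.
data_part = json.dumps({1: 4})

print(schema_part)
print(data_part)

# After parsing both back, the label from the schema and the key from the
# data record no longer compare equal (int 1 versus str "1").
name_in_schema = json.loads(schema_part)["name"]
key_in_data = next(iter(json.loads(data_part)))
print(name_in_schema == key_in_data)
```

A reader that looks up data columns by the schema name would therefore find nothing under the integer `1`, which matches the NaN values and the conversion error reported here.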

Expected Output

This output of to_json(orient="table") could be read back, although the integer column labels are lost and become strings:

'{"schema":{"fields":[{"name":"index","type":"integer"},{"name":"1","type":"integer"},{"name":"2","type":"integer"},{"name":"3","type":"integer"}],"primaryKey":["index"],"pandas_version":"0.20.0"},"data":[{"index":0,"1":1,"2":2,"3":3},{"index":1,"1":4,"2":5,"3":6}]}'

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : 67a3d4241ab84419856b84fc3ebc9abcbe66c6b3
python : 3.6.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : fr
LOCALE : None.None

pandas : 1.1.4
numpy : 1.18.4
pytz : 2017.2
dateutil : 2.8.1
pip : 20.2.4
setuptools : 36.6.0
Cython : 0.27.2
pytest : 3.2.3
hypothesis : None
sphinx : 1.6.5
blosc : 1.5.1
feather : 0.4.0
xlsxwriter : 1.0.2
lxml.etree : 4.1.0
html5lib : 0.9999999
pymysql : None
psycopg2 : None
jinja2 : 2.9.6
IPython : 6.2.1
pandas_datareader : None
bs4 : 4.6.0
bottleneck : 1.2.1
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 2.2.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.7.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.1.14
tables : None
tabulate : 0.8.5
xarray : 0.9.6
xlrd : 1.1.0
xlwt : None
numba : 0.35.0

Issue Analytics

Created 3 years ago
Reactions: 2
Comments: 8 (4 by maintainers)

Top GitHub Comments

2 reactions

Wolf-Byte commented, Nov 29, 2021

Is there any update on this? I ran into the same issue. Here's a simple round trip that reproduces it.

import pandas as pd

# List
arr = ["123"]

# Create the dataframe
dataframe = pd.DataFrame(arr)
print(dataframe)

# Get the table as a schema
dataframe_schema = dataframe.to_json(orient='table')
print(dataframe_schema)

# Load the DataFrame from the json object
dataframe = pd.read_json(dataframe_schema, orient='table')
print(dataframe)

Output

     0
0  123

{"schema": {"fields":[{"name":"index","type":"integer"},{"name":0,"type":"string"}],"primaryKey":["index"],"pandas_version":"0.20.0"}, "data": [{"index":0,"0":"123"}]}

     0
0  NaN
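One way to confirm that a given payload is affected is to compare the schema field names with the keys actually used in the data records. The following is a minimal sketch using only the standard library; `find_mismatched_fields` is a hypothetical helper name, not part of pandas, and `sample` is the JSON string printed above.

```python
import json


def find_mismatched_fields(table_json):
    """Return schema field names that never appear as keys in the data records."""
    payload = json.loads(table_json)
    names = [field["name"] for field in payload["schema"]["fields"]]

    # Collect every key used by any record in "data".
    keys = set()
    for record in payload["data"]:
        keys.update(record)

    # An integer name such as 0 will not match the string key "0".
    return [name for name in names if name not in keys]


sample = (
    '{"schema": {"fields":[{"name":"index","type":"integer"},'
    '{"name":0,"type":"string"}],"primaryKey":["index"],'
    '"pandas_version":"0.20.0"}, "data": [{"index":0,"0":"123"}]}'
)

print(find_mismatched_fields(sample))  # [0]
```

Note that with an empty `data` list every field name is reported, so the check is only meaningful for payloads that contain at least one record.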

For anyone else with the same issue: as a workaround, I am casting the column names in the schema output to strings.

import json

import pandas as pd

# List
arr = ["123"]

# Create the dataframe
dataframe = pd.DataFrame(arr)
print(dataframe)

# Get the table as a schema
dataframe_schema = json.loads(dataframe.to_json(orient='table'))

# BUG FIX: Loop over the schema fields
for field in dataframe_schema.get("schema").get("fields"): 
    # Cast the column name to a string
    field["name"] = str(field.get("name"))
    
# Dump the object to a string
dataframe_schema_str = json.dumps(dataframe_schema)
print(dataframe_schema_str)

dataframe = pd.read_json(dataframe_schema_str, orient='table')
print(dataframe)
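The same workaround can be wrapped in a small reusable function that operates on the JSON string alone. This is a sketch under the assumption that only the `name` entries in `schema.fields` need casting; `stringify_schema_field_names` is a hypothetical helper name, and other parts of the schema (such as `primaryKey`) are left untouched.

```python
import json


def stringify_schema_field_names(table_json):
    """Return a copy of an orient='table' JSON string with string field names."""
    payload = json.loads(table_json)
    for field in payload["schema"]["fields"]:
        # Cast each column name so it matches the (string) keys in "data".
        field["name"] = str(field["name"])
    return json.dumps(payload)


raw = (
    '{"schema": {"fields":[{"name":"index","type":"integer"},'
    '{"name":0,"type":"string"}],"primaryKey":["index"],'
    '"pandas_version":"0.20.0"}, "data": [{"index":0,"0":"123"}]}'
)

patched = stringify_schema_field_names(raw)
print(patched)
```

The patched string can then be passed to `pd.read_json(patched, orient='table')` in place of the raw `to_json` output. As with the original workaround, the column labels come back as strings rather than integers.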

0 reactions

jreback commented, Nov 29, 2021

@Wolf-Byte updates happen when community folks push PRs

the core team can provide review
