BUG: pandas to_json with orient "table" returns wrong schema & data string

See original GitHub issue
  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import pandas as pd

test = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=[1, 2, 3])
s = test.to_json(orient="table")
print(s)
# Wrong string: schema names are integers, but the data keys are strings
# '{"schema":{"fields":[{"name":"index","type":"integer"},{"name":1,"type":"integer"},{"name":2,"type":"integer"},{"name":3,"type":"integer"}],"primaryKey":["index"],"pandas_version":"0.20.0"},"data":[{"index":0,"1":1,"2":2,"3":3},{"index":1,"1":4,"2":5,"3":6}]}'
pd.read_json(s, orient="table")
# ValueError: Cannot convert non-finite values (NA or inf) to integer

Problem description

When the original column labels are integers, the schema dict keeps them as integers (unquoted), but the data dict uses strings (quoted integers) as keys. As a result, a dataframe loaded from this JSON either comes back full of empty (NaN) values or fails with an exception. I don't know which condition triggers which: this minimal example raises an exception, while my original dataset with multiindexes (described on Stack Overflow) returned an empty dataframe.
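
The mismatch described above can be checked without pandas. The sketch below uses only the standard `json` module: it parses the string reported in the code sample and compares the schema field names with the keys of the first data record. The variable names (`reported`, `schema_names`, `data_keys`, `unmatched`) are illustrative only and are not part of pandas.

```python
import json

# JSON string as reported in the code sample above (pandas 1.1.4)
reported = (
    '{"schema":{"fields":[{"name":"index","type":"integer"},'
    '{"name":1,"type":"integer"},{"name":2,"type":"integer"},'
    '{"name":3,"type":"integer"}],"primaryKey":["index"],'
    '"pandas_version":"0.20.0"},'
    '"data":[{"index":0,"1":1,"2":2,"3":3},{"index":1,"1":4,"2":5,"3":6}]}'
)

parsed = json.loads(reported)

# Column names as declared in the schema (integers stay integers)
schema_names = [field["name"] for field in parsed["schema"]["fields"]]

# Keys actually used in the data records (JSON object keys are always strings)
data_keys = list(parsed["data"][0])

# Schema names with no identically typed key in the data records
unmatched = [name for name in schema_names if name not in data_keys]

print(schema_names)  # ['index', 1, 2, 3]
print(data_keys)     # ['index', '1', '2', '3']
print(unmatched)     # [1, 2, 3]
```

Because JSON object keys must be strings, the data records cannot carry integer keys, so a schema that declares integer names can never line up with them by exact comparison.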

Expected Output

The following output from to_json(orient="table") could be read back, although the integer column labels are lost and become strings:

'{"schema":{"fields":[{"name":"index","type":"integer"},{"name":"1","type":"integer"},{"name":"2","type":"integer"},{"name":"3","type":"integer"}],"primaryKey":["index"],"pandas_version":"0.20.0"},"data":[{"index":0,"1":1,"2":2,"3":3},{"index":1,"1":4,"2":5,"3":6}]}'

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 67a3d4241ab84419856b84fc3ebc9abcbe66c6b3
python : 3.6.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : fr
LOCALE : None.None

pandas : 1.1.4
numpy : 1.18.4
pytz : 2017.2
dateutil : 2.8.1
pip : 20.2.4
setuptools : 36.6.0
Cython : 0.27.2
pytest : 3.2.3
hypothesis : None
sphinx : 1.6.5
blosc : 1.5.1
feather : 0.4.0
xlsxwriter : 1.0.2
lxml.etree : 4.1.0
html5lib : 0.9999999
pymysql : None
psycopg2 : None
jinja2 : 2.9.6
IPython : 6.2.1
pandas_datareader : None
bs4 : 4.6.0
bottleneck : 1.2.1
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 2.2.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.7.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.1.14
tables : None
tabulate : 0.8.5
xarray : 0.9.6
xlrd : 1.1.0
xlwt : None
numba : 0.35.0

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

Wolf-Byte commented, Nov 29, 2021 (2 reactions)

Is there any update on this? I ran into the same issue. Here's a simple round trip that reproduces it.

import pandas as pd

# List
arr = ["123"]

# Create the dataframe
dataframe = pd.DataFrame(arr)
print(dataframe)

# Get the table as a schema
dataframe_schema = dataframe.to_json(orient='table')
print(dataframe_schema)

# Load the DataFrame from the json object
dataframe = pd.read_json(dataframe_schema, orient='table')
print(dataframe)

Output

     0
0  123

{"schema": {"fields":[{"name":"index","type":"integer"},{"name":0,"type":"string"}],"primaryKey":["index"],"pandas_version":"0.20.0"}, "data": [{"index":0,"0":"123"}]}

     0
0  NaN

For anyone else with the same issue: as a workaround, I am casting the column names in the schema output to strings.

import json

import pandas as pd

# List
arr = ["123"]

# Create the dataframe
dataframe = pd.DataFrame(arr)
print(dataframe)

# Get the table as a schema
dataframe_schema = json.loads(dataframe.to_json(orient='table'))

# BUG FIX: Loop over the schema fields
for field in dataframe_schema.get("schema").get("fields"): 
    # Cast the column name to a string
    field["name"] = str(field.get("name"))
    
# Dump the object to a string
dataframe_schema_str = json.dumps(dataframe_schema)
print(dataframe_schema_str)

dataframe = pd.read_json(dataframe_schema_str, orient='table')
print(dataframe)
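
The workaround above can also be wrapped in a small reusable function. This is only a sketch under the same assumption as the workaround (that casting every schema field name to `str` is acceptable for your data); `stringify_schema_names` is a hypothetical helper name, not a pandas API, and it relies only on the standard `json` module.

```python
import json


def stringify_schema_names(table_json: str) -> str:
    """Return table-orient JSON with every schema field name cast to str.

    Hypothetical helper: it mirrors the manual loop in the workaround above.
    """
    table = json.loads(table_json)
    for field in table["schema"]["fields"]:
        field["name"] = str(field["name"])
    return json.dumps(table)


# String taken from the "Output" section of the comment above
raw = (
    '{"schema": {"fields":[{"name":"index","type":"integer"},'
    '{"name":0,"type":"string"}],"primaryKey":["index"],'
    '"pandas_version":"0.20.0"}, "data": [{"index":0,"0":"123"}]}'
)

fixed = stringify_schema_names(raw)
fixed_names = [f["name"] for f in json.loads(fixed)["schema"]["fields"]]
print(fixed_names)  # ['index', '0']
```

The returned string can then be passed to `pd.read_json(..., orient='table')` as in the workaround. Note that the column labels come back as strings rather than integers. On newer pandas releases, which deprecate passing a literal JSON string, wrapping it in `io.StringIO` first should avoid the warning.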

jreback commented, Nov 29, 2021 (0 reactions)

@Wolf-Byte updates happen when community folks push PRs

the core team can provide review
