BUG: pandas to_json with orient "table" returns wrong schema & data string
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd

test = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=[1, 2, 3])
s = test.to_json(orient="table")
print(s)
# wrong string :
# '{"schema":{"fields":[{"name":"index","type":"integer"},{"name":1,"type":"integer"},{"name":2,"type":"integer"},{"name":3,"type":"integer"}],"primaryKey":["index"],"pandas_version":"0.20.0"},"data":[{"index":0,"1":1,"2":2,"3":3},{"index":1,"1":4,"2":5,"3":6}]}'
pd.read_json(s, orient="table")
#ValueError: Cannot convert non-finite values (NA or inf) to integer
Problem description
When the original column labels are integers, the schema dict returns the correct names (that is, unquoted integers), but the data dict identifies the columns by strings (quoted integers). As a result, a dataframe loaded back from this JSON is either full of empty (NaN) values or the load fails with an exception. I don't know which condition triggers which: this minimal example triggers an exception, whereas my original dataset with a MultiIndex (described on Stack Overflow) returned an empty dataframe.
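To make the mismatch concrete, here is a small sketch that parses the string from the code sample above with the standard `json` module and compares the schema names with the keys of the first data record. The variable names (`payload`, `schema_names`, `data_keys`, `mismatched`) are only for illustration. JSON object keys are always strings, so the data keys are necessarily quoted, whereas a schema name is an ordinary JSON value and can stay a number.

```python
import json

# JSON string copied from the code sample in this report.
s = (
    '{"schema":{"fields":[{"name":"index","type":"integer"},'
    '{"name":1,"type":"integer"},{"name":2,"type":"integer"},'
    '{"name":3,"type":"integer"}],"primaryKey":["index"],'
    '"pandas_version":"0.20.0"},'
    '"data":[{"index":0,"1":1,"2":2,"3":3},{"index":1,"1":4,"2":5,"3":6}]}'
)

payload = json.loads(s)

# Names as declared in the schema: integers stay integers.
schema_names = [field["name"] for field in payload["schema"]["fields"]]
# Keys of a data record: JSON object keys are always strings.
data_keys = list(payload["data"][0].keys())

# Schema names that have no identically typed key in the data.
mismatched = [name for name in schema_names if name not in data_keys]

print(schema_names)  # ['index', 1, 2, 3]
print(data_keys)     # ['index', '1', '2', '3']
print(mismatched)    # [1, 2, 3]
```

Every integer column name in the schema fails to match a key in the data records, which is consistent with the NaN-filled or failing reads described above.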
Expected Output
This output for pandas.to_json(orient="table") could be read back (though it loses the integer type of the column labels and turns them into strings):
'{"schema":{"fields":[{"name":"index","type":"integer"},{"name":"1","type":"integer"},{"name":"2","type":"integer"},{"name":"3","type":"integer"}],"primaryKey":["index"],"pandas_version":"0.20.0"},"data":[{"index":0,"1":1,"2":2,"3":3},{"index":1,"1":4,"2":5,"3":6}]}'
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 67a3d4241ab84419856b84fc3ebc9abcbe66c6b3
python : 3.6.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : fr
LOCALE : None.None

pandas : 1.1.4
numpy : 1.18.4
pytz : 2017.2
dateutil : 2.8.1
pip : 20.2.4
setuptools : 36.6.0
Cython : 0.27.2
pytest : 3.2.3
hypothesis : None
sphinx : 1.6.5
blosc : 1.5.1
feather : 0.4.0
xlsxwriter : 1.0.2
lxml.etree : 4.1.0
html5lib : 0.9999999
pymysql : None
psycopg2 : None
jinja2 : 2.9.6
IPython : 6.2.1
pandas_datareader: None
bs4 : 4.6.0
bottleneck : 1.2.1
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 2.2.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.7.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.1.14
tables : None
tabulate : 0.8.5
xarray : 0.9.6
xlrd : 1.1.0
xlwt : None
numba : 0.35.0
Issue Analytics
- Created 3 years ago
- Reactions:2
- Comments:8 (4 by maintainers)
Top GitHub Comments
Is there any update on this? I ran into the same issue; here's a simple round trip to replicate it.
Output
For anyone else with the same issue: as a workaround, I am casting the column names in the schema output to strings.
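A minimal sketch of what such a workaround might look like is given here. It is an illustration, not the commenter's actual code. It assumes the schema is patched after `to_json` by parsing the JSON with the standard `json` module; the names `payload` and `restored` are made up for the example.

```python
import json
from io import StringIO

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=[1, 2, 3])

# Serialize, then parse the JSON so the schema can be edited.
payload = json.loads(df.to_json(orient="table"))

# Cast every schema field name to a string so it matches the data keys.
for field in payload["schema"]["fields"]:
    field["name"] = str(field["name"])

# Wrap the patched JSON in StringIO, since newer pandas versions
# discourage passing a literal JSON string to read_json.
restored = pd.read_json(StringIO(json.dumps(payload)), orient="table")

print(restored.columns.tolist())
print(restored.to_numpy().tolist())
```

Note that the column labels come back as strings rather than integers, so the round trip preserves the values but not the label type. Renaming the columns to strings before calling `to_json` (for example with `df.rename(columns=str)`) should have the same effect.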
@Wolf-Byte updates happen when community folks push PRs
the core team can provide review