BUG: OverflowError on to_json with numbers larger than sys.maxsize
See original GitHub issue

- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import sys
from pandas.io.json import dumps
dumps(sys.maxsize)
dumps(sys.maxsize + 1)
Problem description
The pandas JSON dumper doesn't seem to handle integer values larger than sys.maxsize (a 64-bit machine word). I have a dataframe that I'm trying to write with to_json, but it fails with OverflowError: int too big to convert because the frame contains numbers larger than 9223372036854775807.
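For context, on a 64-bit build sys.maxsize is the largest value a signed 64-bit integer can hold, which is exactly the limit the C serializer runs into (a minimal sketch, assuming a 64-bit Python build):

```python
import sys

# On a 64-bit platform, sys.maxsize equals the signed int64 maximum.
assert sys.maxsize == 2**63 - 1 == 9223372036854775807

# Python ints are arbitrary precision, so exceeding the limit is fine
# in pure Python; only the C-level serializer overflows.
big = sys.maxsize + 1
print(big)  # 9223372036854775808
```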
Passing a default_handler doesn't help; it never gets called for this error.
>>> dumps(sys.maxsize)
'9223372036854775807'
>>> dumps(sys.maxsize + 1, default_handler=str)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OverflowError: int too big to convert
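For contrast, the standard library's json.dumps has an analogous hook, default, but it is only consulted for types the encoder cannot serialize natively; arbitrary-precision ints are encoded directly and never reach it. A stdlib-only sketch:

```python
import json
import sys

# Big ints are encoded natively; the `default` hook is never called.
print(json.dumps(sys.maxsize + 1, default=str))  # 9223372036854775808

# `default` only fires for genuinely unsupported types, e.g. a set.
print(json.dumps({"ids": {1, 2, 3}}, default=sorted))  # {"ids": [1, 2, 3]}
```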
Expected Output
Python’s built-in json module handles large numbers without issues.
>>> import json
>>> json.dumps(sys.maxsize)
'9223372036854775807'
>>> json.dumps(sys.maxsize+1)
'9223372036854775808'
I expect pandas to be able to output large numbers to JSON. An option to use the built-in json module instead of ujson would be fine.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.76-linuxkit
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.3
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 20.1.1
setuptools : 46.4.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : None
pyxlsb : None
s3fs : 0.4.2
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None
Issue Analytics
- State:
- Created 3 years ago
- Comments: 8 (7 by maintainers)
Top GitHub Comments
Related to #20599. This isn't really feasible to do in the ujson source, so we would probably have to catch the overflow and coerce to a serializable type.
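Until pandas handles this, one way to apply that "catch and coerce" idea on the user side is to convert out-of-range ints to strings before serializing. A stdlib-only sketch (the helper name and the int64 bounds are my own, not pandas API):

```python
import json

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def coerce_big_ints(obj):
    """Recursively replace ints outside the int64 range with strings
    so a 64-bit serializer such as ujson can handle the result."""
    if isinstance(obj, bool):  # bool is an int subclass; leave it alone
        return obj
    if isinstance(obj, int) and not (INT64_MIN <= obj <= INT64_MAX):
        return str(obj)
    if isinstance(obj, dict):
        return {k: coerce_big_ints(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [coerce_big_ints(v) for v in obj]
    return obj

data = {"ok": 42, "too_big": 2**63}
print(json.dumps(coerce_big_ints(data)))
# {"ok": 42, "too_big": "9223372036854775808"}
```

For a dataframe, the same coercion could be run over `df.to_dict(orient="records")` output before handing it to the serializer.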
Yea, if you want to add a test case and submit a pull request we can go from there. We'll also want to check the performance benchmarks for JSON, which you'll find more info on here:
https://pandas.pydata.org/pandas-docs/stable/development/contributing.html#running-the-performance-test-suite