BUG: OverflowError on to_json with numbers larger than sys.maxsize

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import sys
from pandas.io.json import dumps

dumps(sys.maxsize)      # works: '9223372036854775807'
dumps(sys.maxsize + 1)  # raises OverflowError: int too big to convert

Problem description

The pandas JSON dumper doesn’t seem to handle integer values larger than sys.maxsize (the largest signed machine-word integer, 9223372036854775807 on this 64-bit platform). I have a DataFrame that I’m trying to write with to_json, but it fails with OverflowError: int too big to convert, because it contains some numbers larger than 9223372036854775807.

Passing a default_handler doesn’t help. It doesn’t get called for the error.

>>> dumps(sys.maxsize)
'9223372036854775807'
>>> dumps(sys.maxsize + 1, default_handler=str)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: int too big to convert

Expected Output

Python’s built-in json module handles large numbers without issues.

>>> import json
>>> json.dumps(sys.maxsize)
'9223372036854775807'
>>> json.dumps(sys.maxsize+1)
'9223372036854775808'

I expect Pandas to be able to output large numbers to JSON. An option to use the built-in json module instead of ujson would be fine.
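Until such an option exists, one possible workaround is to hand plain Python records to the built-in json module. The sketch below is only an illustration, not pandas’ own behavior: `records_to_json` and `_to_builtin` are hypothetical helper names, and the records are assumed to come from something like `DataFrame.to_dict(orient="records")`. Values such as NaN or timestamps would need extra handling that `to_json` normally provides.

```python
import json


def _to_builtin(value):
    """Fallback for objects json cannot encode directly.

    NumPy scalars expose .item(), which returns a plain Python value.
    Anything else is rejected, as json.dumps would do by default.
    """
    if hasattr(value, "item"):
        return value.item()
    raise TypeError(
        f"Object of type {type(value).__name__} is not JSON serializable"
    )


def records_to_json(records):
    """Serialize a list of dicts with the built-in json module.

    Python ints are arbitrary precision, so values above
    9223372036854775807 are written out in full.
    """
    return json.dumps(records, default=_to_builtin)


# Hypothetical records, e.g. the result of df.to_dict(orient="records")
rows = [{"id": 1, "value": 2**63 - 1}, {"id": 2, "value": 2**63}]
print(records_to_json(rows))
```

The trade-off is speed: the built-in encoder is generally slower than the ujson-based one, so this is probably only worth it for frames that actually contain oversized integers.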

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.76-linuxkit
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.3
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 20.1.1
setuptools : 46.4.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : None
pyxlsb : None
s3fs : 0.4.2
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

Issue Analytics

State:
Created 3 years ago
Comments: 8 (7 by maintainers)

Top GitHub Comments

1 reaction

WillAyd commented, May 27, 2020

Related to #20599. This isn’t really feasible to do in the ujson source, so we would probably have to catch the error and coerce to a serializable type.
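To make the "catch and coerce" idea concrete, here is a rough user-level sketch. It is not the fix that went into pandas, and every name in it (`coerce_big_ints`, `dumps_with_fallback`, `strict_encoder`) is hypothetical. It assumes the encoder raises OverflowError for integers outside the signed 64-bit range, as reported above, and it picks strings as the serializable type; floats would also work but lose precision.

```python
import json

INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)


def coerce_big_ints(obj):
    """Recursively replace ints outside the signed 64-bit range with strings."""
    if isinstance(obj, bool):  # bool is a subclass of int; leave it alone
        return obj
    if isinstance(obj, int) and not (INT64_MIN <= obj <= INT64_MAX):
        return str(obj)
    if isinstance(obj, dict):
        return {key: coerce_big_ints(val) for key, val in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [coerce_big_ints(val) for val in obj]
    return obj


def dumps_with_fallback(obj, encoder):
    """Try the fast encoder first; on OverflowError, coerce and retry."""
    try:
        return encoder(obj)
    except OverflowError:
        return encoder(coerce_big_ints(obj))


def strict_encoder(obj):
    """Stand-in for a 64-bit-limited encoder (an assumption, not pandas code)."""
    if coerce_big_ints(obj) != obj:
        raise OverflowError("int too big to convert")
    return json.dumps(obj)


print(dumps_with_fallback({"small": 1, "big": 2**63}, strict_encoder))
```

Because the coercion only runs after a failure, data that fits in 64 bits pays no extra cost, which matters given the performance concerns around the JSON code path.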

0 reactions

WillAyd commented, May 28, 2020

Yeah, if you want to add a test case and submit a pull request, we can go from there. You will also want to check the performance benchmarks for JSON; more information is available here:

https://pandas.pydata.org/pandas-docs/stable/development/contributing.html#running-the-performance-test-suite