BUG: OverflowError on to_json with numbers larger than sys.maxsize

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import sys
from pandas.io.json import dumps

dumps(sys.maxsize)      # works: '9223372036854775807'
dumps(sys.maxsize + 1)  # raises OverflowError: int too big to convert

Problem description

The pandas JSON dumper doesn’t seem to handle integer values larger than sys.maxsize (the largest value that fits in a signed machine word). I have a DataFrame that I’m trying to write with to_json, but it fails with OverflowError: int too big to convert. Some of the numbers in it are larger than 9223372036854775807.

Passing a default_handler doesn’t help. It doesn’t get called for the error.

>>> dumps(sys.maxsize)
'9223372036854775807'
>>> dumps(sys.maxsize + 1, default_handler=str)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: int too big to convert
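
The boundary in the report matches the signed 64-bit integer limit. The following is a minimal sketch of that arithmetic using only the standard library; `INT64_MAX` and `fits_in_int64` are hypothetical names introduced for illustration, and the assumption that the failure tracks the signed 64-bit range (including the negative bound) is inferred from the report rather than confirmed against the ujson source.

```python
import sys

# Assumption: the overflow boundary is the signed 64-bit maximum.
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)


def fits_in_int64(value: int) -> bool:
    """Return True if value is representable as a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


print(INT64_MAX)                     # 9223372036854775807
print(fits_in_int64(INT64_MAX))      # True
print(fits_in_int64(INT64_MAX + 1))  # False
# On a typical 64-bit CPython build, sys.maxsize equals INT64_MAX.
print(sys.maxsize == INT64_MAX)
```

A check like this could be used to locate the offending values in a column before calling to_json.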

Expected Output

Python’s built-in json module handles large numbers without issues.

>>> import json
>>> json.dumps(sys.maxsize)
'9223372036854775807'
>>> json.dumps(sys.maxsize+1)
'9223372036854775808'

I expect Pandas to be able to output large numbers to JSON. An option to use the built-in json module instead of ujson would be fine.
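
Until such an option exists, one possible workaround is to hand plain Python records to the built-in json module directly. This is only a sketch: `records_to_json` is a hypothetical helper, and the records are built by hand here. With pandas they might come from `df.to_dict(orient="records")`, assuming the values arrive as plain Python ints (NumPy scalar types are not serializable by the built-in json module without extra handling).

```python
import json
import sys


def records_to_json(records):
    """Serialize a list of plain-Python dicts with the built-in json module."""
    return json.dumps(records)


# Hand-built stand-in for rows of a DataFrame (assumption: plain Python ints).
records = [
    {"id": 1, "value": sys.maxsize},
    {"id": 2, "value": 2**63},  # one past the signed 64-bit maximum
]

encoded = records_to_json(records)
decoded = json.loads(encoded)

print(encoded)
print(decoded == records)  # the large integer round-trips unchanged
```

The trade-off is losing the pandas-specific options of to_json (such as date formatting and orient handling), so this fits best when the data is already simple.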

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.76-linuxkit
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.3
numpy : 1.18.2
pytz : 2019.3
dateutil : 2.8.1
pip : 20.1.1
setuptools : 46.4.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : None
pyxlsb : None
s3fs : 0.4.2
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

WillAyd commented, May 27, 2020 (1 reaction)

Related to #20599. This isn’t really feasible to do in the ujson source, so we would probably have to catch the error and coerce to a serializable type.
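
As a rough user-side illustration of the "coerce to a serializable type" idea, the sketch below walks a nested structure and turns integers outside the signed 64-bit range into strings before serialization. It is not the approach taken in pandas itself: `coerce_big_ints` is a hypothetical helper, and choosing strings (rather than floats) as the target type is an assumption made here to avoid losing precision.

```python
import json

# Assumption: values outside the signed 64-bit range are the ones that overflow.
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)


def coerce_big_ints(obj):
    """Return a copy of obj with out-of-range ints replaced by their string form."""
    if isinstance(obj, bool):
        # bool is a subclass of int; leave it untouched.
        return obj
    if isinstance(obj, int) and not (INT64_MIN <= obj <= INT64_MAX):
        return str(obj)
    if isinstance(obj, dict):
        return {key: coerce_big_ints(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [coerce_big_ints(value) for value in obj]
    return obj


data = {"small": 1, "big": 2**63, "nested": [True, -(2**63) - 1]}
safe = coerce_big_ints(data)
print(json.dumps(safe))
```

The coerced structure contains only values within the 64-bit range, at the cost of consumers having to parse the affected fields back from strings.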

WillAyd commented, May 28, 2020 (0 reactions)

Yeah, if you want to add a test case and submit a pull request, we can go from there. You will also want to check the performance benchmarks for JSON; more info on those is available here:

https://pandas.pydata.org/pandas-docs/stable/development/contributing.html#running-the-performance-test-suite
