BUG: to_json with objects causing segfault

Code Sample, a copy-pastable example if possible

Creating a bson ObjectId without passing an ID explicitly works:

>>> import bson
>>> import pandas as pd
>>> pd.DataFrame({'A': [bson.objectid.ObjectId()]}).to_json()
Out[4]: '{"A":{"0":{"binary":"W\\u0e32\\u224cug\\u00fcR","generation_time":1474361586000}}}'
>>> pd.DataFrame({'A': [bson.objectid.ObjectId()], 'B': [1]}).to_json()
Out[5]: '{"A":{"0":{"binary":"W\\u0e4e\\u224cug\\u00fcS","generation_time":1474361614000}},"B":{"0":1}}'

However, if you provide an ID explicitly, an exception is raised:

>>> pd.DataFrame({'A': [bson.objectid.ObjectId('574b4454ba8c5eb4f98a8f45')]}).to_json()
Traceback (most recent call last):
  File "/auto/energymdl2/anaconda/envs/commod_20160831/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-c9a20090d481>", line 1, in <module>
    pd.DataFrame({'A': [bson.objectid.ObjectId('574b4454ba8c5eb4f98a8f45')]}).to_json()
  File "/auto/energymdl2/anaconda/envs/commod_20160831/lib/python2.7/site-packages/pandas/core/generic.py", line 1056, in to_json
    default_handler=default_handler)
  File "/auto/energymdl2/anaconda/envs/commod_20160831/lib/python2.7/site-packages/pandas/io/json.py", line 36, in to_json
    date_unit=date_unit, default_handler=default_handler).write()
  File "/auto/energymdl2/anaconda/envs/commod_20160831/lib/python2.7/site-packages/pandas/io/json.py", line 79, in write
    default_handler=self.default_handler)
OverflowError: Unsupported UTF-8 sequence length when encoding string

Worse, if the column is not the only column, the entire process dies with a segmentation fault:

>>> pd.DataFrame({'A': [bson.objectid.ObjectId('574b4454ba8c5eb4f98a8f45')], 'B': [1]}).to_json()
Process finished with exit code 139
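Exit code 139 is how a shell reports a process killed by signal 11 (SIGSEGV): 128 + 11. When reproducing a crash like this, it can help to run the snippet in a child interpreter so the segfault doesn't take down your session. A minimal sketch, assuming a POSIX system; `run_isolated` and `segfaulted` are hypothetical helpers, not part of pandas, and the reproducer needs pymongo's `bson` package installed:

```python
import signal
import subprocess
import sys


def run_isolated(code: str) -> int:
    """Run `code` in a fresh Python interpreter and return its exit status.

    Hypothetical helper for reproducing crashes. On POSIX, subprocess reports
    a child killed by a signal as a negative return code (-signal number),
    whereas a shell prints 128 + signal number (139 for SIGSEGV).
    """
    result = subprocess.run([sys.executable, "-c", code])
    return result.returncode


def segfaulted(returncode: int) -> bool:
    # Accept both the subprocess convention (-11) and the shell convention (139).
    return returncode in (-signal.SIGSEGV, 128 + signal.SIGSEGV)


# The reproducer from the report (requires pandas and pymongo's bson package).
REPRO = (
    "import bson, pandas as pd\n"
    "pd.DataFrame({'A': [bson.objectid.ObjectId('574b4454ba8c5eb4f98a8f45')],"
    " 'B': [1]}).to_json()\n"
)

if __name__ == "__main__":
    rc = run_isolated(REPRO)
    print("segfault" if segfaulted(rc) else f"exit status {rc}")
```

Running the reproducer this way turns a dead session into a return code you can inspect or assert on in a regression test.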

Expected Output

Output of pd.show_versions()

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 26.1.1
Cython: 0.24
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: 0.7.2
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.1
dateutil: 2.5.2
pytz: 2016.6.1
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9.2
apiclient: 1.5.0
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

pymongo version is 3.3.0

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 14 (8 by maintainers)

Top GitHub Comments

1 reaction
jreback commented, Sep 20, 2016

When passing object dtypes that don’t actually contain strings (they could also contain objects that respond well enough to the relevant special methods to work), you must supply a default_handler.

So the first two cases above are expected.

The third is handled this way:

In [6]: pd.DataFrame({'A': [bson.objectid.ObjectId('574b4454ba8c5eb4f98a8f45')]}).to_json(default_handler=str)
Out[6]: '{"A":{"0":"574b4454ba8c5eb4f98a8f45"}}'

Segfaulting shouldn’t happen, though; we should get an exception saying that no default_handler was supplied.
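Since `str()` of an ObjectId is its hex string, `default_handler=str` is enough here. If you would rather fail loudly on unexpected types than stringify everything, the handler can dispatch on type. A minimal sketch; `FakeObjectId` is a hypothetical stand-in for `bson.objectid.ObjectId` so the example runs without pymongo, and `to_jsonable` is an illustrative name:

```python
import pandas as pd


class FakeObjectId:
    """Hypothetical stand-in for bson.objectid.ObjectId; str() returns the hex id."""

    def __init__(self, oid: str) -> None:
        self._oid = oid

    def __str__(self) -> str:
        return self._oid


def to_jsonable(obj):
    """default_handler for to_json: called for values the encoder can't handle."""
    if isinstance(obj, FakeObjectId):
        # Serialize ids as their hex string, like default_handler=str would.
        return str(obj)
    # Fail loudly on anything unexpected instead of guessing a representation.
    raise TypeError(f"cannot serialize {type(obj).__name__} to JSON")


df = pd.DataFrame({"A": [FakeObjectId("574b4454ba8c5eb4f98a8f45")], "B": [1]})
print(df.to_json(default_handler=to_jsonable))
# {"A":{"0":"574b4454ba8c5eb4f98a8f45"},"B":{"0":1}}
```

An exception raised inside the handler propagates out of `to_json`, so an unsupported type surfaces as a `TypeError` at the call site.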

0 reactions
mike-seekwell commented, Mar 21, 2018

@detroitcoder Do you have an example of a default_handler you use? I’m not clear on how to implement one. I’d be fine with the handler just returning an empty or static string.

Edit: Sorry, I see now you can just use df.to_json(orient='records', default_handler=str)
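Any callable works as `default_handler`, including one that ignores its argument and returns a fixed placeholder, as asked above. A minimal sketch with `orient='records'`; the `placeholder` function and its return value are illustrative, and `object()` stands in for an unserializable value such as an ObjectId:

```python
import pandas as pd


def placeholder(obj) -> str:
    # Ignore the unserializable value entirely and emit a fixed marker.
    return "unserializable"


df = pd.DataFrame({"A": [object()], "B": [1]})
print(df.to_json(orient="records", default_handler=placeholder))
# [{"A":"unserializable","B":1}]
```

Returning `""` instead gives the empty-string variant; `default_handler=str` keeps the actual value, which is usually more useful for ObjectIds.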
