BUG: to_json with objects causing segfault
Code Sample, a copy-pastable example if possible
Creating a bson ObjectId without explicitly passing an ID works fine:
>>> import bson
>>> import pandas as pd
>>> pd.DataFrame({'A': [bson.objectid.ObjectId()]}).to_json()
Out[4]: '{"A":{"0":{"binary":"W\\u0e32\\u224cug\\u00fcR","generation_time":1474361586000}}}'
>>> pd.DataFrame({'A': [bson.objectid.ObjectId()], 'B': [1]}).to_json()
Out[5]: '{"A":{"0":{"binary":"W\\u0e4e\\u224cug\\u00fcS","generation_time":1474361614000}},"B":{"0":1}}'
However, if you provide an ID explicitly, an exception is raised:
>>> pd.DataFrame({'A': [bson.objectid.ObjectId('574b4454ba8c5eb4f98a8f45')]}).to_json()
Traceback (most recent call last):
  File "/auto/energymdl2/anaconda/envs/commod_20160831/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-c9a20090d481>", line 1, in <module>
    pd.DataFrame({'A': [bson.objectid.ObjectId('574b4454ba8c5eb4f98a8f45')]}).to_json()
  File "/auto/energymdl2/anaconda/envs/commod_20160831/lib/python2.7/site-packages/pandas/core/generic.py", line 1056, in to_json
    default_handler=default_handler)
  File "/auto/energymdl2/anaconda/envs/commod_20160831/lib/python2.7/site-packages/pandas/io/json.py", line 36, in to_json
    date_unit=date_unit, default_handler=default_handler).write()
  File "/auto/energymdl2/anaconda/envs/commod_20160831/lib/python2.7/site-packages/pandas/io/json.py", line 79, in write
    default_handler=self.default_handler)
OverflowError: Unsupported UTF-8 sequence length when encoding string
Worse, if that column is not the only one in the frame, the entire process dies:
>>> pd.DataFrame({'A': [bson.objectid.ObjectId('574b4454ba8c5eb4f98a8f45')], 'B': [1]}).to_json()
Process finished with exit code 139
Expected Output
Output of pd.show_versions()
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 26.1.1
Cython: 0.24
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: 0.7.2
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.1
dateutil: 2.5.2
pytz: 2016.6.1
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9.2
apiclient: 1.5.0
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None
pymongo version is 3.3.0
When passing object dtypes which don't actually contain strings (though they could also contain objects whose special methods respond well enough to work), you must supply a default_handler, so the first two cases above are expected; the third is handled this way. Segfaulting shouldn't happen, though; we should get an exception saying that a default_handler was not supplied.

@detroitcoder Do you have an example of a default_handler you use? I'm not clear how to implement one. I'd be fine with the handler just returning an empty or static string.

Edit - Sorry, I see now you can just use df.to_json(orient='records', default_handler=str).
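For reference, a minimal sketch of that workaround, assuming pymongo's bson module is installed; the oid_handler name and the repr fallback are illustrative choices, not anything defined by pandas or the comments above. default_handler is called for any value the JSON serializer cannot convert natively, so returning a plain string keeps the ObjectId's raw binary out of the encoder.

import bson
import pandas as pd

def oid_handler(obj):
    # Called by to_json for any value it cannot serialize natively.
    # ObjectIds become their 24-character hex string; anything else
    # is stringified rather than letting serialization fail.
    if isinstance(obj, bson.objectid.ObjectId):
        return str(obj)
    return repr(obj)

df = pd.DataFrame({'A': [bson.objectid.ObjectId('574b4454ba8c5eb4f98a8f45')],
                   'B': [1]})

print(df.to_json(orient='records', default_handler=oid_handler))
print(df.to_json(orient='records', default_handler=str))
# both should print: [{"A":"574b4454ba8c5eb4f98a8f45","B":1}]

Either form should sidestep the OverflowError and the segfault shown above, since the ObjectId is converted to plain text before it reaches the C encoder.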