to_dict() on a boolean series sometimes returns numpy types instead of Python types
Problem description
I construct a Series in several ways that should give the same output from to_dict(), but instead I get different output types. In my case, this breaks downstream JSON serializers.
The code sample below includes cases with correct output (bool) and incorrect output (numpy.bool_); see inline comments.
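To make the serializer breakage concrete, here is a minimal sketch. It assumes the downstream serializer is the standard-library json module, which the report does not actually name; other serializers may behave differently.

```python
import json

import numpy as np

# A dict holding a plain Python bool serializes without trouble.
native = {"a": True}
native_json = json.dumps(native)

# The same dict holding a numpy.bool_ is rejected by the default encoder,
# because numpy.bool_ is not a subclass of Python's bool.
numpy_typed = {"a": np.bool_(True)}
try:
    json.dumps(numpy_typed)
    numpy_failed = False
except TypeError:
    numpy_failed = True

print(native_json, numpy_failed)
```

This is why the element type returned by to_dict() matters even though both dicts print as {'a': True}.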
Related issues, though none seem exactly the same: #13258, #13830, #16048, #17491, #19381, #20791, #23753, #23921, #24908, #25969
Code sample
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({ 'a': [True, False], 'b': [0, 1]} )
In [3]: df
Out[3]:
a b
0 True 0
1 False 1
In [27]: type(df['a'].iloc[0])
Out[27]: numpy.bool_
In [48]: type(df[['a']].iloc[0, 0])
Out[48]: numpy.bool_
In [33]: type(df.iloc[0,0])
Out[33]: numpy.bool_
In [24]: type(df.iloc[0]['a'])
Out[24]: numpy.bool_
# ----
In [4]: df[['a']].iloc[0].to_dict()
Out[4]: {'a': True}
# correct
In [5]: type(df[['a']].iloc[0].to_dict()['a'])
Out[5]: bool
In [6]: df.iloc[0][['a']].to_dict()
Out[6]: {'a': True}
# this one is incorrect, should return bool
In [7]: type(df.iloc[0][['a']].to_dict()['a'])
Out[7]: numpy.bool_
# ----
In [8]: df[['a', 'b']].to_dict(orient='records')[0]
Out[8]: {'a': True, 'b': 0}
# correct
In [9]: type(df[['a', 'b']].to_dict(orient='records')[0]['a'])
Out[9]: bool
In [10]: df[['a', 'b']].iloc[0].to_dict()
Out[10]: {'a': True, 'b': 0}
# this one is incorrect, should return bool
In [11]: type(df[['a', 'b']].iloc[0].to_dict()['a'])
Out[11]: numpy.bool_
This may explain what’s going on:
In [54]: df.iloc[0][['a']]
Out[54]:
a True
Name: 0, dtype: object
In [56]: df[['a']].iloc[0]
Out[56]:
a True
Name: 0, dtype: bool
That relates to #25969, where @mroeschke commented about a similar dtype discrepancy:
This probably occurs because s2 is object dtype and it's trying to preserve the dtype of each input argument, while the arguments in s1 can both be coerced to int64.
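The dtype difference shown above can be checked directly. The sketch below reuses the frame from the code sample and only inspects dtypes. It makes no claim about what to_dict() returns, since that may vary between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"a": [True, False], "b": [0, 1]})

# Selecting the row first mixes a bool column and an int64 column,
# so the resulting Series falls back to object dtype.
row_first = df.iloc[0]
row_dtype = row_first.dtype

# Sub-selecting from that row keeps the object dtype.
sub_dtype = row_first[["a"]].dtype

# Selecting the bool column first keeps the row homogeneous.
col_first = df[["a"]].iloc[0]
col_dtype = col_first.dtype

print(row_dtype, sub_dtype, col_dtype)
```

The two to_dict() calls in the report therefore start from Series with different dtypes, which is consistent with the explanation quoted above.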
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Darwin
OS-release : 18.6.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.0
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.0.3
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.6.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : 2.6.9
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : None
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
Issue Analytics
- State:
- Created 4 years ago
- Reactions:2
- Comments:8 (5 by maintainers)
Not sure if this is the expected behavior, but it might be an underlying bug/undocumented feature in numpy:
Sorry for my delay here. I followed up on https://github.com/numpy/numpy/issues/14139 — the numpy folks suggest reliably converting to vanilla python types as follows:
Do you believe pandas should account for this edge case and call item() recursively? I think the use case of passing to_dict() outputs to serializers is fairly common.
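Until that is settled, a user-side workaround along these lines might help. This is only a sketch: to_native is a hypothetical helper name, not a pandas or numpy API, and it is not necessarily the exact conversion suggested in the numpy thread. It relies on numpy scalars exposing item(), and it converts values only, not dict keys.

```python
import json

import numpy as np


def to_native(obj):
    """Recursively replace numpy scalars with plain Python values (hypothetical helper)."""
    if isinstance(obj, np.generic):
        # numpy scalar types (bool_, int64, float64, ...) expose item()
        return obj.item()
    if isinstance(obj, dict):
        return {key: to_native(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_native(value) for value in obj]
    return obj


# Stand-in for a to_dict() result that still carries numpy scalars.
record = {"a": np.bool_(True), "b": np.int64(0)}
clean = to_native(record)
payload = json.dumps(clean)
print(payload)
```

Applying the helper to the result of to_dict() before serializing keeps the workaround independent of which code path produced the dict.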