pd.to_numeric - float64, object or error?
Code Sample, a copy-pastable example if possible
```python
import pandas as pd
print(pd.__version__)

# different results for the same data
d = pd.DataFrame({'a': [200, 300, '', 'NaN', 10000000000000000000]})

# returns dtype object
a = pd.to_numeric(d['a'], errors='coerce')
print(a.dtype)

# returns dtype float64
b = d['a'].apply(pd.to_numeric, errors='coerce')
print(b.dtype)

# why not float64?
d = pd.DataFrame({'a': [200, 300, '', 'NaN', 30000000000000000000]})

# returns dtype object
a = pd.to_numeric(d['a'], errors='coerce')
print(a.dtype)

# raises OverflowError
b = d['a'].apply(pd.to_numeric, errors='coerce')
print(b.dtype)
```
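For context (my own addition, not part of the original report): the two examples differ only in whether the large constant still fits in an unsigned 64-bit integer, which is presumably why pandas treats them differently.

```python
# Sketch of why the two constants behave differently: numpy's widest
# integer dtype is uint64, so a value beyond 2**64 - 1 cannot be held
# as an integer at all.
INT64_MAX = 2**63 - 1      # 9223372036854775807
UINT64_MAX = 2**64 - 1     # 18446744073709551615

first = 10000000000000000000    # constant from the first example
second = 30000000000000000000   # constant from the second example

print(first > INT64_MAX, first <= UINT64_MAX)   # too big for int64, fits uint64
print(second > UINT64_MAX)                      # too big even for uint64
```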
Problem description
Hi guys, I realized that the result of to_numeric changes depending on how you pass a Series to that function. Please see the example above. When I call to_numeric with the series passed as a parameter, it returns “object”, but when I apply to_numeric to that series, it returns “float64”. Moreover, I’m a bit confused about the correct behavior of to_numeric: why doesn’t it convert a looooong int-like number to float64? It throws an exception from which I can’t even deduce which number (position, index) caused it.
I’m pretty sure my issue is being discussed somewhere already; I tried to search for the proper issue but only found bits and pieces about to_numeric and conversions in general. Please feel free to move my issue to a more appropriate thread.
Output of pd.show_versions()
```
pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.13.1
scipy: None
xarray: None
IPython: 5.4.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
```
Issue Analytics
- State:
- Created 6 years ago
- Comments: 13 (12 by maintainers)
Top GitHub Comments
I think that `errors='coerce'` should force everything to float64 if anything is missing. It’s not fundamentally different from our normal treatment of ints in the presence of missing data, and it’s an opt-in, deliberately lossy kwarg. That said, I do see your point that in the range of int64max to uint64max it becomes “lossier”.
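The “normal treatment of ints in the presence of missing data” referred to here is pandas’ long-standing upcast to float64, and the “lossier” caveat is that float64 cannot represent every integer near the int64/uint64 boundary exactly. A quick illustration (my own example, not from the thread):

```python
import pandas as pd

# Missing data pushes plain int values up to float64 with NaN — the
# classic behavior the comment refers to (opt-in nullable Int64 dtypes
# only arrived in later pandas versions).
s = pd.Series([1, 2, None])
print(s.dtype)   # float64

# float64 has a 53-bit mantissa, so integers above 2**53 lose precision;
# near int64max/uint64max, coercing to float is therefore lossy.
n = 2**63 + 1
print(float(n) == n)   # False: 2**63 + 1 rounds to 2**63 as a float
```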
I agree with @chris-b1 here.
It doesn’t matter what is there when we have `errors='coerce'`. If it’s not lowest-common-dtype convertible, then it gets a NaN. Now during the conversion you may be inferring and you see a uint64, then a NaN, so we go back to `object` — but I agree that is buggy; it should return `float64` if it has to coerce.
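Until that is fixed in pandas itself, the behavior described above can be approximated on the user side. A minimal sketch (the wrapper name is my own, not a pandas API), forcing coerced output all the way to float64 even when dtype inference falls back to object:

```python
import pandas as pd

def to_numeric_float64(series):
    """Coerce like to_numeric(errors='coerce'), but guarantee a float64
    result even if uint64-range values made pandas fall back to object."""
    out = pd.to_numeric(series, errors='coerce')
    if out.dtype == object:
        out = out.astype('float64')
    return out

d = pd.DataFrame({'a': [200, 300, '', 'NaN', 10000000000000000000]})
res = to_numeric_float64(d['a'])
print(res.dtype)   # float64
```

The trade-off is exactly the lossiness noted above: values between int64max and uint64max survive, but only at float64 precision.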