pd.to_numeric - float64, object or error?
Code Sample, a copy-pastable example if possible
```python
import pandas as pd
print(pd.__version__)

# different results for the same data
d = pd.DataFrame({'a': [200, 300, '', 'NaN', 10000000000000000000]})

# returns dtype object
a = pd.to_numeric(d['a'], errors='coerce')
print(a.dtype)

# returns dtype float64
b = d['a'].apply(pd.to_numeric, errors='coerce')
print(b.dtype)

# why not float64?
d = pd.DataFrame({'a': [200, 300, '', 'NaN', 30000000000000000000]})

# returns dtype object
a = pd.to_numeric(d['a'], errors='coerce')
print(a.dtype)

# raises OverflowError
b = d['a'].apply(pd.to_numeric, errors='coerce')
print(b.dtype)
```
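For context (my own addition, not part of the original report): the two examples differ only in whether the large constant still fits in an unsigned 64-bit integer, which is presumably why pandas treats them differently.

```python
# Sketch of why the two constants behave differently: numpy's widest
# integer dtype is uint64, so a value beyond 2**64 - 1 cannot be held
# as an integer at all.
INT64_MAX = 2**63 - 1      # 9223372036854775807
UINT64_MAX = 2**64 - 1     # 18446744073709551615

first = 10000000000000000000    # constant from the first example
second = 30000000000000000000   # constant from the second example

print(first > INT64_MAX, first <= UINT64_MAX)   # too big for int64, fits uint64
print(second > UINT64_MAX)                      # too big even for uint64
```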
Problem description
Hi guys, I realized that the result of to_numeric changes depending on how you pass a Series to that function. Please see the example above. When I call to_numeric with the series passed as a parameter, it returns “object”, but when I apply to_numeric to that series, it returns “float64”. Moreover, I’m a bit confused about the correct behavior of to_numeric: why doesn’t it convert a looooong int-like number to float64? It throws an exception from which I can’t even deduce which number (position, index) caused it.
I’m pretty sure my issue is being discussed somewhere already; I tried to search for the proper issue but only found bits and pieces about to_numeric and conversions in general. Please feel free to move my issue to a more appropriate thread.
Output of pd.show_versions()
```
pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.13.1
scipy: None
xarray: None
IPython: 5.4.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
```
Issue Analytics
- State:
- Created 6 years ago
- Comments: 13 (12 by maintainers)
Top GitHub Comments
I think that `errors='coerce'` should force everything to float64 if anything is missing. It’s not fundamentally different from our normal treatment of ints in the presence of missing data, and it’s an opt-in, deliberately lossy kwarg. That said, I do see your point that in the range of int64max to uint64max it becomes “lossier”.
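The “normal treatment of ints in the presence of missing data” referred to here is pandas’ long-standing upcast to float64, and the “lossier” caveat is that float64 cannot represent every integer near the int64/uint64 boundary exactly. A quick illustration (my own example, not from the thread):

```python
import pandas as pd

# Missing data pushes plain int values up to float64 with NaN — the
# classic behavior the comment refers to (opt-in nullable Int64 dtypes
# only arrived in later pandas versions).
s = pd.Series([1, 2, None])
print(s.dtype)   # float64

# float64 has a 53-bit mantissa, so integers above 2**53 lose precision;
# near int64max/uint64max, coercing to float is therefore lossy.
n = 2**63 + 1
print(float(n) == n)   # False: 2**63 + 1 rounds to 2**63 as a float
```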
I agree with @chris-b1 here.
It doesn’t matter what is there when we have `errors='coerce'`. If it’s not lowest-common-dtype convertible, then it gets a NaN. Now during the conversion you may be inferring and you see a uint64, then a NaN, so we go back to `object` — but I agree that is buggy; it should return `float64` if it has to coerce.
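Until that is fixed in pandas itself, the behavior described above can be approximated on the user side. A minimal sketch (the wrapper name is my own, not a pandas API), forcing coerced output all the way to float64 even when dtype inference falls back to object:

```python
import pandas as pd

def to_numeric_float64(series):
    """Coerce like to_numeric(errors='coerce'), but guarantee a float64
    result even if uint64-range values made pandas fall back to object."""
    out = pd.to_numeric(series, errors='coerce')
    if out.dtype == object:
        out = out.astype('float64')
    return out

d = pd.DataFrame({'a': [200, 300, '', 'NaN', 10000000000000000000]})
res = to_numeric_float64(d['a'])
print(res.dtype)   # float64
```

The trade-off is exactly the lossiness noted above: values between int64max and uint64max survive, but only at float64 precision.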