In case of dtype issues, read_csv doesn't give an error as useful as pd.to_numeric does

A follow-up to #13237. Copied examples: here’s what to_numeric shows:

In [137]: pd.to_numeric(o)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
pandas/src/inference.pyx in pandas.lib.maybe_convert_numeric (pandas/lib.c:55708)()

ValueError: Unable to parse string "52721156299871854681072370812488856336599863860274272781560003496046130816295143841376557767523688876482371118681808060318819187029855011862637267061548684520491431327693943042785705302632892888308342787190965977140539558800921069356177102235514666302335984730634641934384020650987601849970936578094137344.00000"

And here’s what read_csv shows (the data is at ftp://ftp.sanger.ac.uk/pub/consortia/ibdgenetics/iibdgc-trans-ancestry-summary-stats.tar):

In [138]: d = pd.read_csv('EUR.UC.gwas.assoc', delim_whitespace=True, usecols=['OR'], dtype={'OR': np.float64})
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
pandas/parser.pyx in pandas.parser.TextReader._convert_tokens (pandas/parser.c:14411)()

TypeError: Cannot cast array from dtype('O') to dtype('float64') according to the rule 'safe'

During handling of the above exception, another exception occurred:

[... long stacktrace ...]

pandas/parser.pyx in pandas.parser.TextReader._convert_tokens (pandas/parser.c:14632)()

ValueError: cannot safely convert passed user dtype of float64 for object dtyped data in column 8

@jreback I’ve finally started looking into it, and it seems that I can’t implement it in a good way without changing NumPy: in the end, it is NumPy that gives no row/value information, although pandas conditionally replaces the exception with its own.

I can write an ad-hoc implementation for numeric conversion using pd.to_numeric though, and use its row/value information in case it raises an exception. What do you think?
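A minimal sketch of what such an ad-hoc approach might look like from user code, assuming the column is first read as text and then converted with pd.to_numeric. The helper name read_float_column and the wording of its error message are hypothetical and are not part of pandas:

```python
from io import StringIO

import pandas as pd


def read_float_column(buffer, column):
    """Read `column` as text, convert it, and report values that fail to parse."""
    # Read the column as plain Python strings so the parser never attempts
    # the float cast itself.
    frame = pd.read_csv(buffer, usecols=[column], dtype={column: object})
    raw = frame[column]
    # errors="coerce" turns unparseable strings into NaN instead of raising.
    converted = pd.to_numeric(raw, errors="coerce")
    # Entries that were present in the file but could not be converted.
    bad = raw[converted.isna() & raw.notna()]
    if not bad.empty:
        # Row labels are 0-based positions of the data rows (header excluded).
        details = ", ".join(f"row {i}: {v!r}" for i, v in bad.items())
        raise ValueError(
            f"cannot convert column {column!r} to float64 ({details})"
        )
    return converted.astype("float64")


# Usage: the second data row cannot be parsed as a number.
try:
    read_float_column(StringIO("OR\n1.5\nabc\n2.0\n"), "OR")
except ValueError as exc:
    print(exc)
```

This trades some speed and memory for a more specific message, since the column is held as strings before conversion.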

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments:7 (7 by maintainers)

Top GitHub Comments

4 reactions
simonjayhawkins commented, Sep 6, 2018

A simpler code example:

>>> import pandas as pd
>>>
>>> pd.to_numeric('1.7976931348623157e+308')
Traceback (most recent call last):
  File "pandas/_libs/src\inference.pyx", line 1152, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "1.7976931348623157e+308"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\Anaconda3\lib\site-packages\pandas\core\tools\numeric.py", line 133, in to_numeric
    coerce_numeric=coerce_numeric)
  File "pandas/_libs/src\inference.pyx", line 1185, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "1.7976931348623157e+308" at position 0
>>> from io import StringIO
>>>
>>> import numpy as np
>>> import pandas as pd
>>>
>>> csv_str = StringIO(('1.7976931348623157e+308, 1.7976931348623157e+308 '))
>>> df = pd.read_csv(csv_str, engine='c', names=["a", "b"], dtype={
...                  "a": np.str, "b": np.float64})
Traceback (most recent call last):
  File "pandas\_libs\parsers.pyx", line 1156, in pandas._libs.parsers.TextReader._convert_tokens
TypeError: Cannot cast array from dtype('O') to dtype('float64') according to the rule 'safe'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Users\simon\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\simon\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 446, in _read
    data = parser.read(nrows)
  File "C:\Users\simon\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1036, in read
    ret = self._engine.read(nrows)
  File "C:\Users\simon\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1848, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 968, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 1094, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas\_libs\parsers.pyx", line 1164, in pandas._libs.parsers.TextReader._convert_tokens
ValueError: cannot safely convert passed user dtype of float64 for object dtyped data in column 1

0 reactions
mroeschke commented, May 8, 2021

This appears to work on master now. It could use a test.

In [5]: >>> import pandas as pd
   ...: >>>
   ...: >>> pd.to_numeric('1.7976931348623157e+308')
Out[5]: 1.7976931348623157e+308

In [6]: >>> from io import StringIO
   ...: >>>
   ...: >>> import numpy as np
   ...: >>> import pandas as pd
   ...: >>>
   ...: >>> csv_str = StringIO(('1.7976931348623157e+308, 1.7976931348623157e+308 '))
   ...: >>> df = pd.read_csv(csv_str, engine='c', names=["a", "b"], dtype={
   ...: ...                  "a": np.str, "b": np.float64})
<ipython-input-6-0fb2b8c2cf42>:8: DeprecationWarning: `np.str` is a deprecated alias for the builtin `str`. To silence this warning, use `str` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.str_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  "a": np.str, "b": np.float64})

In [7]: df
Out[7]:
                         a              b
0  1.7976931348623157e+308  1.797693e+308
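
A rough sketch of the kind of regression test the comment asks for, assuming a recent pandas in which both calls succeed as in the session above. The function name is hypothetical, the input omits the stray spaces of the original sample, and the builtin str replaces the deprecated np.str alias:

```python
from io import StringIO

import numpy as np
import pandas as pd


def check_large_float_round_trip():
    """Parse a value near the float64 maximum via to_numeric and read_csv."""
    value = "1.7976931348623157e+308"

    # to_numeric should return a finite float rather than raising.
    parsed = pd.to_numeric(value)
    assert np.isfinite(parsed)

    # read_csv with an explicit float64 dtype should no longer raise.
    data = StringIO(f"{value},{value}")
    df = pd.read_csv(
        data,
        engine="c",
        names=["a", "b"],
        dtype={"a": str, "b": np.float64},
    )
    assert df["a"].iloc[0] == value
    assert df["b"].dtype == np.float64
    assert np.isfinite(df["b"].iloc[0])
    return df


df = check_large_float_round_trip()
```

The checks deliberately stop at "finite and float64" and do not assert bit-exact equality with the float64 maximum, since the parsed value may depend on the parser's float precision mode.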