In case of dtype issues, read_csv doesn't give an error as useful as pd.to_numeric does
A follow-up to #13237. Copied examples: Here’s what to_numeric shows:
In [137]: pd.to_numeric(o)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
pandas/src/inference.pyx in pandas.lib.maybe_convert_numeric (pandas/lib.c:55708)()
ValueError: Unable to parse string "52721156299871854681072370812488856336599863860274272781560003496046130816295143841376557767523688876482371118681808060318819187029855011862637267061548684520491431327693943042785705302632892888308342787190965977140539558800921069356177102235514666302335984730634641934384020650987601849970936578094137344.00000"
And here’s what read_csv shows (the data is at ftp://ftp.sanger.ac.uk/pub/consortia/ibdgenetics/iibdgc-trans-ancestry-summary-stats.tar):
In [138]: d = pd.read_csv('EUR.UC.gwas.assoc', delim_whitespace=True, usecols=['OR'], dtype={'OR': np.float64})
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
pandas/parser.pyx in pandas.parser.TextReader._convert_tokens (pandas/parser.c:14411)()
TypeError: Cannot cast array from dtype('O') to dtype('float64') according to the rule 'safe'
During handling of the above exception, another exception occurred:
[... long stacktrace ...]
pandas/parser.pyx in pandas.parser.TextReader._convert_tokens (pandas/parser.c:14632)()
ValueError: cannot safely convert passed user dtype of float64 for object dtyped data in column 8
@jreback I’ve finally started looking into it, and it seems that I can’t implement it in a good way without changing NumPy because, in the end, it’s NumPy that doesn’t provide any row/value information, although pandas conditionally replaces the exception with its own.
I can write an ad-hoc implementation for numeric conversion using pd.to_numeric though, and use its row/value information in case it raises an exception. What do you think?
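For illustration, here is a minimal user-level sketch of that idea. It is not the proposed change to the parser. It reads the column as text and then converts it with pd.to_numeric, so a failure can name the offending rows and values. The helper name read_numeric_column is hypothetical, and the row numbers it reports are positions in the parsed frame, not line numbers in the file.

```python
import io

import pandas as pd


def read_numeric_column(source, column, **read_csv_kwargs):
    """Hypothetical helper: read one column as text, then convert it.

    Raises ValueError listing the first few values that are not numeric.
    """
    # Read as strings so read_csv itself never attempts the float conversion.
    frame = pd.read_csv(
        source, usecols=[column], dtype={column: str}, **read_csv_kwargs
    )
    raw = frame[column]
    # errors="coerce" turns unparseable entries into NaN instead of raising.
    converted = pd.to_numeric(raw, errors="coerce")
    # Present in the file, but not parseable as a number.
    bad = raw[converted.isna() & raw.notna()]
    if not bad.empty:
        details = ", ".join(
            f"row {idx}: {val!r}" for idx, val in bad.head(5).items()
        )
        raise ValueError(
            f"column {column!r} has {len(bad)} non-numeric value(s): {details}"
        )
    return converted.astype("float64")
```

A call such as read_numeric_column('EUR.UC.gwas.assoc', 'OR', sep=r'\s+') would then report the bad entries by row instead of only the column number; sep=r'\s+' is used here as the equivalent of delim_whitespace=True.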
Issue Analytics
- State:
- Created 6 years ago
- Comments:7 (7 by maintainers)
Top GitHub Comments
simpler sample code example:
This looks to work on master now. Could use a test