read_csv character encoding bug?
This is a weird one from StackOverflow: this file has some \x00s, which seem to be ignored when printing but confuse read_csv:
x = 'x,y\n \x00\x00\x00,Reg\n \x00\x00\x00,Reg\nI,Swp\nI,Swp\n'
X = StringIO(x)
In [3]: pd.read_csv(X)
Out[3]:
     x    y
0
1  NaN  NaN
2    I  Swp
3    I  Swp
In [4]: print x
x,y
,Reg
,Reg
I,Swp
I,Swp
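The hidden characters can be made visible with repr, and one possible workaround is to strip them before parsing. The following is a minimal sketch, assuming Python 3 and a current pandas; the names raw and cleaned are illustrative, and stripping NULs is a suggestion for user code, not the fix discussed in this issue.

```python
from io import StringIO

import pandas as pd

# Same data as in the report: two rows whose first field is a space
# followed by three NUL characters.
raw = 'x,y\n \x00\x00\x00,Reg\n \x00\x00\x00,Reg\nI,Swp\nI,Swp\n'

# print() hides the NULs on most terminals; repr() shows them as \x00.
print(repr(raw))

# Assumed workaround: drop the NUL characters before handing the text
# to the parser, so read_csv never sees them.
cleaned = raw.replace('\x00', '')

df = pd.read_csv(StringIO(cleaned))
print(df)
```

With the NULs removed, every input line parses as an ordinary two-field row. Whether the NULs are safe to discard depends on where the file came from; they are often a sign that the file was written in a different encoding (for example UTF-16) and should be decoded accordingly instead.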
Issue Analytics
- State:
- Created 11 years ago
- Comments:10 (9 by maintainers)
If you want to put up tests for the C engine and a nice error message for the Python engine, then this can be closed.
Seems like if you can address the regex delimiter problem (easier said than done), then it may be possible to deprecate the Python engine. This would be easier in a possible pandas 2.0 future, in which we might add libre2 to the build / development toolchain.
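For context on the regex delimiter problem, here is a minimal sketch of the kind of call that relies on the Python engine, assuming Python 3 and a current pandas. The sample data and the separator pattern are made up for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical input whose fields are separated by either ';' or '|'.
data = 'a;b|c\n1;2|3\n4;5|6\n'

# A multi-character separator (other than r'\s+') is treated as a regular
# expression, which the C engine does not support, so the Python engine
# is requested explicitly here.
df = pd.read_csv(StringIO(data), sep=r'[;|]', engine='python')
print(df)
```

Any replacement for the Python engine would have to cover calls like this one, which is why the regex separator is the sticking point for deprecating it.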