to_csv and bytes on Python 3.
Is this desired behavior and something I need to work around, or is it a bug? Notice that the bytes type marker is written to disk, so you can't round-trip the data. This works fine in Python 2 with unicode, AFAICT.
In [1]: pd.__version__
Out[1]: '0.15.2-252-g0d35dd4'
In [2]: pd.DataFrame.from_dict({'a': ['a', 'b', 'c']}).a.str.encode("utf-8").to_csv("tmp.csv")
In [3]: !cat tmp.csv
0,b'a'
1,b'b'
2,b'c'
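One possible workaround, sketched under the assumption that the column really holds UTF-8 encoded bytes: decode it back to text before calling to_csv, so that no bytes objects reach the writer. The names raw, decoded and csv_text are illustrative, and an in-memory buffer stands in for tmp.csv.

```python
import io

import pandas as pd

# Same data as in the report: a column of UTF-8 encoded bytes objects.
raw = pd.DataFrame.from_dict({"a": ["a", "b", "c"]}).a.str.encode("utf-8")

# Decode back to text before writing (assumes the bytes are valid UTF-8).
decoded = raw.str.decode("utf-8")

# Write to an in-memory buffer instead of tmp.csv; header=False keeps the
# two-column "index,value" layout shown in the report.
buf = io.StringIO()
decoded.to_csv(buf, header=False)
csv_text = buf.getvalue()
print(csv_text)
```

This only sidesteps the problem for text-like payloads; arbitrary binary data would need some other explicit encoding chosen by the user.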
Issue Analytics
- State:
- Created 8 years ago
- Reactions:5
- Comments:12 (8 by maintainers)
@zhuoqiang What I think you meant is you have to do this:
Simply doing astype(str) doesn't help: the to_csv() output still contains b'...' wrappers.
I totally agree with @jzwinck. How can you in any way justify leaking Python's encoding system syntax into a generic data exchange format?
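For context, the b'...' wrapper is exactly what Python 3's str() returns for a bytes object, and the standard-library csv writer stringifies non-string cells with str(). Whether to_csv() goes through the same path internally is an assumption here; the sketch below only shows the plain-Python behavior and how decoding avoids it.

```python
import csv
import io

value = b"a"

# str() on a bytes object yields its repr, including the b'' wrapper.
wrapped = str(value)
# decode() yields the actual text content.
text = value.decode("utf-8")

# The stdlib csv writer applies str() to cells that are not already strings.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([0, value])  # bytes cell: the wrapper leaks into the file
writer.writerow([1, text])   # decoded cell: plain text
rows = buf.getvalue().splitlines()
print(rows)
```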
When you use pd.read_csv() with an array-protocol type string as the dtype, round-tripping gets messed up:
Using dtype=str or dtype='S' does work as expected, however?
I actually even find ^ unexpected, since it seems to be interpreting the values as Python string literals automatically?
If a user chooses to load CSV data as bytes, it should be specified explicitly, just like it works when you write out unicode, and not inferred from Python's encoding-specific markup:
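A minimal sketch of what an explicit opt-in could look like with the existing API (this is not the commenter's original snippet): read the column as text with dtype=str, then encode it to bytes on request. For files that already contain b'...' markup, ast.literal_eval can parse the literal back as a deliberate step. The column names idx and val and the inline CSV text are made up for illustration.

```python
import ast
import io

import pandas as pd

# Case 1: a clean CSV holding plain text; bytes are requested explicitly.
clean = io.StringIO("0,a\n1,b\n2,c\n")
df = pd.read_csv(clean, header=None, names=["idx", "val"], dtype={"val": str})
as_bytes = df["val"].str.encode("utf-8").tolist()

# Case 2: a CSV that already contains the leaked b'...' markup.
# The column is read as text, then each literal is parsed on purpose.
leaky = io.StringIO("0,b'a'\n1,b'b'\n2,b'c'\n")
df2 = pd.read_csv(leaky, header=None, names=["idx", "val"], dtype={"val": str})
recovered = [ast.literal_eval(cell) for cell in df2["val"]]

print(as_bytes)
print(recovered)
```

In both cases the conversion to bytes is a step the caller asks for, rather than something inferred from the file contents.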