to_csv and bytes on Python 3.

Is this desired behavior that I need to work around, or is it a bug? Notice that the bytes type marker (b'...') is written to disk, so you can’t round-trip the data. This works fine in Python 2 with unicode, AFAICT.

In [1]: pd.__version__
Out[1]: '0.15.2-252-g0d35dd4'

In [2]: pd.DataFrame.from_dict({'a': ['a', 'b', 'c']}).a.str.encode("utf-8").to_csv("tmp.csv")

In [3]: !cat tmp.csv
0,b'a'
1,b'b'
2,b'c'
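
For context, the b'...' text matches what Python 3's str() returns for a bytes object. The following is a minimal sketch using only the standard-library csv module (an assumption for illustration; pandas internals are not shown and may differ) of how stringifying bytes produces the same kind of output:

```python
import csv
import io

# str() on a bytes object returns its repr, including the b prefix and quotes.
marker = str(b"a")

# The standard-library csv writer stringifies non-string values with str(),
# so the marker ends up in the written row.
buf = io.StringIO()
csv.writer(buf).writerow([0, b"a"])
row = buf.getvalue().strip()

print(marker)
print(row)
```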

Issue Analytics

Created 8 years ago
Reactions: 5
Comments: 12 (8 by maintainers)

Top GitHub Comments

3 reactions

jzwinck commented, Jul 6, 2016

@zhuoqiang What I think you meant is you have to do this:

df['Column'] = df['Column'].str.decode('ascii') # or utf-8 etc.

Simply doing astype(str) doesn’t help: the to_csv() output still contains b'...' wrappers.
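
A minimal sketch of that workaround, assuming a hypothetical frame whose 'Column' holds UTF-8 encoded bytes (the sample data and the in-memory buffer are illustrative, not from the issue):

```python
import io

import pandas as pd

# Hypothetical sample data: a column of bytes objects.
df = pd.DataFrame({"Column": [b"a", b"b", b"c"]})

# Decode each bytes value into a text string before writing.
df["Column"] = df["Column"].str.decode("utf-8")

# Write to an in-memory buffer instead of a file to keep the sketch self-contained.
buf = io.StringIO()
df.to_csv(buf, index=False, header=False)
csv_text = buf.getvalue()

print(csv_text)
```

The codec passed to str.decode has to match however the bytes were produced; 'utf-8' here is only an assumption about the sample data.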

2 reactions

goodboy commented, Jul 26, 2016

I totally agree with @jzwinck. How can you in any way justify leaking Python’s encoding system syntax into a generic data exchange format?

When you use pd.read_csv() with an array-protocol type string dtype (such as 'S5'), round-tripping gets messed up:

import pandas as pd
fname = './blah.csv'
pd.Series([b'x',b'y']).to_csv(fname)

>>> pd.read_csv(fname, dtype='S5')
     0     b'x'
0  b'1'  b"b'y'"

Using dtype=str or dtype='S' does work as expected, however:

>>> pd.read_csv(fname, dtype='S')
   0  b'x'
0  1  b'y'

I actually find even the output above unexpected, since it seems to interpret the values as Python string literals automatically?

If a user chooses to load CSV data as bytes, that should be specified explicitly, just as it works when you write out unicode, and not inferred from Python’s encoding-specific markup:

>>> pd.Series(['x', 'y']).to_csv(fname)
>>> pd.read_csv(fname)
   0  x
0  1  y
>>> pd.read_csv(fname, dtype='S10')
   0  b'x'
0  1  b'y'
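
One way to get an explicit round trip along those lines is to write text and convert back to bytes only on request. This is a rough sketch, assuming UTF-8 data, a hypothetical column name 'value', and an in-memory buffer in place of a file:

```python
import io

import pandas as pd

# Hypothetical bytes data to round-trip.
original = pd.Series([b"x", b"y"], name="value")

# Write plain text: decode first so no b'...' markup reaches the CSV.
buf = io.StringIO()
original.str.decode("utf-8").to_csv(buf, index=False, header=True)
buf.seek(0)

# Read everything back as text, then encode explicitly to recover bytes.
loaded = pd.read_csv(buf, dtype=str)
restored = loaded["value"].str.encode("utf-8")

print(restored.tolist())
```

The conversion to bytes is a separate, visible step here, so nothing depends on how the reader interprets Python literal syntax in the file.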
