to_csv and bytes on Python 3.

Is this desired behavior that I need to work around, or is it a bug? Notice that the bytes type marker (b'...') is written to disk, so you can’t round-trip the data. This works fine in Python 2 with unicode AFAICT.

In [1]: pd.__version__
Out[1]: '0.15.2-252-g0d35dd4'

In [2]: pd.DataFrame.from_dict({'a': ['a', 'b', 'c']}).a.str.encode("utf-8").to_csv("tmp.csv")

In [3]: !cat tmp.csv
0,b'a'
1,b'b'
2,b'c'
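
The output above is what you would get if each bytes value were passed through str() before being written. A minimal sketch of that effect using only the standard library csv module; that pandas takes the same path internally is an assumption here, not something the report confirms:

```python
import csv
import io

# In Python 3, str() on a bytes object yields its repr, marker included.
as_text = str(b"a")

# The stdlib csv writer stringifies non-string fields the same way.
buf = io.StringIO()
csv.writer(buf).writerow([0, b"a"])
row = buf.getvalue().strip()

print(as_text)  # b'a'
print(row)      # 0,b'a'
```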

Issue Analytics

  • State: open
  • Created: 8 years ago
  • Reactions: 5
  • Comments: 12 (8 by maintainers)

Top GitHub Comments

3 reactions
jzwinck commented, Jul 6, 2016

@zhuoqiang What I think you meant is you have to do this:

df['Column'] = df['Column'].str.decode('ascii') # or utf-8 etc.

Simply doing astype(str) doesn’t help: the to_csv() output still contains the b'...' wrappers.
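
A minimal sketch of that workaround, applied to the Series from the original report. It writes to an in-memory buffer instead of tmp.csv, and the variable names are illustrative only:

```python
import io

import pandas as pd

# Series of bytes objects, built the same way as in the original report.
s = pd.Series(["a", "b", "c"]).str.encode("utf-8")

# Decode back to text before writing, so no b'...' markers reach the output.
decoded = s.str.decode("utf-8")

buf = io.StringIO()
decoded.to_csv(buf, header=False)
csv_text = buf.getvalue()
print(csv_text)
```

This assumes every value in the column is valid UTF-8; data in another encoding would need that encoding passed to str.decode() instead.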

2 reactions
goodboy commented, Jul 26, 2016

I totally agree with @jzwinck. How can you in any way justify leaking Python’s bytes literal syntax into a generic data exchange format?

When you use pd.read_csv() with an array-protocol type string as the dtype, round-tripping gets messed up:

>>> import pandas as pd
>>> fname = './blah.csv'
>>> pd.Series([b'x',b'y']).to_csv(fname)
>>> pd.read_csv(fname, dtype='S5')
     0     b'x'
0  b'1'  b"b'y'"

Using dtype=str or dtype='S' does work as expected, however?

>>> pd.read_csv(fname, dtype='S')
   0  b'x'
0  1  b'y'

I actually find even the result above unexpected, since it seems to be interpreting the values as Python string literals automatically?

If a user chooses to load CSV data as bytes, that should be specified explicitly, just as it works when you write out unicode, and not inferred from Python-specific markup in the file:

>>> pd.Series(['x', 'y']).to_csv(fname)
>>> pd.read_csv(fname)
   0  x
0  1  y
>>> pd.read_csv(fname, dtype='S10')
   0  b'x'
0  1  b'y'
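
One way to get a round trip along these lines today is to keep the CSV as plain text and convert to bytes explicitly on either side. This is a rough sketch under the assumption that the data is UTF-8; the buffer and variable names are made up for illustration:

```python
import io

import pandas as pd

original = pd.Series([b"x", b"y"])

# Write: decode to text first so the CSV holds plain values.
buf = io.StringIO()
original.str.decode("utf-8").to_csv(buf, header=False)

# Read: parse everything as text, then opt in to bytes explicitly.
buf.seek(0)
loaded = pd.read_csv(buf, header=None, index_col=0, dtype=str)
restored = loaded[1].str.encode("utf-8")

print(restored.tolist())
```

The file itself never contains Python-specific markup, so other tools can read it, and the choice of bytes versus text stays with the caller.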
