to_csv does not always handle line_terminator correctly
Code Sample, a copy-pastable example if possible
def hex_print(content):
    print ' '.join(['{0:02x}'.format(ord(i)) for i in content])
    print ' '.join(['{:>2}'.format(repr(i).replace("'", '')) for i in content])
    print ' '
import pandas as pd
import tempfile
filename = tempfile.NamedTemporaryFile(delete = False).name
df = pd.DataFrame({'x':[1]})
for sep in ['\n', '\r', '\r\n', 'F']:
    print 'with separator: {} ~~~~~~~~~~~~~~~~~~~~~~~~'.format(repr(sep))
    df.to_csv(filename, line_terminator = sep)
    with open(filename, 'rb') as f:
        content = f.read()
    print 'file method:'
    hex_print(content)
    print 'string method:'
    hex_print(df.to_csv(line_terminator = sep))
Problem description
It seems that to_csv does not always handle the line_terminator argument correctly. The above code prints the hex-encoded CSV data produced by several different calls to to_csv. In particular, passing \n in fact produces \r\n, and \r\n becomes \r\r\n. Note also that this only happens when writing to a file, not when the CSV data is returned directly as a string.
However, this seems to be OS-dependent as well: I have reproduced it on several machines running Windows 7, Python 2.7, and various versions of pandas (including 0.20.1), but on a Linux VM it works as expected.
Output of above code:
with separator: '\n' ~~~~~~~~~~~~~~~~~~~~~~~~
file method:
2c 78 0d 0a 30 2c 31 0d 0a
, x \r \n 0 , 1 \r \n
string method:
2c 78 0a 30 2c 31 0a
, x \n 0 , 1 \n
with separator: '\r' ~~~~~~~~~~~~~~~~~~~~~~~~
file method:
2c 78 0d 30 2c 31 0d
, x \r 0 , 1 \r
string method:
2c 78 0d 30 2c 31 0d
, x \r 0 , 1 \r
with separator: '\r\n' ~~~~~~~~~~~~~~~~~~~~~~~~
file method:
2c 78 0d 0d 0a 30 2c 31 0d 0d 0a
, x \r \r \n 0 , 1 \r \r \n
string method:
2c 78 0d 0a 30 2c 31 0d 0a
, x \r \n 0 , 1 \r \n
with separator: 'F' ~~~~~~~~~~~~~~~~~~~~~~~~
file method:
2c 78 46 30 2c 31 46
, x F 0 , 1 F
string method:
2c 78 46 30 2c 31 46
, x F 0 , 1 F
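The doubled \r in the file output above is what one would expect if the CSV text, already terminated with \r\n, is written through a text-mode file that expands every \n to \r\n, as Windows does by default. That cause is an assumption here, not something the report confirms. The following Python 3 sketch reproduces the same byte pattern on any OS without pandas, by setting the newline argument of open() explicitly; the helper name written_bytes is made up for illustration.

```python
import os
import tempfile


def written_bytes(text, newline):
    """Write text through a text-mode file and return the raw bytes on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", newline=newline) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# CSV text that already uses \r\n as its line terminator
row = ",x\r\n0,1\r\n"

# newline="\r\n" mimics the Windows text-mode default: each \n is expanded to \r\n
translated = written_bytes(row, newline="\r\n")
# newline="" switches translation off, so the bytes match the string exactly
untranslated = written_bytes(row, newline="")

print(translated.hex(" "))    # 2c 78 0d 0d 0a 30 2c 31 0d 0d 0a
print(untranslated.hex(" "))  # 2c 78 0d 0a 30 2c 31 0d 0a
```

The first line of output matches the "file method" bytes reported for the '\r\n' separator, and the second matches the "string method" bytes.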
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 35.0.1
Cython: None
numpy: 1.13.1
scipy: None
xarray: None
IPython: 5.3.0
sphinx: 1.5.5
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.1.7
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- Created 6 years ago
- Comments:12 (6 by maintainers)
Top GitHub Comments
I am experiencing the same issue.
With line_terminator='\r\n', mode='w', I am getting the line endings \r\r\n.
When I am using line_terminator='\r\n', mode='wb', I am getting the error:
And the same as @ingmars when I try to set an encoding with line_terminator='\r\n', mode='wb', encoding='utf-8'.
In the end I settled on not specifying any line_terminator. Not happy, but I will fix this with other tools after the file has been written.
Everything with Win10, Python 3.5, Pandas 0.19.2
Hi everyone, I’m experiencing the same problem on Pandas 0.20.3 on Windows 7. However, mode='wb' might be a dangerous fix, as it then crashes with an encoding setting such as encoding='utf-8', saying: “ValueError: binary mode doesn’t take an encoding argument”.
It would be nice if there was a workaround making both line_terminator and encoding work at the same time.
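One possible workaround, sketched under assumptions rather than taken from the thread: since the report shows that to_csv returns correct line endings when it produces a string, that string can be written by hand through a text-mode file opened with newline='' and the desired encoding. The helper name write_text_exact is hypothetical, and the literal csv_text below stands in for the string that to_csv would return, so the sketch runs without pandas.

```python
import io
import os
import tempfile


def write_text_exact(path, text, encoding="utf-8"):
    """Write text with the given encoding and no newline translation."""
    # newline="" makes io.open pass \n and \r\n through unchanged
    with io.open(path, "w", encoding=encoding, newline="") as f:
        f.write(text)


# Stand-in (assumption) for the string returned by df.to_csv(line_terminator="\r\n")
csv_text = u",x\r\n0,\u00e9\r\n"

fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
write_text_exact(path, csv_text, encoding="utf-8")

# Read the file back in binary mode to inspect the exact bytes on disk
with open(path, "rb") as f:
    raw = f.read()
os.remove(path)

print(repr(raw))
```

With pandas, the call would look like write_text_exact(filename, df.to_csv(line_terminator='\r\n')). Newer pandas releases spell this keyword lineterminator. The whole CSV is built in memory first, so this suits small and medium frames better than very large ones.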