to_csv does not always handle line_terminator correctly

Code Sample, a copy-pastable example if possible

import tempfile

import pandas as pd


def hex_print(content):
    # Row 1: hex code of each character; row 2: the character itself,
    # escaped so that \r and \n are visible (Python 2 syntax, as reported)
    print ' '.join(['{0:02x}'.format(ord(i)) for i in content])
    print ' '.join(['{:>2}'.format(repr(i).replace("'", '')) for i in content])
    print ' '

filename = tempfile.NamedTemporaryFile(delete=False).name

df = pd.DataFrame({'x': [1]})
for sep in ['\n', '\r', '\r\n', 'F']:
    print 'with separator: {} ~~~~~~~~~~~~~~~~~~~~~~~~'.format(repr(sep))
    # Write the CSV to a file, then read the raw bytes back
    df.to_csv(filename, line_terminator=sep)
    with open(filename, 'rb') as f:
        content = f.read()
    print 'file method:'
    hex_print(content)
    # Return the CSV as a string instead (no file involved)
    print 'string method:'
    hex_print(df.to_csv(line_terminator=sep))

Problem description

It seems that to_csv does not always handle the line_terminator argument correctly. The code above prints the hex-encoded CSV data produced by several different calls to to_csv. In particular, passing \n actually produces \r\n, and \r\n becomes \r\r\n. Note also that this only happens when writing to a file, not when the CSV data is returned directly as a string.
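A likely explanation, consistent with the hex dumps further down but not confirmed in the report, is text-mode newline translation. A file opened in text mode on Windows turns every \n written into \r\n, so \n becomes \r\n, \r\n becomes \r\r\n, and a lone \r is untouched, while the string returned by to_csv() never passes through a file. The Python 3 sketch below (the report itself used Python 2.7; the helper write_text_read_bytes is hypothetical) reproduces this with the standard library alone by passing newline="\r\n", which applies the same translation on any OS:

```python
import os
import tempfile


def write_text_read_bytes(text, newline):
    """Write `text` through a text-mode file opened with the given `newline`
    setting, then return the raw bytes that ended up on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", encoding="ascii", newline=newline) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


# newline="\r\n" makes every "\n" written become "\r\n", mimicking on any OS
# what Windows does by default in text mode.
lf_out = write_text_read_bytes(",x\n0,1\n", newline="\r\n")
cr_out = write_text_read_bytes(",x\r0,1\r", newline="\r\n")
crlf_out = write_text_read_bytes(",x\r\n0,1\r\n", newline="\r\n")
# newline="" disables translation, so the terminator is written verbatim.
crlf_raw = write_text_read_bytes(",x\r\n0,1\r\n", newline="")

print(lf_out)    # b',x\r\n0,1\r\n'
print(cr_out)    # b',x\r0,1\r'
print(crlf_out)  # b',x\r\r\n0,1\r\r\n'
print(crlf_raw)  # b',x\r\n0,1\r\n'
```

The first three results match the "file method" bytes in the report for \n, \r and \r\n respectively.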

However, this also seems to be OS-dependent: I have reproduced it on several machines running Windows 7 with Python 2.7 and various versions of pandas (including 0.20.1), but on a Linux VM it works as expected.
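The OS dependence fits the same explanation. With the default newline=None, Python translates each \n written in text mode to os.linesep, which is \r\n on Windows and \n on Linux, so the same call yields different bytes. A small Python 3 sketch (standard library only) that shows which translation the current platform applies:

```python
import os
import tempfile

text = ",x\n0,1\n"

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    # newline=None is the default: each "\n" written becomes os.linesep.
    with open(path, "w", encoding="ascii", newline=None) as f:
        f.write(text)
    with open(path, "rb") as f:
        on_disk = f.read()
finally:
    os.remove(path)

print(repr(os.linesep))  # '\r\n' on Windows, '\n' on Linux and macOS
print(on_disk)           # b',x\r\n0,1\r\n' on Windows, b',x\n0,1\n' on Linux
```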

Output of the above code:

with separator: '\n' ~~~~~~~~~~~~~~~~~~~~~~~~
file method:
2c 78 0d 0a 30 2c 31 0d 0a
 ,  x \r \n  0  ,  1 \r \n

string method:
2c 78 0a 30 2c 31 0a
 ,  x \n  0  ,  1 \n

with separator: '\r' ~~~~~~~~~~~~~~~~~~~~~~~~
file method:
2c 78 0d 30 2c 31 0d
 ,  x \r  0  ,  1 \r

string method:
2c 78 0d 30 2c 31 0d
 ,  x \r  0  ,  1 \r

with separator: '\r\n' ~~~~~~~~~~~~~~~~~~~~~~~~
file method:
2c 78 0d 0d 0a 30 2c 31 0d 0d 0a
 ,  x \r \r \n  0  ,  1 \r \r \n

string method:
2c 78 0d 0a 30 2c 31 0d 0a
 ,  x \r \n  0  ,  1 \r \n

with separator: 'F' ~~~~~~~~~~~~~~~~~~~~~~~~
file method:
2c 78 46 30 2c 31 46
 ,  x  F  0  ,  1  F

string method:
2c 78 46 30 2c 31 46
 ,  x  F  0  ,  1  F

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 35.0.1
Cython: None
numpy: 1.13.1
scipy: None
xarray: None
IPython: 5.3.0
sphinx: 1.5.5
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.1.7
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Issue created 6 years ago; 12 comments (6 by maintainers).

Top GitHub Comments

redx177 commented, Apr 19, 2018

I am experiencing the same issue.

With line_terminator='\r\n' and mode='w', I get \r\r\n line endings.

With line_terminator='\r\n' and mode='wb', I get this error:

File "C:[…]\site-packages\pandas\io\common.py", line 332, in _get_handle
    f = open(path, mode, errors='replace')
ValueError: binary mode doesn't take an errors argument

I get the same error as @ingmars when I also set an encoding (line_terminator='\r\n', mode='wb', encoding='utf-8').
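Both ValueError messages come from Python's built-in open() rather than from pandas: in binary mode it rejects the text-only errors and encoding arguments. Per the traceback above, pandas called open(path, mode, errors='replace'), so mode='wb' fails before anything is written. A minimal Python 3 check (standard library only; the file name out.csv is arbitrary):

```python
import os
import tempfile

messages = []
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv")
    # Text-only keyword arguments are rejected as soon as the mode is binary.
    for text_only_kwarg in ({"errors": "replace"}, {"encoding": "utf-8"}):
        try:
            with open(path, "wb", **text_only_kwarg):
                pass
        except ValueError as exc:
            messages.append(str(exc))

for message in messages:
    print(message)
# binary mode doesn't take an errors argument
# binary mode doesn't take an encoding argument
```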

In the end I settled on not specifying any line_terminator. I'm not happy with that, but I will fix the line endings with other tools after the file has been written.

All of this is on Windows 10, Python 3.5, pandas 0.19.2.
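One way to do the after-the-fact fix-up described in this comment is sketched below in Python 3. It assumes the only damage is an extra \r in front of each \r\n; collapse_doubled_crlf is a hypothetical helper, not part of pandas. Note that it would also alter a quoted field whose value genuinely ends in \r right before a line break.

```python
import os
import tempfile


def collapse_doubled_crlf(path):
    """Rewrite the file at `path` in place, turning each CR CR LF sequence
    into a single CR LF. Returns the number of sequences replaced."""
    with open(path, "rb") as f:
        data = f.read()
    count = data.count(b"\r\r\n")
    with open(path, "wb") as f:
        f.write(data.replace(b"\r\r\n", b"\r\n"))
    return count


# Usage: simulate a file written with the doubled terminators from the report.
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
with open(path, "wb") as f:
    f.write(b",x\r\r\n0,1\r\r\n")

replaced = collapse_doubled_crlf(path)
with open(path, "rb") as f:
    fixed = f.read()
os.remove(path)

print(replaced, fixed)  # 2 b',x\r\n0,1\r\n'
```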

ingmars commented, Feb 15, 2018

Hi everyone, I'm experiencing the same problem with pandas 0.20.3 on Windows 7. However, mode='wb' might be a dangerous fix: combined with an encoding setting such as encoding='utf-8', it crashes with "ValueError: binary mode doesn't take an encoding argument".

It would be nice if there was a workaround making both line_terminator and encoding work at the same time.
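A possible workaround, assuming Python 3: open the file yourself with newline="" (no translation) and the desired encoding, then pass the handle to to_csv, which writes the terminator through that handle unchanged. This is a sketch rather than an official fix. Newer pandas releases renamed line_terminator to lineterminator, so the code picks whichever name the installed version accepts.

```python
import inspect
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"x": [1]})

# Use whichever keyword this pandas version accepts for the row terminator.
params = inspect.signature(pd.DataFrame.to_csv).parameters
term_kw = "lineterminator" if "lineterminator" in params else "line_terminator"

fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
try:
    # newline="" turns off text-mode translation; the encoding lives on the handle.
    with open(path, "w", encoding="utf-8", newline="") as f:
        df.to_csv(f, **{term_kw: "\r\n"})
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)

print(raw)  # b',x\r\n0,1\r\n'
```

Because the encoding is set on the handle, nothing needs to be opened in binary mode, which avoids the ValueError described above.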