to_csv does not always handle line_terminator correctly

Code Sample, a copy-pastable example if possible

def hex_print(content):
    print ' '.join(['{0:02x}'.format(ord(i)) for i in content])
    print ' '.join(['{:>2}'.format(repr(i).replace("'", '')) for i in content])
    print ' '

import pandas as pd
import tempfile

filename = tempfile.NamedTemporaryFile(delete = False).name

df = pd.DataFrame({'x':[1]})
for sep in ['\n', '\r', '\r\n', 'F']:
    print 'with separator: {} ~~~~~~~~~~~~~~~~~~~~~~~~'.format(repr(sep))
    df.to_csv(filename, line_terminator = sep)        
    with open(filename, 'rb') as f:
        content = f.read()
    print 'file method:'
    hex_print(content)
    print 'string method:'
    hex_print(df.to_csv(line_terminator = sep))

Problem description

It seems that to_csv does not always handle the line_terminator argument correctly. The above code prints the hex-encoded CSV data produced by several different calls to to_csv. In particular, passing \n in fact produces \r\n, and \r\n becomes \r\r\n. Note also that this only happens when writing to a file, not when returning the CSV data directly as a string.

However, this seems to be OS-dependent as well: I have reproduced it on several machines running Windows 7, Python 2.7, and various versions of pandas (including 0.20.1), but on a Linux VM it works as expected.

Output of above code:

with separator: '\n' ~~~~~~~~~~~~~~~~~~~~~~~~
file method:
2c 78 0d 0a 30 2c 31 0d 0a
 ,  x \r \n  0  ,  1 \r \n

string method:
2c 78 0a 30 2c 31 0a
 ,  x \n  0  ,  1 \n

with separator: '\r' ~~~~~~~~~~~~~~~~~~~~~~~~
file method:
2c 78 0d 30 2c 31 0d
 ,  x \r  0  ,  1 \r

string method:
2c 78 0d 30 2c 31 0d
 ,  x \r  0  ,  1 \r

with separator: '\r\n' ~~~~~~~~~~~~~~~~~~~~~~~~
file method:
2c 78 0d 0d 0a 30 2c 31 0d 0d 0a
 ,  x \r \r \n  0  ,  1 \r \r \n

string method:
2c 78 0d 0a 30 2c 31 0d 0a
 ,  x \r \n  0  ,  1 \r \n

with separator: 'F' ~~~~~~~~~~~~~~~~~~~~~~~~
file method:
2c 78 46 30 2c 31 46
 ,  x  F  0  ,  1  F

string method:
2c 78 46 30 2c 31 46
 ,  x  F  0  ,  1  F
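
The doubled \r in the file output is consistent with ordinary text-mode newline translation on Windows, where every \n written to a text-mode file is expanded to \r\n. This excerpt does not confirm that as the root cause, so treat the following as a sketch of the likely mechanism rather than a diagnosis. It uses only the standard library and emulates the Windows behaviour on any platform by passing newline='\r\n' to io.open; the helper name written_bytes is made up for this illustration.

```python
import io
import os
import tempfile


def written_bytes(text, newline):
    """Write text through a text-mode file and return the raw bytes on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with io.open(path, 'w', newline=newline) as f:
            f.write(text)
        with io.open(path, 'rb') as f:
            return f.read()
    finally:
        os.remove(path)


# Same text as the "string method" output for line_terminator='\r\n'
csv_text = u',x\r\n0,1\r\n'

# newline='\r\n' expands every '\n' on write, as Windows text mode does
translated = written_bytes(csv_text, newline='\r\n')

# newline='' switches the translation off
untouched = written_bytes(csv_text, newline='')

print(repr(translated))
print(repr(untouched))
```

With translation on, the bytes on disk contain \r\r\n, matching the "file method" output above; with newline='' the text is written unchanged.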

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 35.0.1
Cython: None
numpy: 1.13.1
scipy: None
xarray: None
IPython: 5.3.0
sphinx: 1.5.5
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.1.7
pymysql: None
psycopg2: 2.7.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 12 (6 by maintainers)

Top GitHub Comments

redx177 commented, Apr 19, 2018

I am experiencing the same issue.

With line_terminator='\r\n', mode='w', I am getting the line endings \r\r\n

When I am using line_terminator='\r\n', mode='wb', I am getting the error:

File "C:[…]\site-packages\pandas\io\common.py", line 332, in _get_handle
    f = open(path, mode, errors='replace')
ValueError: binary mode doesn't take an errors argument

I get the same error as @ingmars when I also set an encoding: line_terminator='\r\n', mode='wb', encoding='utf-8'.

In the end I settled on not specifying any line_terminator. Not happy, but I will fix this with other tools after the file has been written.

Everything with Win10, Python 3.5, Pandas 0.19.2
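
For the "fix it afterwards" route, here is a minimal sketch of a post-processing step, assuming the only damage is the \r\r\n sequence described above. The function name collapse_doubled_cr is hypothetical, and the approach is blunt: it would also rewrite a legitimate \r\r\n inside a quoted field, so it is only safe when the data cannot contain one.

```python
import io
import os
import tempfile


def collapse_doubled_cr(path):
    """Rewrite the file in place, turning every CR CR LF into CR LF."""
    with io.open(path, 'rb') as f:
        data = f.read()
    with io.open(path, 'wb') as f:
        f.write(data.replace(b'\r\r\n', b'\r\n'))


# Reproduce the damaged bytes from the report, then repair them
fd, path = tempfile.mkstemp(suffix='.csv')
os.close(fd)
with io.open(path, 'wb') as f:
    f.write(b',x\r\r\n0,1\r\r\n')

collapse_doubled_cr(path)

with io.open(path, 'rb') as f:
    repaired = f.read()
os.remove(path)

print(repr(repaired))
```

Both reads and writes use binary mode so that the repair step is not itself subject to newline translation.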

ingmars commented, Feb 15, 2018

Hi everyone, I'm experiencing the same problem with pandas 0.20.3 on Windows 7. However, mode='wb' might be a dangerous fix, as it then crashes when an encoding such as encoding='utf-8' is set, saying: "ValueError: binary mode doesn't take an encoding argument".

It would be nice if there was a workaround making both line_terminator and encoding work at the same time.
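
One possible workaround, sketched under assumptions rather than taken from the thread: since the report shows that the string returned by to_csv has the requested terminator, the string can be encoded by hand and written in binary mode, which sidesteps both the newline translation and the "binary mode doesn't take an encoding argument" error. The helper name write_csv_text is hypothetical, Python 3 string semantics are assumed, and a literal string stands in for the to_csv result so the example runs without pandas.

```python
import io
import os
import tempfile


def write_csv_text(path, csv_text, encoding='utf-8'):
    """Encode csv_text ourselves and write the bytes with no newline translation."""
    with io.open(path, 'wb') as f:
        f.write(csv_text.encode(encoding))


# Stand-in for df.to_csv(line_terminator='\r\n'), with one non-ASCII value
csv_text = u',x\r\n0,\u00e9\r\n'

fd, path = tempfile.mkstemp(suffix='.csv')
os.close(fd)
write_csv_text(path, csv_text, encoding='utf-8')

with io.open(path, 'rb') as f:
    raw = f.read()
os.remove(path)

print(repr(raw))
```

In real use the call would look like write_csv_text('out.csv', df.to_csv(line_terminator='\r\n')). On Python 3, an alternative is to open the file yourself with newline='' and the desired encoding and pass that handle to to_csv, which is what the pandas documentation recommends for file objects.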
