Inconsistent behaviour on sep/delimiter for pandas.read_csv

Related: #7662

Code Sample, a copy-pastable example if possible

from io import StringIO

import pandas as pd


CSV = "a|b\n1|2"
print(pd.read_csv(StringIO(CSV), sep=None, engine='python'))
print('is not the same as')
print(pd.read_csv(StringIO(CSV), delimiter=None, engine='python'))
print('\nand no warning is emitted by')
print(pd.read_csv(StringIO(CSV), sep='|', delimiter=' '))

Problem description

According to the documentation,

delimiter : str, default None
    Alternative argument name for sep.

Thus, I would expect sep and delimiter to be interchangeable.

Expected Output

Specifying delimiter=None is equivalent to specifying sep=None, and specifying both sep and delimiter emits a warning or causes an error. Alternatively, either sep or delimiter should be deprecated.
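The expected behaviour can be sketched in plain Python. This is only an illustration under stated assumptions, not how pandas implements it: the helper `resolve_separator`, the `_UNSET` sentinel, and the `","` fallback are hypothetical names and choices made for this example. The sentinel is what lets an explicit `None` (meaning "detect the separator") be told apart from "argument not passed".

```python
# Hypothetical sentinel: distinguishes "not passed" from an explicit None.
_UNSET = object()


def resolve_separator(sep=_UNSET, delimiter=_UNSET, default=","):
    """Return the separator to use, treating sep and delimiter as aliases.

    Illustrative sketch only; this is not part of the pandas API.
    """
    # Passing both names is ambiguous, so reject it loudly.
    if sep is not _UNSET and delimiter is not _UNSET:
        raise ValueError("Specify either sep or delimiter, not both")
    # Whichever alias was passed wins, including an explicit None.
    if delimiter is not _UNSET:
        return delimiter
    if sep is not _UNSET:
        return sep
    # Neither was passed: fall back to the assumed default.
    return default


print(resolve_separator(sep=None) is resolve_separator(delimiter=None))
try:
    resolve_separator(sep="|", delimiter=" ")
except ValueError as exc:
    print("rejected:", exc)
```

With a resolver like this, `sep=None` and `delimiter=None` take the same path, and the third call from the code sample would fail instead of silently preferring one argument.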

Output of `pd.show_versions()`

No module named 'dask'

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL:
LANG: en_DK.UTF-8
LOCALE: en_DK.UTF-8

pandas: 0.24.0.dev0+332.g1f6ddc4
pytest: 3.6.3
pip: 10.0.1
setuptools: 39.0.1
Cython: 0.28.4
numpy: 1.14.5
scipy: 1.1.0
pyarrow: 0.9.0
xarray: 0.10.7
IPython: 6.4.0
sphinx: 1.7.5
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.5
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.4
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.5
lxml: 4.2.3
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.9
pymysql: 0.9.2
psycopg2: None
jinja2: 2.10
s3fs: 0.1.5
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None
gcsfs: 0.1.1

Issue Analytics

Created 5 years ago
Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction

WillAyd commented, Jul 21, 2018

There’s some overlap with the dialect, but obviously we don’t align on all of its keywords and naming conventions. I’d still stick with sep, given its usage in pandas and the fact that there’s no delimiter in to_csv.

0 reactions

jorisvandenbossche commented, Oct 15, 2018

The other issue I mean is: https://github.com/pandas-dev/pandas/issues/22639