Inconsistent behaviour on sep/delimiter for pandas.read_csv

Related: #7662

Code Sample, a copy-pastable example if possible

from io import StringIO

import pandas as pd

CSV = "a|b\n1|2"

# With the python engine, sep=None triggers delimiter sniffing...
print(pd.read_csv(StringIO(CSV), sep=None, engine='python'))
print('is not the same as')
# ...but the supposedly equivalent delimiter=None does not.
print(pd.read_csv(StringIO(CSV), delimiter=None, engine='python'))
print('\nand no warning is emitted by')
# Conflicting values for the two aliases are accepted silently.
print(pd.read_csv(StringIO(CSV), sep='|', delimiter=' '))
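
For context: with engine='python', sep=None makes pandas infer the separator with Python’s built-in csv.Sniffer, which is why the first call above parses the pipe-separated data into two columns. A minimal sketch of that detection step, using only the standard library (the sample size pandas feeds the sniffer is an internal detail):

import csv
from io import StringIO

CSV = "a|b\n1|2"

# csv.Sniffer inspects a sample of the text and guesses the dialect;
# this is the mechanism that sep=None relies on under the python engine.
dialect = csv.Sniffer().sniff(StringIO(CSV).read())
print(dialect.delimiter)  # '|'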

Problem description

According to the documentation,

delimiter : str, default None
    Alternative argument name for sep.

Thus, I would expect sep and delimiter to be interchangeable.

Expected Output

Specifying delimiter=None is equivalent to specifying sep=None, and specifying both sep and delimiter emits a warning or causes an error. Alternatively, either sep or delimiter should be deprecated.
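
For illustration, symmetric handling could look roughly like the sketch below; the helper name _resolve_sep is hypothetical, not a pandas API.

import warnings

def _resolve_sep(sep=None, delimiter=None):
    # Hypothetical resolution logic (not pandas internals): treat the two
    # aliases symmetrically and complain when both are supplied.
    if sep is not None and delimiter is not None:
        if sep != delimiter:
            raise ValueError("Specify either 'sep' or 'delimiter', not both.")
        warnings.warn("'delimiter' is an alias for 'sep'; pass only one.")
    return delimiter if delimiter is not None else sep  # None still means "sniff"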

Output of pd.show_versions()

No module named 'dask'

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL:
LANG: en_DK.UTF-8
LOCALE: en_DK.UTF-8

pandas: 0.24.0.dev0+332.g1f6ddc4
pytest: 3.6.3
pip: 10.0.1
setuptools: 39.0.1
Cython: 0.28.4
numpy: 1.14.5
scipy: 1.1.0
pyarrow: 0.9.0
xarray: 0.10.7
IPython: 6.4.0
sphinx: 1.7.5
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.5
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.4
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.5
lxml: 4.2.3
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.9
pymysql: 0.9.2
psycopg2: None
jinja2: 2.10
s3fs: 0.1.5
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None
gcsfs: 0.1.1

Top GitHub Comments

WillAyd commented on Jul 21, 2018:

There’s some overlap with the dialect, but obviously we don’t align on all the keywords and naming conventions thereof. I’d still stick with sep, given its usage in pandas and the fact that there’s no delimiter in to_csv.
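
The naming overlap is easy to see side by side: the standard library’s csv dialects call the field separator delimiter, while the pandas writer exposes only sep. A small illustration:

import csv
import pandas as pd

# The stdlib csv module names the separator 'delimiter' on its dialects...
print(csv.excel.delimiter)  # ','

# ...whereas DataFrame.to_csv accepts only 'sep', with no 'delimiter' alias.
df = pd.DataFrame({"a": [1], "b": [2]})
print(df.to_csv(sep='|'))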

