Inconsistent behaviour on sep/delimiter for pandas.read_csv
Related: #7662
Code Sample, a copy-pastable example if possible
from io import StringIO
import pandas as pd
CSV = "a|b\n1|2"
print(pd.read_csv(StringIO(CSV), sep=None, engine='python'))
print('is not the same as')
print(pd.read_csv(StringIO(CSV), delimiter=None, engine='python'))
print('\nand no warning is emitted by')
print(pd.read_csv(StringIO(CSV), sep='|', delimiter=' '))
Problem description
According to the documentation,
delimiter : str, default None
Alternative argument name for sep.
Thus, I would expect sep and delimiter to be interchangeable.
Expected Output
Specifying delimiter=None is equivalent to specifying sep=None, and specifying both sep and delimiter emits a warning or causes an error. Alternatively, either sep or delimiter should be deprecated.
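The expected behaviour could be sketched roughly as follows. This is only an illustration, not how pandas resolves the two arguments: resolve_separator and the _NOT_GIVEN sentinel are hypothetical names, and the choice to raise an error (rather than warn) when both arguments are passed is an assumption.

```python
# Hypothetical sketch, not pandas code: treat sep and delimiter as true aliases.

# Sentinel that distinguishes "argument not passed" from an explicit None.
_NOT_GIVEN = object()


def resolve_separator(sep=_NOT_GIVEN, delimiter=_NOT_GIVEN, default=","):
    """Collapse the two alias arguments into a single separator value."""
    sep_given = sep is not _NOT_GIVEN
    delimiter_given = delimiter is not _NOT_GIVEN

    # Passing both aliases is ambiguous, so refuse instead of silently picking one.
    if sep_given and delimiter_given:
        raise ValueError("Specify either sep or delimiter, not both.")

    if sep_given:
        return sep
    if delimiter_given:
        return delimiter
    return default


# An explicit None means the same thing under either name.
print(resolve_separator(sep=None) is resolve_separator(delimiter=None))
```

With a sentinel default, an explicit None (a request to sniff the separator) can be told apart from an omitted argument, which is what makes the two names behave identically.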
Output of pd.show_versions()
No module named 'dask'
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL:
LANG: en_DK.UTF-8
LOCALE: en_DK.UTF-8
pandas: 0.24.0.dev0+332.g1f6ddc4
pytest: 3.6.3
pip: 10.0.1
setuptools: 39.0.1
Cython: 0.28.4
numpy: 1.14.5
scipy: 1.1.0
pyarrow: 0.9.0
xarray: 0.10.7
IPython: 6.4.0
sphinx: 1.7.5
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.5
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.4
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.5
lxml: 4.2.3
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.9
pymysql: 0.9.2
psycopg2: None
jinja2: 2.10
s3fs: 0.1.5
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None
gcsfs: 0.1.1
Issue Analytics
- State:
- Created 5 years ago
- Comments: 7 (7 by maintainers)
Top GitHub Comments
There's some overlap with the dialect, but obviously we don't align on all the keywords and naming conventions thereof. I'd still stick with sep, given its usage in pandas and the fact that there's no delimiter in to_csv.
The other issue I mean is: https://github.com/pandas-dev/pandas/issues/22639