BUG: read_csv not guessing delimiter
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
file = "pandas_test.dat"
# Fails
df = pd.read_csv(file, delimiter=None)
# Works
from astropy.io import ascii
df = ascii.read(file)
Issue Description
pandas fails when trying to guess the delimiter (a space) in the test file. astropy, on the other hand, is able to guess it correctly.
The test file is here: pandas_test.dat.zip
Expected Behavior
The delimiter should be guessed correctly.
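As a possible workaround while the guessing fails, the delimiter can be sniffed separately with the standard library's csv.Sniffer, which is the tool the pandas docs mention for sep=None. The following is only a minimal sketch: the helper name guess_delimiter, the candidate list, and the sample text are assumptions made up for illustration, and the sample is not the attached pandas_test.dat file.

```python
import csv


def guess_delimiter(sample: str, candidates: str = " ,;\t|") -> str:
    """Return the delimiter that csv.Sniffer infers from a text sample.

    Hypothetical helper: restricting the search to a few candidate
    characters keeps the sniffer from picking an unrelated character.
    """
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


# Made-up, space-separated stand-in for the first lines of a data file.
sample = "x y z\n1.5 2.5 3.5\n4.0 5.0 6.0\n"

detected = guess_delimiter(sample)
print(repr(detected))
```

The detected value could then be passed explicitly, for example pd.read_csv(file, sep=detected). For files separated by runs of whitespace, sep=r"\s+" is another option that avoids guessing altogether.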
Installed Versions
pd.__version__
'1.4.2'
Calling pd.show_versions() throws the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gabriel/miniconda3/envs/asteca/lib/python3.9/site-packages/pandas/util/_print_versions.py", line 109, in show_versions
    deps = _get_dependency_info()
  File "/home/gabriel/miniconda3/envs/asteca/lib/python3.9/site-packages/pandas/util/_print_versions.py", line 88, in _get_dependency_info
    mod = import_optional_dependency(modname, errors="ignore")
  File "/home/gabriel/miniconda3/envs/asteca/lib/python3.9/site-packages/pandas/compat/_optional.py", line 138, in import_optional_dependency
    module = importlib.import_module(name)
  File "/home/gabriel/miniconda3/envs/asteca/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/gabriel/miniconda3/envs/asteca/lib/python3.9/site-packages/setuptools/__init__.py", line 8, in <module>
    import _distutils_hack.override # noqa: F401
  File "/home/gabriel/miniconda3/envs/asteca/lib/python3.9/site-packages/_distutils_hack/override.py", line 1, in <module>
    __import__('_distutils_hack').do_override()
  File "/home/gabriel/miniconda3/envs/asteca/lib/python3.9/site-packages/_distutils_hack/__init__.py", line 72, in do_override
    ensure_local_distutils()
  File "/home/gabriel/miniconda3/envs/asteca/lib/python3.9/site-packages/_distutils_hack/__init__.py", line 59, in ensure_local_distutils
    assert '_distutils' in core.__file__, core.__file__
AssertionError: /home/gabriel/miniconda3/envs/asteca/lib/python3.9/distutils/core.py
Top GitHub Comments
Thanks @Gabriel-p for the report.
It appears that pd.read_csv(io.StringIO(src), sep=None) raises Error: Could not determine delimiter, whereas pd.read_csv(io.StringIO(src), delimiter=None) produces a DataFrame with one column.
According to the docs, delimiter is an alias for sep. The default for delimiter is documented as None (which is strange), whereas the default for sep is ','.
I suspect that explicitly specifying delimiter=None is being ignored and treated as not passed, so the default for sep is used (I have not checked this, though).
In any case, if the separator/delimiter cannot be determined with sep=None, the same should be true for delimiter=None, so the expected behavior is to raise Error: Could not determine delimiter.
Contributions and PRs welcome.
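To make the suspicion above concrete, here is a toy model of how an alias with a None default can swallow an explicit delimiter=None. This is a hypothetical sketch for illustration only, not the actual pandas source: the sentinel _NO_DEFAULT and the function resolve_delimiter are made-up names.

```python
# Hypothetical sentinel meaning "the caller did not pass this argument".
_NO_DEFAULT = object()


def resolve_delimiter(sep=_NO_DEFAULT, delimiter=None):
    """Toy alias resolution (an assumption, not pandas internals).

    Because delimiter defaults to None, an explicit delimiter=None is
    indistinguishable from delimiter not being passed at all.
    """
    if delimiter is None:
        # Fall back to sep, whether or not the caller passed delimiter.
        delimiter = sep
    if delimiter is _NO_DEFAULT:
        # Neither argument carried a usable value: use the comma default.
        return ","
    return delimiter


print(repr(resolve_delimiter()))                # nothing passed
print(repr(resolve_delimiter(delimiter=None)))  # explicit None is lost
print(repr(resolve_delimiter(sep=None)))        # None survives, so sniffing can run
```

In this model only sep=None reaches the point where a delimiter would be sniffed, which would match the one-column DataFrame seen with delimiter=None.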
Right now, tests are failing and the behavior is not as I described above, so I guess I have to wait until it is fixed in the main branch. My question: it is not clear to me which test file is responsible for this change, so I do not know where to add the test cases for this bugfix.