BUG: read_csv not guessing delimiter

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
file = "pandas_test.dat"

# Fails
df = pd.read_csv(file, delimiter=None)

# Works
from astropy.io import ascii
df = ascii.read(file)

Issue Description

pandas fails to guess the delimiter (a space) in the test file. astropy, on the other hand, guesses it correctly.

The test file is attached as pandas_test.dat.zip.

Expected Behavior

The delimiter should be guessed correctly.

Installed Versions

pd.__version__
'1.4.2'

The line pd.show_versions() throws the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gabriel/miniconda3/envs/asteca/lib/python3.9/site-packages/pandas/util/_print_versions.py", line 109, in show_versions
    deps = _get_dependency_info()
  File "/home/gabriel/miniconda3/envs/asteca/lib/python3.9/site-packages/pandas/util/_print_versions.py", line 88, in _get_dependency_info
    mod = import_optional_dependency(modname, errors="ignore")
  File "/home/gabriel/miniconda3/envs/asteca/lib/python3.9/site-packages/pandas/compat/_optional.py", line 138, in import_optional_dependency
    module = importlib.import_module(name)
  File "/home/gabriel/miniconda3/envs/asteca/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/gabriel/miniconda3/envs/asteca/lib/python3.9/site-packages/setuptools/__init__.py", line 8, in <module>
    import _distutils_hack.override  # noqa: F401
  File "/home/gabriel/miniconda3/envs/asteca/lib/python3.9/site-packages/_distutils_hack/override.py", line 1, in <module>
    __import__('_distutils_hack').do_override()
  File "/home/gabriel/miniconda3/envs/asteca/lib/python3.9/site-packages/_distutils_hack/__init__.py", line 72, in do_override
    ensure_local_distutils()
  File "/home/gabriel/miniconda3/envs/asteca/lib/python3.9/site-packages/_distutils_hack/__init__.py", line 59, in ensure_local_distutils
    assert '_distutils' in core.__file__, core.__file__
AssertionError: /home/gabriel/miniconda3/envs/asteca/lib/python3.9/distutils/core.py

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 9 (7 by maintainers)

Top GitHub Comments

1 reaction
simonjayhawkins commented, May 16, 2022

Thanks @Gabriel-p for the report.

It appears that pd.read_csv(io.StringIO(src), sep=None) raises Error: Could not determine delimiter, whereas pd.read_csv(io.StringIO(src), delimiter=None) produces a DataFrame with one column.

According to the docs, delimiter is an alias for sep. The default for delimiter is documented as None (which is strange), whereas the default for sep is ','.

I suspect that an explicit delimiter=None is ignored and treated as not passed, so the default for sep is used (I have not checked this, though).

Expected Behavior

The delimiter should be correctly guessed

In any case, if the separator/delimiter cannot be determined with sep=None, the same should be true for delimiter=None, so the expected behavior is to raise Error: Could not determine delimiter.

Contributions and PRs welcome.
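
For context on the comment above: the pandas documentation states that sep=None makes read_csv fall back to the Python engine and detect the separator with the standard library's csv.Sniffer, which is where the "Could not determine delimiter" error comes from. The following is a minimal sketch of that sniffing step in isolation, not pandas' internal code. The helper name guess_delimiter, the candidate set, and the sample text are assumptions made for illustration; the actual contents of pandas_test.dat are not reproduced here.

```python
import csv


def guess_delimiter(sample, candidates=" ,;\t|"):
    """Return the delimiter csv.Sniffer picks for `sample`, or None.

    `candidates` restricts the characters the sniffer may choose from;
    this set is an assumption for the sketch, not a pandas default.
    """
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # csv.Sniffer raises csv.Error when it cannot settle on a delimiter.
        return None


# Hypothetical space-delimited sample standing in for the reported file.
sample = "a b c\n1 2 3\n4 5 6\n"
delimiter = guess_delimiter(sample)
```

If a delimiter is found this way, it can be passed explicitly to pd.read_csv through sep, which avoids depending on how delimiter=None is interpreted.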

0 reactions
dimitra-karadima commented, Oct 13, 2022

Right now, tests are failing and the behavior is not as I described above, so I guess I have to wait until it is fixed in the main branch. It is also not clear to me which test file is responsible for this change, so I do not know where to add the test cases for this bugfix.
