read_csv C-engine CParserError: Error tokenizing data

Hi,

I have encountered a dataset that the C engine of read_csv cannot parse. I am unsure of the exact cause, but I have narrowed it down to a single row, which I have pickled and uploaded to Dropbox. If you obtain the pickle, try the following:

import pandas as pd

df = pd.read_pickle('faulty_row.pkl')
df.to_csv('faulty_row.csv', encoding='utf8', index=False)
pd.read_csv('faulty_row.csv', encoding='utf8')

I get the following exception:

CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

If you read the CSV using the Python engine, no exception is thrown:

pd.read_csv('faulty_row.csv', encoding='utf8', engine='python')

This suggests that the issue is with read_csv and not to_csv. The versions I am using are:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-28-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

pandas: 0.16.2
nose: 1.3.7
Cython: 0.22.1
numpy: 1.9.2
scipy: 0.15.1
IPython: 3.2.1
patsy: 0.3.0
tables: 3.2.0
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2

Issue Analytics

Created 8 years ago
Reactions: 21
Comments: 19 (3 by maintainers)

Top GitHub Comments

76 reactions

justinjdickow commented, Jan 10, 2018

I missed @alfonsomhc's answer because it just looked like a comment.

You need

df = pd.read_csv('test.csv', engine='python')
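One way to apply this workaround only when it is needed is to try the C engine first and fall back to the Python engine on a tokenizing error. The sketch below assumes a recent pandas, where the exception is exposed as pd.errors.ParserError (in the 0.16.2 release from the report it is pandas.parser.CParserError). The helper name read_csv_with_fallback is hypothetical, not part of pandas.

```python
import pandas as pd


def read_csv_with_fallback(path, **kwargs):
    """Read a CSV with the C engine, retrying with the Python engine on a parser error.

    `path` should be a file path rather than an open handle, so the second
    attempt starts from the beginning of the file.
    """
    try:
        return pd.read_csv(path, engine="c", **kwargs)
    except pd.errors.ParserError:
        # The Python engine is slower but more tolerant of unusual input.
        return pd.read_csv(path, engine="python", **kwargs)
```

The Python engine does not support every option of the C engine, so keyword arguments passed through `**kwargs` should be ones that both engines accept.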

43 reactions

chris-b1 commented, Sep 23, 2015

Your second-to-last line includes a '\r' break. I think it’s a bug, but one workaround is to open the file in universal-newline mode.

pd.read_csv(open('test.csv','rU'), encoding='utf-8', engine='c')
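To check whether a file contains the kind of stray '\r' described above, the raw bytes can be scanned for carriage returns that are not followed by '\n'. This is a minimal sketch using only the standard library; find_bare_carriage_returns is a hypothetical helper name, and it assumes the file is small enough to read into memory.

```python
import re


def find_bare_carriage_returns(path):
    # Read the file as bytes so Python does not translate any line endings.
    with open(path, "rb") as handle:
        data = handle.read()
    # Match a carriage return that is not immediately followed by a line feed,
    # i.e. one that is not part of a Windows-style line ending.
    return [match.start() for match in re.finditer(rb"\r(?!\n)", data)]
```

An empty list means every carriage return in the file belongs to a '\r\n' pair; any offsets returned point at the bytes worth inspecting.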

Top Results From Across the Web

Python Pandas Error tokenizing data - csv - Stack Overflow

If this error arises when reading a file written by pandas.to_csv() , it MIGHT be because there is a '\r' in a column...

How To Fix pandas.parser.CParserError: Error tokenizing data

The most obvious solution to the problem is to fix the data file manually by removing the extra separators in the lines causing...

How to fix CParserError: Error tokenizing data

Fix it manually. The Error tokenizing data may arise when you're using separator (for eg. · pandas.to_csv() · skiprows. Sometimes the parser is...

IO tools (text, CSV, HDF5, …) — pandas 1.5.2 documentation

In [1]: import pandas as pd In [2]: from io import StringIO In [3]: data = "col1 ... _libs.parsers.raise_parser_error() ParserError: Error tokenizing data....

How To Solve Python Pandas Error Tokenizing Data Error?

While reading a CSV file, you may get the “Pandas Error Tokenizing Data“. This mostly occurs due to the incorrect data in the...