read_csv C-engine CParserError: Error tokenizing data
Hi,
I have encountered a dataset that the C engine of read_csv cannot parse. I am unsure of the exact cause, but I have narrowed it down to a single row, which I have pickled and uploaded to Dropbox. If you obtain the pickle, try the following:
import pandas as pd
df = pd.read_pickle('faulty_row.pkl')
df.to_csv('faulty_row.csv', encoding='utf8', index=False)
pd.read_csv('faulty_row.csv', encoding='utf8')  # default C engine: raises
I get the following exception:
CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
If you read the CSV with the Python engine instead, no exception is thrown:
pd.read_csv('faulty_row.csv', encoding='utf8', engine='python')
This suggests that the issue is with read_csv and not to_csv. The versions I am using are:
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-28-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
pandas: 0.16.2
nose: 1.3.7
Cython: 0.22.1
numpy: 1.9.2
scipy: 0.15.1
IPython: 3.2.1
patsy: 0.3.0
tables: 3.2.0
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
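Because the Python engine reads the same file without error, one interim workaround is to try the fast C engine first and fall back to the Python engine only when it fails. This is a minimal sketch, assuming a recent pandas where the exception is exposed as `pd.errors.ParserError` (in 0.16.2, as used here, the class was `CParserError`); the helper name `read_csv_with_fallback` is hypothetical:

```python
import io

import pandas as pd


def read_csv_with_fallback(source, **kwargs):
    """Hypothetical helper: try engine="c", retry with engine="python" on ParserError.

    `source` may be a path or a seekable file-like object. Only pass options
    that both engines accept (e.g. not the C-only `low_memory`).
    """
    try:
        return pd.read_csv(source, engine="c", **kwargs)
    except pd.errors.ParserError:
        # A file-like object was partly consumed by the failed attempt,
        # so rewind it before the second pass.
        if hasattr(source, "seek"):
            source.seek(0)
        return pd.read_csv(source, engine="python", **kwargs)


# Usage with an in-memory CSV (parses fine with the C engine):
df = read_csv_with_fallback(io.StringIO("a,b\n1,2\n3,4\n"))
```

The fallback hides the parser bug rather than fixing it, and the Python engine is noticeably slower on large files, so it is best treated as a stopgap.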
I missed @alfonsomhc's answer because it looked like just a comment.
You need
Your second-to-last line includes a '\r' line break. I think it's a bug, but one workaround is to open the file in universal-newline mode.
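A minimal sketch of that workaround in Python 3, where text-mode `open()` with `newline=None` (the default) plays the role of Python 2's `'rU'` mode. Both helper names are hypothetical; `find_bare_carriage_returns` is an assumed diagnostic that scans the raw bytes for a `'\r'` that is not part of a `'\r\n'` pair, which is one way to locate a row like the one described in the issue:

```python
import pandas as pd


def find_bare_carriage_returns(path):
    """Hypothetical diagnostic: 1-based line numbers (split on b"\\n" only)
    that contain a '\\r' which is not part of a trailing '\\r\\n' pair."""
    hits = []
    with open(path, "rb") as fh:  # binary mode: no newline translation
        for lineno, raw in enumerate(fh, start=1):
            if raw.endswith(b"\r\n"):
                body = raw[:-2]
            elif raw.endswith(b"\n"):
                body = raw[:-1]
            else:
                body = raw
            if b"\r" in body:
                hits.append(lineno)
    return hits


def read_csv_universal_newlines(path, encoding="utf-8", **kwargs):
    """Hypothetical helper: read through a universal-newlines text handle.

    With newline=None, '\\r\\n' and a lone '\\r' are both translated to '\\n'
    before pandas sees the data.
    """
    with open(path, encoding=encoding, newline=None) as fh:
        return pd.read_csv(fh, **kwargs)
```

This only helps when the stray `'\r'` sits inside a quoted field, where the translated `'\n'` is kept as part of the value. An unquoted `'\r'` becomes a row break after translation and still misaligns the columns; in that case, removing it from the data before calling to_csv is the safer fix.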