SUB-character in a csv causes read_csv() with C-Engine to detect EOF

Problem description

If a string field in a CSV file contains a SUB character (ASCII 0x1A, Ctrl-Z), read_csv() with the default C engine raises:

ParserError: Error tokenizing data. C error: EOF inside string starting at line 0

The Python engine reads the same file without error.

I could not include example data containing a SUB character in this issue, so I pasted an example line here instead: https://pastebin.com/x6QPY4Hf. To reproduce, paste the line into a CSV file and read it with read_csv().
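
Because the SUB character is hard to paste, a reproduction file can also be generated in code. The following is a minimal sketch using only the standard library; the field values, the file name `sub_sample.csv`, and the helper names `write_sample_csv` and `read_rows` are made up for illustration and are not the pastebin line from this issue.

```python
import csv
import os
import tempfile

SUB = "\x1a"  # ASCII 26 (Ctrl-Z), the SUB control character


def write_sample_csv(path):
    """Write a small CSV whose quoted string field contains a SUB character."""
    rows = [
        ["id", "text"],
        ["1", "before" + SUB + "after"],  # hypothetical sample data
        ["2", "plain"],
    ]
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, quoting=csv.QUOTE_ALL).writerows(rows)


def read_rows(path):
    """Read the file back with the standard-library csv module."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


path = os.path.join(tempfile.mkdtemp(), "sub_sample.csv")
write_sample_csv(path)
rows = read_rows(path)
print(len(rows), repr(rows[1][1]))
```

The resulting `path` can then be passed to read_csv(), once with the default engine and once with engine='python', to compare the two parsers on the installed pandas version.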

I don’t know whether this behaviour is expected, since this character is indeed used as an EOF marker in certain cases. However, I see little sense in interpreting a SUB character as EOF in the middle of a CSV file.
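
To make the distinction concrete, here is a rough sketch that separates a SUB byte at the very end of the data (where some legacy tools write it as an end-of-file marker) from SUB bytes embedded earlier in the content. The function name `find_sub_bytes` and the sample bytes are assumptions for illustration; this is not how the pandas tokenizer decides.

```python
SUB_BYTE = 0x1A  # ASCII SUB (Ctrl-Z)


def find_sub_bytes(data: bytes):
    """Return (embedded, trailing) lists of offsets of SUB bytes in data.

    A SUB that is the last byte, ignoring trailing CR/LF, is classified as
    a trailing marker; every other SUB is classified as embedded content.
    """
    offsets = [i for i, b in enumerate(data) if b == SUB_BYTE]
    last_index = len(data.rstrip(b"\r\n")) - 1
    trailing = [i for i in offsets if i == last_index]
    embedded = [i for i in offsets if i != last_index]
    return embedded, trailing


# Hypothetical sample: one SUB inside a quoted field, one at the end.
sample = b'id,text\r\n1,"before\x1aafter"\r\n2,"plain"\r\n\x1a'
embedded, trailing = find_sub_bytes(sample)
print("embedded:", embedded, "trailing:", trailing)
```

Only the embedded case corresponds to the situation described in this issue, where the character appears inside a string in the middle of the file.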

commit: None

python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64

pandas: 0.20.2

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

Khris777 commented, Jul 17, 2017 (1 reaction)

Thanks for adding the test. Since I’m only here on weekdays you were faster than me. 😃

Khris777 commented, Jul 14, 2017 (1 reaction)

Can confirm that updating solves the problem.
