BUG: read_csv skipfooter fails with invalid quoted line

Code Sample, a copy-pastable example if possible

import pandas as pd
from pandas.compat import StringIO  # pandas 0.19.x; on Python 3 this is io.StringIO

pd.read_csv(StringIO('''Date,Value
1/1/2012,100.00
1/2/2012,102.00
"a quoted junk row"morejunk'''),  skipfooter=1)

Out[21]
ERROR:root:An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 20))

---------------------------------------------------------------------------
Error                                     Traceback (most recent call last)
<ipython-input-34-d8dff6b9f4a7> in <module>()
      2 1/1/2012,100.00
      3 1/2/2012,102.00
----> 4 "a quoted junk row" '''),  skipfooter=1)

C:\Users\chris.bartak\Documents\python-dev\pandas\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    651                     skip_blank_lines=skip_blank_lines)
    652 
--> 653         return _read(filepath_or_buffer, kwds)
    654 
    655     parser_f.__name__ = name

C:\Users\chris.bartak\Documents\python-dev\pandas\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    404 
    405     try:
--> 406         data = parser.read()
    407     finally:
    408         parser.close()

C:\Users\chris.bartak\Documents\python-dev\pandas\pandas\io\parsers.py in read(self, nrows)
    977                 raise ValueError('skipfooter not supported for iteration')
    978 
--> 979         ret = self._engine.read(nrows)
    980 
    981         if self.options.get('as_recarray'):

C:\Users\chris.bartak\Documents\python-dev\pandas\pandas\io\parsers.py in read(self, rows)
   2066     def read(self, rows=None):
   2067         try:
-> 2068             content = self._get_lines(rows)
   2069         except StopIteration:
   2070             if self._first_chunk:

C:\Users\chris.bartak\Documents\python-dev\pandas\pandas\io\parsers.py in _get_lines(self, rows)
   2717                         while True:
   2718                             try:
-> 2719                                 new_rows.append(next(source))
   2720                                 rows += 1
   2721                             except csv.Error as inst:

Error: ',' expected after '"'

Problem description

This error occurs only when the last row is quoted and malformed. For example, deleting the trailing morejunk from the sample above makes the call succeed.
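
The message in the traceback comes from Python's standard csv module, which reports it only when its reader runs in strict mode. The sketch below reproduces the same message without pandas, using gfyoung's shortened data from the comments. That the python engine builds its reader with strict=True is an inference from the error text, not something stated in this thread, and parse is a hypothetical helper name.

```python
import csv
import io

# Same shape as the failing input: the last row has text after a closing quote.
data = 'a\n1\n"a"b'

def parse(text, strict):
    """Parse text with the stdlib csv reader; return the rows or the error message."""
    try:
        return list(csv.reader(io.StringIO(text), strict=strict))
    except csv.Error as exc:
        return str(exc)

# Lenient mode keeps the stray text and joins it onto the field.
lenient = parse(data, strict=False)

# Strict mode rejects the row, whether or not the caller wanted to skip it later.
strict_result = parse(data, strict=True)

print(lenient)
print(strict_result)
```

The lenient result matches what engine='c' returns in the comments (the last field becomes ab), which suggests the two engines differ in how tolerant they are of text after a closing quote.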

Expected Output

A successful parse: the footer row is skipped and the two data rows are returned.

pandas 0.19.2

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 13 (12 by maintainers)

Top GitHub Comments

gfyoung commented, Apr 5, 2017 (2 reactions)

Here’s a simpler example that we can use:

>>> from io import StringIO
>>> from pandas import read_csv
>>> data = 'a\n1\n"a"b'
>>> read_csv(StringIO(data), engine='c')
    a
0   1
1  ab
>>>
>>> read_csv(StringIO(data), engine='python')
...
_csv.Error: ',' expected after '"'
>>>
>>> read_csv(StringIO(data), engine='python', skipfooter=1)
...
_csv.Error: ',' expected after '"'

shivampatel16 commented, Sep 3, 2019 (0 reactions)

engine='c' does the job for me. Finally got my task working after a huge but simple hurdle.

Thank you!
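
Switching to engine='c' is not an option when skipfooter is needed, since the original traceback shows that call going through the python parser. One possible workaround, sketched here under assumptions rather than taken from the thread, is to drop the footer lines from the text before any CSV parsing happens. drop_footer is a hypothetical helper, and the sketch assumes footer rows contain no embedded newlines. It uses the standard csv module in strict mode as a stand-in for the parser.

```python
import csv
import io

def drop_footer(text, n):
    """Return text without its last n physical lines (hypothetical helper)."""
    if n <= 0:
        return text
    lines = text.splitlines(keepends=True)
    return "".join(lines[:-n])

# The input from the original report, with the malformed quoted footer row.
raw = 'Date,Value\n1/1/2012,100.00\n1/2/2012,102.00\n"a quoted junk row"morejunk'

# Remove the footer first so the strict reader never sees the bad row.
trimmed = drop_footer(raw, 1)
rows = list(csv.reader(io.StringIO(trimmed), strict=True))

print(rows)
```

The trimmed text can then be wrapped in StringIO and passed to read_csv without skipfooter.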
