CParserError: Error tokenizing data. C error: Expected 2 fields in line 733, saw 3

I am trying to extract the tables from a number of pdf documents:

In:

from tabula import read_pdf_table
pdf_table = read_pdf_table("../file.pdf", pages="all")

Out:


---------------------------------------------------------------------------
CParserError                              Traceback (most recent call last)
<ipython-input-31-c86da9ee0350> in <module>()
      1 from tabula import read_pdf_table
----> 2 pdf_table = read_pdf_table("../file.pdf", pages="all")
      3 type(pdf_table)

/usr/local/lib/python3.5/site-packages/tabula/wrapper.py in read_pdf_table(input_path, options, pages, guess, area, spreadsheet, password, nospreadsheet, silent)
    100         return
    101 
--> 102     return pd.read_csv(io.BytesIO(output))

/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    643                     skip_blank_lines=skip_blank_lines)
    644 
--> 645         return _read(filepath_or_buffer, kwds)
    646 
    647     parser_f.__name__ = name

/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    398         return parser
    399 
--> 400     data = parser.read()
    401     parser.close()
    402     return data

/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in read(self, nrows)
    936                 raise ValueError('skipfooter not supported for iteration')
    937 
--> 938         ret = self._engine.read(nrows)
    939 
    940         if self.options.get('as_recarray'):

/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in read(self, nrows)
   1503     def read(self, nrows=None):
   1504         try:
-> 1505             data = self._reader.read(nrows)
   1506         except StopIteration:
   1507             if self._first_chunk:

pandas/parser.pyx in pandas.parser.TextReader.read (pandas/parser.c:9884)()

pandas/parser.pyx in pandas.parser.TextReader._read_low_memory (pandas/parser.c:10142)()

pandas/parser.pyx in pandas.parser.TextReader._read_rows (pandas/parser.c:10870)()

pandas/parser.pyx in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:10741)()

pandas/parser.pyx in pandas.parser.raise_parser_error (pandas/parser.c:25878)()

CParserError: Error tokenizing data. C error: Expected 2 fields in line 733, saw 3

I tried setting the sep parameter to \t, but that did not work either. What can I do?
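The message "Expected 2 fields in line 733, saw 3" means that one row of the intermediate CSV has more delimiters than the first row did. A minimal sketch for locating such rows with the standard library is shown below. The helper name `find_ragged_rows` and the sample text are hypothetical, and the sketch assumes the extracted table is available as comma-separated text.

```python
import csv
import io


def find_ragged_rows(text, delimiter=","):
    """Return the expected width and a list of (line_number, field_count)
    for every non-blank row whose width differs from the first row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    expected = None
    ragged = []
    for row in reader:
        if not row:
            # Blank lines carry no fields; skip them.
            continue
        if expected is None:
            # The first non-blank row defines the expected width.
            expected = len(row)
            continue
        if len(row) != expected:
            ragged.append((reader.line_num, len(row)))
    return expected, ragged


# Hypothetical sample: line 3 has one field too many.
sample = "name,value\na,1\nb,2,extra\nc,3\n"
expected, ragged = find_ragged_rows(sample)
print(expected, ragged)
```

Knowing which lines are ragged makes it easier to decide whether to fix the extraction (for example by restricting the page range or area) or to skip those rows when parsing.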

Issue Analytics

State:
Created 7 years ago
Comments: 9 (5 by maintainers)

Top GitHub Comments

1 reaction

RAHAAMA commented, Feb 26, 2018

b'Skipping line 28: expected 2 fields, saw 4\nSkipping line 29: expected 2 fields, saw 4\nSkipping line 30: expected 2 fields, saw 4\nSkipping line 31: expected 2 fields, saw 4\nSkipping line 32: expected 2 fields, saw 4\nSkipping line 33: expected 2 fields, saw 4\nSkipping line 34: expected 2 fields, saw 4\nSkipping line 35: expected 2 fields, saw 4\nSkipping line 36: expected 2 fields, saw 4\nSkipping line 37: expected 2 fields, saw 4\nSkipping line 38: expected 2 fields, saw 4\nSkipping line 39: expected 2 fields, saw 4\nSkipping line 40: expected 2 fields, saw 4\nSkipping line

I also got the warnings above. I have set pandas_options={'error_bad_lines': False}.
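For readers on newer pandas releases: `error_bad_lines` was deprecated in pandas 1.3 and removed in 2.0, and `on_bad_lines="skip"` is its replacement. The sketch below shows the effect directly with `pd.read_csv`, mirroring the `pd.read_csv(io.BytesIO(output))` call in the traceback. The bytes in `raw` are a hypothetical stand-in for the CSV that tabula produces, and pandas 1.3 or later is assumed.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV bytes handed to pandas.
raw = b"name,value\na,1\nb,2,extra\nc,3\n"

# Default behaviour: the row with an extra field raises ParserError.
try:
    pd.read_csv(io.BytesIO(raw))
    strict_failed = False
except pd.errors.ParserError as exc:
    strict_failed = True
    print("strict parse failed:", exc)

# on_bad_lines="skip" drops the malformed row instead of raising.
df = pd.read_csv(io.BytesIO(raw), on_bad_lines="skip")
print(df)
```

Skipping rows silently discards data, so it is worth checking how many rows were dropped before relying on the result.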

1 reaction

chezou commented, Jan 15, 2017

@alonsopg Was your problem solved by the updated version? If so, I would like to close this issue.

Top Results From Across the Web

Python Pandas Error tokenizing data - csv - Stack Overflow

The error gives a clue to solve the problem " Expected 2 fields in line 3, saw 12", saw 12 means length of...

How To Fix pandas.parser.CParserError: Error tokenizing data

pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 2, saw 6. In today's short guide we will discuss why ...

How to fix CParserError: Error tokenizing data

The Error tokenizing data may arise when you're using separator (for eg. comma ',') as a delimiter and you have more separator than...

How To Solve Python Pandas Error Tokenizing Data Error?

While reading a CSV file, you may get the “Pandas Error Tokenizing Data“. This mostly occurs due to the incorrect data in the...

Error tokenizing data. C error: Expected 9 fields in line

_libs.parsers.raise_parser_error pandas.errors.ParserError: Error tokenizing data. C error: Expected 9 fields in line 4, saw 10.
