CParserError: Error tokenizing data. C error: Expected 2 fields in line 733, saw 3

I am trying to extract the tables from a number of pdf documents:

In:

from tabula import read_pdf_table
pdf_table = read_pdf_table("../file.pdf", pages="all")

Out:


---------------------------------------------------------------------------
CParserError                              Traceback (most recent call last)
<ipython-input-31-c86da9ee0350> in <module>()
      1 from tabula import read_pdf_table
----> 2 pdf_table = read_pdf_table("../file.pdf", pages="all")
      3 type(pdf_table)

/usr/local/lib/python3.5/site-packages/tabula/wrapper.py in read_pdf_table(input_path, options, pages, guess, area, spreadsheet, password, nospreadsheet, silent)
    100         return
    101 
--> 102     return pd.read_csv(io.BytesIO(output))

/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    643                     skip_blank_lines=skip_blank_lines)
    644 
--> 645         return _read(filepath_or_buffer, kwds)
    646 
    647     parser_f.__name__ = name

/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    398         return parser
    399 
--> 400     data = parser.read()
    401     parser.close()
    402     return data

/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in read(self, nrows)
    936                 raise ValueError('skipfooter not supported for iteration')
    937 
--> 938         ret = self._engine.read(nrows)
    939 
    940         if self.options.get('as_recarray'):

/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in read(self, nrows)
   1503     def read(self, nrows=None):
   1504         try:
-> 1505             data = self._reader.read(nrows)
   1506         except StopIteration:
   1507             if self._first_chunk:

pandas/parser.pyx in pandas.parser.TextReader.read (pandas/parser.c:9884)()

pandas/parser.pyx in pandas.parser.TextReader._read_low_memory (pandas/parser.c:10142)()

pandas/parser.pyx in pandas.parser.TextReader._read_rows (pandas/parser.c:10870)()

pandas/parser.pyx in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:10741)()

pandas/parser.pyx in pandas.parser.raise_parser_error (pandas/parser.c:25878)()

CParserError: Error tokenizing data. C error: Expected 2 fields in line 733, saw 3

I tried setting the sep parameter to \t, but it did not work. What can I do?
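The error message says that the first row established a width of 2 fields and that line 733 contained more. A minimal standard-library sketch for locating such rows in CSV text is shown here; the helper name `find_wide_rows` and the sample data are hypothetical and not part of tabula or pandas, and the sketch assumes you have the intermediate CSV text at hand.

```python
import csv
import io


def find_wide_rows(text, delimiter=","):
    """Return the header width and a list of (line_number, field_count)
    for rows that have more fields than the first row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    expected = len(next(reader))  # the first row sets the expected width
    wide = []
    for row in reader:
        # Only rows that are too wide are collected, mirroring the
        # "Expected N fields ..., saw M" wording of the error above.
        if len(row) > expected:
            wide.append((reader.line_num, len(row)))
    return expected, wide


# Hypothetical sample: line 3 has one field too many.
sample = "name,value\na,1\nb,2,extra\nc,3\n"
expected, wide = find_wide_rows(sample)
for line_no, count in wide:
    print(f"Expected {expected} fields in line {line_no}, saw {count}")
```

If the reported rows come from a different table on a later page, changing the separator will likely not help, because the column count itself differs between the tables.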

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

1 reaction
RAHAAMA commented, Feb 26, 2018

b'Skipping line 28: expected 2 fields, saw 4\nSkipping line 29: expected 2 fields, saw 4\nSkipping line 30: expected 2 fields, saw 4\nSkipping line 31: expected 2 fields, saw 4\nSkipping line 32: expected 2 fields, saw 4\nSkipping line 33: expected 2 fields, saw 4\nSkipping line 34: expected 2 fields, saw 4\nSkipping line 35: expected 2 fields, saw 4\nSkipping line 36: expected 2 fields, saw 4\nSkipping line 37: expected 2 fields, saw 4\nSkipping line 38: expected 2 fields, saw 4\nSkipping line 39: expected 2 fields, saw 4\nSkipping line 40: expected 2 fields, saw 4\nSkipping line

I got the warnings above as well. I have set pandas_options={'error_bad_lines': False}.
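As a side note for readers on newer pandas: `error_bad_lines` was deprecated in pandas 1.3 and removed in 2.0 in favour of `on_bad_lines`. The sketch below calls `pandas.read_csv` directly on made-up sample text rather than going through tabula, so it only illustrates the skipping behaviour; whether your tabula version forwards the same key through `pandas_options` is an assumption to verify.

```python
import io

import pandas as pd

# Hypothetical sample: the third line has one field too many.
csv_text = "name,value\na,1\nb,2,extra\nc,3\n"

# on_bad_lines="skip" (pandas >= 1.3) drops rows with too many fields
# instead of raising "Error tokenizing data".
df = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")
print(df)
```

Skipping silently discards data, so it is worth checking how many rows were dropped before relying on the result.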

1 reaction
chezou commented, Jan 15, 2017

@alonsopg Was your problem solved by the updated version? If so, I would like to close this issue.

