CParserError: Error tokenizing data. C error: Expected 2 fields in line 733, saw 3
I am trying to extract the tables from a number of PDF documents:
In:
from tabula import read_pdf_table
pdf_table = read_pdf_table("../file.pdf", pages="all")
Out:
---------------------------------------------------------------------------
CParserError Traceback (most recent call last)
<ipython-input-31-c86da9ee0350> in <module>()
1 from tabula import read_pdf_table
----> 2 pdf_table = read_pdf_table("../file.pdf", pages="all")
3 type(pdf_table)
/usr/local/lib/python3.5/site-packages/tabula/wrapper.py in read_pdf_table(input_path, options, pages, guess, area, spreadsheet, password, nospreadsheet, silent)
100 return
101
--> 102 return pd.read_csv(io.BytesIO(output))
/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
643 skip_blank_lines=skip_blank_lines)
644
--> 645 return _read(filepath_or_buffer, kwds)
646
647 parser_f.__name__ = name
/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
398 return parser
399
--> 400 data = parser.read()
401 parser.close()
402 return data
/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in read(self, nrows)
936 raise ValueError('skipfooter not supported for iteration')
937
--> 938 ret = self._engine.read(nrows)
939
940 if self.options.get('as_recarray'):
/usr/local/lib/python3.5/site-packages/pandas/io/parsers.py in read(self, nrows)
1503 def read(self, nrows=None):
1504 try:
-> 1505 data = self._reader.read(nrows)
1506 except StopIteration:
1507 if self._first_chunk:
pandas/parser.pyx in pandas.parser.TextReader.read (pandas/parser.c:9884)()
pandas/parser.pyx in pandas.parser.TextReader._read_low_memory (pandas/parser.c:10142)()
pandas/parser.pyx in pandas.parser.TextReader._read_rows (pandas/parser.c:10870)()
pandas/parser.pyx in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:10741)()
pandas/parser.pyx in pandas.parser.raise_parser_error (pandas/parser.c:25878)()
CParserError: Error tokenizing data. C error: Expected 2 fields in line 733, saw 3
I tried setting the sep parameter to \t, but it did not work. What can I do?
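The message "Expected 2 fields in line 733, saw 3" means the parser fixed the column count from the first row and then hit a row with more fields. One way to see which rows are affected is to count fields per row before handing the text to pandas. The following is only a sketch using the standard library; find_ragged_rows and the sample text are hypothetical names and data made up for illustration, and it assumes the intermediate output is comma-separated text.

```python
import csv
import io


def find_ragged_rows(text, delimiter=","):
    """Return (row_number, field_count) for rows whose width differs from the first row.

    Hypothetical helper for illustration. Row numbers are 1-based record
    numbers as seen by csv.reader; blank rows are ignored.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    expected = len(rows[0])  # the first row sets the expected field count
    return [
        (number, len(row))
        for number, row in enumerate(rows, start=1)
        if row and len(row) != expected
    ]


# Made-up sample: the third row carries one extra field.
sample = "name,value\nalpha,1\nbeta,2,extra\ngamma,3\n"
print(find_ragged_rows(sample))
```

If many consecutive rows show up with the same larger field count, that probably points to a second table with more columns on a later page rather than a wrong separator.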
Issue Analytics
- Created 7 years ago
- Comments: 9 (5 by maintainers)
b'Skipping line 28: expected 2 fields, saw 4\nSkipping line 29: expected 2 fields, saw 4\nSkipping line 30: expected 2 fields, saw 4\nSkipping line 31: expected 2 fields, saw 4\nSkipping line 32: expected 2 fields, saw 4\nSkipping line 33: expected 2 fields, saw 4\nSkipping line 34: expected 2 fields, saw 4\nSkipping line 35: expected 2 fields, saw 4\nSkipping line 36: expected 2 fields, saw 4\nSkipping line 37: expected 2 fields, saw 4\nSkipping line 38: expected 2 fields, saw 4\nSkipping line 39: expected 2 fields, saw 4\nSkipping line 40: expected 2 fields, saw 4\nSkipping line
I also got the warnings above. I have set
pandas_options={'error_bad_lines': False}
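For reference, error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0; the replacement keyword is on_bad_lines. The sketch below shows the behavior in plain pandas only (it does not call tabula), assumes pandas 1.3 or newer, and uses made-up sample data.

```python
import io

import pandas as pd

# Made-up data: the third line has one field more than the header.
csv_text = "a,b\n1,2\n3,4,5\n6,7\n"

# A strict parse raises the same kind of tokenizing error as in the traceback.
try:
    pd.read_csv(io.StringIO(csv_text))
except pd.errors.ParserError as exc:
    print("strict parse failed:", exc)

# on_bad_lines="skip" drops rows with too many fields without reporting them;
# on_bad_lines="warn" drops them and emits a warning for each one.
df = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")
print(df)
```

Skipping only hides the mismatch: every skipped row is lost from the result, so if the "Skipping line" warnings cover a long run of lines, those rows most likely belong to a table with a different number of columns and need to be read separately.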
@alonsopg Was your problem solved by the updated version? If so, I would like to close this issue.