IndexError while using split_text

IndexError thrown when using split_text=True

Traceback (most recent call last):
  File "/code/seperate_page.py", line 16, in <module>
    tables = camelot.read_pdf(out_filename,
  File "/usr/local/lib/python3.9/site-packages/camelot/io.py", line 113, in read_pdf
    tables = p.parse(
  File "/usr/local/lib/python3.9/site-packages/camelot/handlers.py", line 176, in parse
    t = parser.extract_tables(
  File "/usr/local/lib/python3.9/site-packages/camelot/parsers/lattice.py", line 431, in extract_tables
    table = self._generate_table(table_idx, cols, rows, v_s=v_s, h_s=h_s)
  File "/usr/local/lib/python3.9/site-packages/camelot/parsers/lattice.py", line 372, in _generate_table
    indices = Lattice._reduce_index(
  File "/usr/local/lib/python3.9/site-packages/camelot/parsers/lattice.py", line 191, in _reduce_index
    if t.cells[r_idx][c_idx].hspan:
IndexError: list index out of range

Steps to reproduce the bug

Code

import camelot

# add your code here
tables = camelot.read_pdf('service_providers_ul.0.pdf',
                          backend='poppler',
                          pages='1',
                          flavor='lattice',
                          split_text=True)

PDF

service_providers_ul.0.pdf

Screenshots

Not Applicable

Environment

  • OS: Linux
  • Python version: 3.9.4
  • Numpy version: 1.22.1
  • OpenCV version: 4.5.5.6
  • Ghostscript version: 0.7
  • Camelot version: 0.10.1

Additional context

One of the rows contains an empty textline that extends past the edge of the last column. This causes the split_textline code to assign a column index beyond the available column indices, and an exception is thrown further down the line when that index is used.
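The failure mode described here can be reproduced in isolation. The sketch below does not use camelot at all: `cells`, `last_cut_idx`, and `clamp_col_index` are hypothetical names chosen for illustration, and the nested list only mimics the `t.cells[r_idx][c_idx]` access pattern seen in the traceback.

```python
def clamp_col_index(idx, n_cols):
    """Keep a column index within the valid range [0, n_cols - 1]."""
    return max(0, min(idx, n_cols - 1))


# Hypothetical stand-in for a table with one row and three columns.
cells = [["a", "b", "c"]]
n_cols = len(cells[0])

last_cut_idx = 2            # index associated with the last column cut
bad_idx = last_cut_idx + 1  # index assigned to text lying past the last cut

error = None
try:
    cells[0][bad_idx]       # same kind of lookup that fails in _reduce_index
except IndexError as exc:
    error = str(exc)

# Clamping maps the overflowing index back onto the last real column.
safe_idx = clamp_col_index(bad_idx, n_cols)
value = cells[0][safe_idx]
```

Clamping assigns the overflowing text to the last column rather than dropping it, which may or may not be the desired behaviour for a given PDF.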

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 11

Top GitHub Comments

2 reactions
ramSeraph commented, Mar 18, 2022

@ramSeraph Can you elaborate on how to implement the fix for this? I think I have the same issue with this doc:

Example 1.pdf

It might be the same issue. With the following fix applied, the run went through:

diff --git a/camelot/utils.py b/camelot/utils.py
index 404c00b..e5f2cbc 100644
--- a/camelot/utils.py
+++ b/camelot/utils.py
@@ -623,7 +623,8 @@ def split_textline(table, textline, direction, flag_size=False, strip_text=""):
                         else:
                             # TODO: add test
                             if cut == x_cuts[-1]:
-                                cut_text.append((r, cut[0] + 1, obj))
+                                new_idx = min(cut[0] + 1, len(table.cols) - 1)
+                                cut_text.append((r, new_idx, obj))
                     elif isinstance(obj, LTAnno):
                         cut_text.append((r, cut[0], obj))
         elif direction == "vertical" and not textline.is_empty():
@@ -656,7 +657,8 @@ def split_textline(table, textline, direction, flag_size=False, strip_text=""):
                         else:
                             # TODO: add test
                             if cut == y_cuts[-1]:
-                                cut_text.append((cut[0] - 1, c, obj))
+                                new_idx = max(cut[0] - 1, 0)
+                                cut_text.append((new_idx, c, obj))
                     elif isinstance(obj, LTAnno):
                         cut_text.append((cut[0], c, obj))
     except IndexError:

1 reaction
ramSeraph commented, Mar 18, 2022

I personally pinned the camelot version in my project and monkeypatched the fix in.
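One possible shape for such a monkeypatch is sketched below. It is not the commenter's actual code. It assumes that the function being patched returns a list of (row, col, text) tuples and that the table object exposes `rows` and `cols` sequences; `clamp_indices`, `fake_utils`, and `fake_table` are hypothetical names, and the stand-in objects are used so the sketch runs without camelot installed.

```python
import functools
import types


def clamp_indices(split_fn):
    """Wrap a split_textline-like function so returned indices stay in bounds."""
    @functools.wraps(split_fn)
    def wrapper(table, textline, direction, *args, **kwargs):
        result = split_fn(table, textline, direction, *args, **kwargs)
        max_r = len(table.rows) - 1
        max_c = len(table.cols) - 1
        # Assumption: each item is a (row, col, text) tuple.
        return [
            (min(max(r, 0), max_r), min(max(c, 0), max_c), text)
            for r, c, text in result
        ]
    return wrapper


# Stand-ins for the real module and table; the fake function returns
# a column index (3) one past the last valid column (2).
fake_utils = types.SimpleNamespace(
    split_textline=lambda table, textline, direction: [(0, 3, "x")]
)
fake_table = types.SimpleNamespace(rows=[(0, 10)], cols=[(0, 1), (1, 2), (2, 3)])

# The monkeypatch itself: replace the attribute with the wrapped version.
fake_utils.split_textline = clamp_indices(fake_utils.split_textline)
patched = fake_utils.split_textline(fake_table, None, "horizontal")
```

Applied to the real library, the same assignment would target the module attribute that holds split_textline. Whether that takes effect depends on how the callers look the function up, so it is worth verifying against the pinned version. Note also that this wrapper clamps the function's output, whereas the diff above clamps inside the function, so the two are not guaranteed to behave identically.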
