Columns of text are being merged in the OCR result. How can I read text from columns without merging their bounding boxes?

Hi,

I want to perform OCR on the following image:

[Image 23_1: a page whose main text is laid out in three columns]

As can be seen, the main text is laid out in 3 columns, which should be detected as 3 separate bounding boxes. Instead, the output merges all 3 and reads them as one. For example, the output from reader.readtext(img, detail=0, paragraph=True, decoder='beamsearch') is:

Peter; Chris and Sarah were London (Tfl) to extend the extension is Thousands of delighted to join Harriet Bakerloo line from Elephant to people travel from; to and

I even tried changing the margin and width threshold parameters to stop the bounding boxes from being combined:

text = reader.readtext(img, detail=0, paragraph=True, decoder='beamsearch', add_margin=0, width_ths=0)

and

text = reader.readtext(img, detail=0, paragraph=True, decoder='beamsearch', add_margin=0, width_ths=1)

Both produce the same output, with the columns merged.

What can I do to prevent this merging and have the columns treated as separate bounding boxes?

Thank you
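One possible workaround, sketched here as an assumption and not taken from the issue thread, is to skip paragraph mode entirely: ask EasyOCR for per-box results with detail=1 and paragraph=False (each result is a (bbox, text, confidence) tuple, where bbox holds four [x, y] corner points), then group the boxes into columns yourself by looking for horizontal gaps. The function name split_into_columns and its min_gap parameter are hypothetical and would need tuning for your page layout.

```python
from typing import List, Sequence, Tuple

# Assumed shape of one EasyOCR result with detail=1, paragraph=False:
# (bbox, text, confidence), where bbox is four [x, y] corner points.
Detection = Tuple[Sequence[Sequence[float]], str, float]


def split_into_columns(results: List[Detection], min_gap: float = 20.0) -> List[str]:
    """Hypothetical helper: group boxes into columns by horizontal gaps and
    join each column's text top to bottom. min_gap is in pixels and is an
    assumed tuning knob, not an EasyOCR parameter."""
    # Reduce each box to (left, right, top, text).
    boxes = []
    for bbox, text, _conf in results:
        xs = [p[0] for p in bbox]
        ys = [p[1] for p in bbox]
        boxes.append((min(xs), max(xs), min(ys), text))

    # Sweep left to right; start a new column when a box begins more than
    # min_gap pixels to the right of every box already in the current column.
    boxes.sort(key=lambda b: b[0])
    columns: List[list] = []
    right_edge = None
    for box in boxes:
        if right_edge is None or box[0] - right_edge > min_gap:
            columns.append([box])
            right_edge = box[1]
        else:
            columns[-1].append(box)
            right_edge = max(right_edge, box[1])

    # Inside each column, read boxes top to bottom, then left to right.
    out = []
    for col in columns:
        col.sort(key=lambda b: (b[2], b[0]))
        out.append(" ".join(b[3] for b in col))
    return out
```

Usage would look like results = reader.readtext(img, detail=1, paragraph=False) followed by split_into_columns(results). The simple top-then-left sort assumes boxes on the same line share roughly the same top edge; slightly skewed scans may need the y values rounded or bucketed first.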

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

rkcosmos commented, Apr 20, 2021 (2 reactions)

The latest code update allows you to use both x_ths and width_ths to control merging behaviour when paragraph=True is set. Setting both to zero should give you the desired outcome. You can install the latest version with pip install git+https://github.com/jaidedai/easyocr.git. The pip version will be updated soon, after some tests.
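The maintainer's suggestion can be sketched as a small wrapper around reader.readtext, assuming an EasyOCR version that includes the change described above (so that x_ths and width_ths are honoured in paragraph mode). The wrapper name read_paragraphs_without_column_merge is hypothetical; the keyword arguments are the ones named in the question and in the comment.

```python
def read_paragraphs_without_column_merge(reader, img, decoder="beamsearch"):
    """Hypothetical wrapper: return paragraph strings while asking EasyOCR
    not to merge horizontally separated boxes.

    reader is assumed to be an easyocr.Reader; img is anything that
    reader.readtext accepts (file path, bytes, or numpy array).
    """
    return reader.readtext(
        img,
        detail=0,          # return plain strings instead of (bbox, text) pairs
        paragraph=True,    # still combine lines into paragraphs
        decoder=decoder,
        x_ths=0,           # maintainer's suggestion: set both thresholds to zero
        width_ths=0,       # so boxes in neighbouring columns are not joined
    )
```

A typical call would be reader = easyocr.Reader(['en']) followed by read_paragraphs_without_column_merge(reader, img). If the columns are still merged, that may indicate the installed version predates the change, in which case installing from the repository as described in the comment is the next thing to try.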

rkcosmos commented, May 23, 2021 (1 reaction)

I just updated API documentation.
