Columns of text are being merged in OCR result. How to read text from columns without merging bounding boxes?
Hi,
I want to perform OCR on the following image:
As can be seen, there are 3 columns of main text, which should be 3 separate bounding boxes. But the output merges all 3 and reads them as one. For example, the output from reader.readtext(img, detail=0, paragraph=True, decoder='beamsearch') is:
Peter; Chris and Sarah were London (Tfl) to extend the extension is Thousands of delighted to join Harriet Bakerloo line from Elephant to people travel from; to and
I even tried changing the margin parameters to prevent combining the bounding boxes:
text = reader.readtext(img, detail=0, paragraph=True, decoder='beamsearch', add_margin=0, width_ths=0)
and
text = reader.readtext(img, detail=0, paragraph=True, decoder='beamsearch', add_margin=0, width_ths=1)
Both produce the same output, which merges the columns.
What can I do to prevent this merging and consider these as separate bounding boxes?
Thank you
The latest code update allows you to use both x_ths and width_ths to control merging behaviour when setting paragraph=True. Setting both to zero should give you the desired outcome. You can install the latest version with pip install git+git://github.com/jaidedai/easyocr.git. The pip version will be updated soon after some tests. I just updated the API documentation.
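For reference, the suggestion above could be wrapped as a small helper like the one below. This is only a sketch: the function name read_without_column_merging and the image path in the usage comment are made up for illustration, and it assumes an installed EasyOCR version that already includes the update, so that x_ths and width_ths are both honoured when paragraph=True.

```python
def read_without_column_merging(reader, image, **overrides):
    """Call reader.readtext with both horizontal merge thresholds set to zero.

    `reader` is assumed to be an easyocr.Reader from a version that includes
    the update described above. This helper is illustrative and is not part
    of EasyOCR itself.
    """
    options = {
        "detail": 0,             # return plain strings, as in the question
        "paragraph": True,       # group detected lines into paragraphs
        "decoder": "beamsearch",
        "x_ths": 0,              # horizontal threshold for paragraph merging
        "width_ths": 0,          # horizontal threshold for box merging
    }
    options.update(overrides)    # let the caller change any of the defaults
    return reader.readtext(image, **options)


# Usage (requires easyocr; "page.png" is a placeholder path):
#   import easyocr
#   reader = easyocr.Reader(["en"])
#   for block in read_without_column_merging(reader, "page.png"):
#       print(block)
```

If the columns still come back merged, passing detail=1 as an override returns the bounding boxes along with the text, which makes it easier to see which boxes were combined.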