Hangup when passing in long list of images
Tesseract has the option of taking in a text file containing a list of images to process. This is much faster than reinitializing Tesseract for each image in a series.
pytesseract also handled a text file containing a list of images correctly if the list was short (fewer than 50 images).
text = pytesseract.image_to_string(txt_file_of_img_file_names, config=tess_config)
Longer lists of images cause the module to hang indefinitely; it never returns a result. Tesseract has no problem handling the same larger text file from the command line, so the problem seems to be in pytesseract.
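Since short lists reportedly work, one possible stopgap is to split a long list into smaller list files and process them one at a time. The sketch below is only an illustration of that idea, not a confirmed fix: `chunked`, `ocr_in_batches`, and the default batch size of 40 are hypothetical names and values, and the OCR call is passed in as a callable so the helper itself does not depend on pytesseract.

```python
import os
import tempfile
from typing import Callable, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def ocr_in_batches(
    image_paths: List[str],
    ocr: Callable[[str], str],
    batch_size: int = 40,  # assumption: stays under the ~50 entries that reportedly worked
) -> List[str]:
    """Write each batch to a temporary list file and pass that file's path to `ocr`."""
    results = []
    for batch in chunked(image_paths, batch_size):
        # One image path per line, the same layout as a hand-written list file.
        with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
            handle.write("\n".join(batch) + "\n")
            list_path = handle.name
        try:
            results.append(ocr(list_path))
        finally:
            os.remove(list_path)  # clean up the temporary list file
    return results
```

With pytesseract installed, the callable could wrap the call from this report, for example `lambda path: pytesseract.image_to_string(path, config=tess_config)`. This trades some of the speed benefit of a single long list for batches small enough to return.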
Issue Analytics
- State:
- Created 5 years ago
- Comments:8
Top GitHub Comments
Here are my results from using batching.
A batch job of 100 images.
If I run each image through Tesseract individually (looping in a bash script) the total time is
Hello @plestran and @makcedward, can you please test the latest master version of pytesseract and report if the issue still persists?
I tested with an image list containing 330+ references to the test.png image included in the pytesseract src folder/package.
Feel free to reopen if the issue persists.
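For anyone who wants to repeat that test, a list file of this kind can be generated with a few lines of Python. This is a rough sketch under assumptions: `write_image_list` is a hypothetical helper, and the path to test.png depends on where pytesseract is checked out locally.

```python
from pathlib import Path


def write_image_list(image_path: str, count: int, list_path: str) -> Path:
    """Write `count` lines to `list_path`, each one containing `image_path`."""
    list_file = Path(list_path)
    list_file.write_text("\n".join([str(image_path)] * count) + "\n", encoding="utf-8")
    return list_file
```

Calling `write_image_list("test.png", 330, "images.txt")` produces a 330-line file that can then be passed to `pytesseract.image_to_string` in place of a single image path.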