Apply detect() on readable PDF files
Hi there,
from the docs I infer that detect() operates, for example, on PIL.Image objects. Is there a way to operate directly on already readable PDF files (which would also obviate the need to apply OCR)?
Greetings
Issue Analytics
- State:
- Created 2 years ago
- Comments:12 (4 by maintainers)
Solved it like this with PyMuPDF (pip install pymupdf). I hope it helps someone with the same issue. Also check the PyMuPDF utility for retrieving text from within given box coordinates.
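The code that went with this comment is not shown here. What follows is only a rough sketch of the general idea, not the commenter's actual solution: read the words and their coordinates from the PDF's embedded text layer with PyMuPDF, then keep the words that fall inside a detected box. The helper names `extract_words`, `words_in_box`, and `pixels_to_points` are made up for illustration, and the sketch assumes the boxes come from running detect() on a page image rendered at a known DPI.

```python
from typing import List, Tuple

Word = Tuple[float, float, float, float, str]  # x0, y0, x1, y1, text
Box = Tuple[float, float, float, float]        # x0, y0, x1, y1


def extract_words(pdf_path: str, page_number: int = 0) -> List[Word]:
    """Read the embedded text layer of one page; no OCR involved."""
    # PyMuPDF is imported here so the pure-Python helpers below
    # can be used without it being installed.
    import fitz

    with fitz.open(pdf_path) as doc:
        page = doc[page_number]
        # get_text("words") yields tuples of
        # (x0, y0, x1, y1, word, block_no, line_no, word_no),
        # with coordinates in PDF points (1/72 inch).
        return [(w[0], w[1], w[2], w[3], w[4]) for w in page.get_text("words")]


def pixels_to_points(box: Box, dpi: int) -> Box:
    """Convert a box from image pixels (page rendered at `dpi`) to PDF points."""
    x0, y0, x1, y1 = (v * 72 / dpi for v in box)
    return (x0, y0, x1, y1)


def words_in_box(words: List[Word], box: Box) -> str:
    """Join the words whose centre lies inside `box`, keeping input order."""
    bx0, by0, bx1, by1 = box
    picked = []
    for x0, y0, x1, y1, text in words:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if bx0 <= cx <= bx1 and by0 <= cy <= by1:
            picked.append(text)
    return " ".join(picked)
```

Usage would look roughly like `words_in_box(extract_words("file.pdf"), pixels_to_points(pixel_box, dpi=150))`, where `pixel_box` is a box returned by detect() on the rendered page image. PyMuPDF also offers `page.get_textbox(rect)`, which returns the text inside a rectangle directly and is probably the utility meant above.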
Sorry for the confusing terminology. I am still learning PDF-related stuff.
See #71 and #72