Tesseract 4.0 .net wrapper Searchable pdf not rendering properly
I tried the .NET wrapper for Tesseract 4.0 (LSTM) and generated a searchable PDF, but it renders as an empty page with the extracted text hidden behind it. I am attaching the code, the input image, and the output PDF. Please help: why is the engine rendering an empty PDF with no image?
```csharp
using (IResultRenderer renderer = Tesseract.PdfResultRenderer.CreatePdfRenderer(@"D:\out18", @"C:\tessdata\"))
{
    using (renderer.BeginDocument("Serachablepdftest"))
    {
        string configurationFilePath = @"C:\tessdata";
        string configfile = Path.Combine(@"C:\tessdata", "pdf");
        using (TesseractEngine engine = new TesseractEngine(configurationFilePath, "eng", EngineMode.TesseractAndLstm, configfile))
        {
            using (var imagefile = new Bitmap(@"C:\file-page1.jpg"))
            {
                using (var img = PixConverter.ToPix(imagefile))
                {
                    using (var page = engine.Process(img, "Serachablepdftest"))
                    {
                        renderer.AddPage(page);
                    }
                }
            }
        }
    }
}
```
Issue Analytics
- State:
- Created 6 years ago
- Comments: 12 (6 by maintainers)
Top GitHub Comments
I think the signature of one of the methods related to the PDF writer may have changed. If so, it should be a relatively easy fix, but I haven't had the time or, to be honest, the motivation to fix it.
Pull requests are of course welcome 🙂
On Wed, 21 Jun 2017, 19:47 MunavvarPatel, notifications@github.com wrote:
Hi @charlesw. I have made the changes you suggested, but it now throws an exception:
System.MissingMethodException: 'Method not found: 'Tesseract.IResultRenderer Tesseract.ResultRenderer.CreatePdfRenderer(System.String, System.String, Boolean)'.'
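
In .NET, a `MissingMethodException` like this one generally means the calling code was compiled against one build of an assembly and is running against a different one that lacks the requested overload. One way to check is to list the `CreatePdfRenderer` overloads that the deployed `Tesseract.dll` actually exposes. The sketch below uses only standard reflection; the type name comes from the exception message, while the class name `ListPdfRendererOverloads` and the default DLL path are assumptions made for illustration.

```csharp
using System;
using System.Reflection;

static class ListPdfRendererOverloads
{
    static void Main(string[] args)
    {
        // Assumption: the path to the Tesseract.dll that the application loads
        // is passed as the first argument; otherwise the current directory is used.
        string assemblyPath = args.Length > 0 ? args[0] : "Tesseract.dll";
        Assembly assembly = Assembly.LoadFrom(assemblyPath);

        // The full name includes the assembly version, which helps spot a mismatch.
        Console.WriteLine("Loaded: " + assembly.FullName);

        // Type name taken from the exception message above.
        Type rendererType = assembly.GetType("Tesseract.ResultRenderer");
        if (rendererType == null)
        {
            Console.WriteLine("Tesseract.ResultRenderer was not found in this assembly.");
            return;
        }

        // Print the signature of every method named CreatePdfRenderer.
        BindingFlags flags = BindingFlags.Public | BindingFlags.NonPublic
                           | BindingFlags.Static | BindingFlags.Instance;
        foreach (MethodInfo method in rendererType.GetMethods(flags))
        {
            if (method.Name == "CreatePdfRenderer")
            {
                Console.WriteLine(method.ToString());
            }
        }
    }
}
```

If the three-parameter overload `(System.String, System.String, Boolean)` is not in the printed list, the DLL being loaded at run time is probably not the build the project was compiled against.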