
Tesseract 4.0 .NET wrapper: searchable PDF not rendering properly


I tried the .NET wrapper of Tesseract 4.0 (LSTM) and generated a searchable PDF, but it renders as an empty page with the extracted text hidden behind it. I've attached the code, the input image, and the output PDF. Please help me with this: why is the engine rendering an empty PDF with no image?

```csharp
using System.Drawing;
using System.IO;
using Tesseract;

// Output base name "D:\out18"; the PDF renderer appends ".pdf"
using (IResultRenderer renderer = Tesseract.PdfResultRenderer.CreatePdfRenderer(@"D:\out18", @"C:\tessdata\"))
{
    using (renderer.BeginDocument("Serachablepdftest"))
    {
        string configurationFilePath = @"C:\tessdata";
        // Tesseract's "pdf" config file
        string configfile = Path.Combine(@"C:\tessdata", "pdf");
        using (TesseractEngine engine = new TesseractEngine(configurationFilePath, "eng", EngineMode.TesseractAndLstm, configfile))
        {
            // Load the input image and convert it to a Leptonica Pix
            using (var imagefile = new Bitmap(@"C:\file-page1.jpg"))
            using (var img = PixConverter.ToPix(imagefile))
            // Run OCR on the page and hand the result to the PDF renderer
            using (var page = engine.Process(img, "Serachablepdftest"))
            {
                renderer.AddPage(page);
            }
        }
    }
}
```

Attachments: file-page1, out18.pdf
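One way to confirm the symptom (hidden text layer present, page image missing) without opening a viewer is to check whether the output PDF contains any image XObject at all. The sketch below is a heuristic, not part of the Tesseract wrapper: `PdfImageCheck` is a hypothetical helper, and it assumes the PDF's object dictionaries are written uncompressed (which Tesseract's PDF renderer appears to do). It simply counts `/Subtype /Image` entries in the raw bytes. A PDF that stores its dictionaries in compressed object streams would need a real PDF parser instead.

```csharp
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the Tesseract wrapper): a heuristic check for
// whether a PDF contains any embedded image XObjects.
public static class PdfImageCheck
{
    // Matches "/Subtype /Image" with any (or no) whitespace between the two names.
    private static readonly Regex ImageSubtype = new Regex(@"/Subtype\s*/Image\b");

    public static int CountImageXObjects(byte[] pdfBytes)
    {
        // ASCII decoding turns non-ASCII bytes (e.g. compressed stream data) into '?',
        // which is harmless because the marker being searched for is plain ASCII.
        string raw = Encoding.ASCII.GetString(pdfBytes);
        return ImageSubtype.Matches(raw).Count;
    }

    // Convenience overload that reads the PDF from disk.
    public static int CountImageXObjects(string pdfPath) =>
        CountImageXObjects(File.ReadAllBytes(pdfPath));
}
```

For example, `PdfImageCheck.CountImageXObjects(@"D:\out18.pdf")` returning 0 would suggest that only the text layer was written and no page image was embedded.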

Issue Analytics

  • State: open
  • Created 6 years ago
  • Comments:12 (6 by maintainers)

Top GitHub Comments

1 reaction
charlesw commented, Jun 21, 2017

I think the signature of one of the PDF writer methods may have changed. If so, it should be a relatively easy fix, but I haven't had the time (or, to be honest, the motivation) to fix it.

Pull requests are of course welcome 🙂

On Wed, 21 Jun 2017, 19:47 MunavvarPatel, notifications@github.com wrote:

Hi @daddy1989 (https://github.com/daddy1989). I've got the same problem. Have you found a workable solution? Check it… Blank PDF created by feature/321-Tesseract-4 http://url

View it on GitHub: https://github.com/charlesw/tesseract/issues/341#issuecomment-310028001

0 reactions
MunavvarPatel commented, Jul 14, 2017

Hi @charlesw. I made the changes you suggested, but now it throws an exception:

System.MissingMethodException: 'Method not found: 'Tesseract.IResultRenderer Tesseract.ResultRenderer.CreatePdfRenderer(System.String, System.String, Boolean)'.'
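In .NET, a `MissingMethodException` like this one generally means the calling code was compiled against a version of an assembly that has the method, while the version loaded at run time does not (for example, an older `Tesseract.dll` left in the output folder). The thread does not confirm that this is the cause here. A small reflection helper can show which overloads of a method the loaded assembly actually exposes and which copy of the assembly was loaded. `OverloadLister` is a hypothetical name; only standard `System.Reflection` calls are used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical diagnostic helper: lists the public static overloads of a method
// as they exist in the assembly that is actually loaded at run time.
public static class OverloadLister
{
    public static List<string> Describe(Type type, string methodName)
    {
        // FlattenHierarchy also returns public static methods declared on base classes.
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Static | BindingFlags.FlattenHierarchy;
        return type.GetMethods(flags)
            .Where(m => m.Name == methodName)
            .Select(m => $"{m.ReturnType.FullName} {m.DeclaringType.FullName}.{m.Name}(" +
                         string.Join(", ", m.GetParameters().Select(p => p.ParameterType.FullName)) + ")")
            .ToList();
    }

    // Reports the name and version of the assembly defining the type, and where it was loaded from.
    public static string AssemblyInfo(Type type)
    {
        AssemblyName name = type.Assembly.GetName();
        return $"{name.Name} {name.Version} loaded from {type.Assembly.Location}";
    }
}
```

For example, print each entry of `OverloadLister.Describe(typeof(Tesseract.ResultRenderer), "CreatePdfRenderer")` together with `OverloadLister.AssemblyInfo(typeof(Tesseract.ResultRenderer))`. If no `(System.String, System.String, System.Boolean)` overload is listed, the loaded `Tesseract.dll` is likely not the version the project was compiled against.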

