Avoid hierarchical (tree) structure in the textlayer of pdfjs
I have been using PDF.js for quite a long time, but since the recent update I have been running into issues with the textlayer structure. Previously the textlayer was flat, but in the latest release it has a hierarchical (tree) structure.
Previous version example (2.8):
<div id="textlayer">
<span>Hello World !!!</span>
<span>I'm confused </span>
</div>
Latest version example (2.14):
<div id="textlayer">
<span>
<br>
<span>Hello World !!!</span>
</span>
<span>
<br>
<br>
<span>I'm confused</span>
</span>
</div>
Is there any way to get a flat structure in the textlayer?
Top GitHub Comments
That version is no longer supported; please find the latest releases at https://mozilla.github.io/pdf.js/getting_started/#download
Unfortunately not, since various other parts of the default viewer use that structure to improve accessibility.
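Since the nesting itself cannot be switched off in the default viewer, code that only needs to read the text could walk the tree and collect the innermost spans instead of changing the DOM. The following is a minimal sketch, not part of PDF.js: collectLeafSpans is a hypothetical helper, and it assumes the layer looks like the 2.14 example in the question (wrapper spans containing br elements and text-bearing spans).

```javascript
// Sketch only: gather the innermost text-bearing elements of a text layer,
// in document order, without modifying the DOM.
// Assumption: each node exposes `children`, `tagName`, and `textContent`,
// as DOM elements do. `collectLeafSpans` is a hypothetical helper name.
function collectLeafSpans(root) {
  const leaves = [];
  for (const child of Array.from(root.children)) {
    // <br> elements only mark line breaks and carry no text.
    if (child.tagName === "BR") continue;
    if (child.children.length === 0) {
      // A leaf element: keep it only if it actually holds text.
      if (child.textContent.trim() !== "") leaves.push(child);
    } else {
      // A wrapper element: descend into it.
      leaves.push(...collectLeafSpans(child));
    }
  }
  return leaves;
}

// Hypothetical usage in a browser, with the container id from the question:
// const spans = collectLeafSpans(document.getElementById("textlayer"));
// const lines = spans.map((span) => span.textContent);
```

Because the helper only reads the tree, the wrapper elements that the viewer relies on for accessibility stay in place.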
When opening an issue, please provide all of the information requested in https://github.com/mozilla/pdf.js/blob/master/.github/ISSUE_TEMPLATE.md since, as-is, this issue unfortunately cannot easily be understood or reproduced. Of particular importance is that you provide the PDF document in question.
It’s possible that you’ll need to use the includeMarkedContent parameter; however, it is off by default. See https://github.com/mozilla/pdf.js/blob/8bad06f1580f8235c0ab4038d9211587dd39a9b5/src/display/api.js#L1078-L1085 and https://github.com/mozilla/pdf.js/blob/8bad06f1580f8235c0ab4038d9211587dd39a9b5/src/display/api.js#L1643-L1651
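For code that calls the API directly rather than using the default viewer, the parameter mentioned above could be used roughly as follows. This is a sketch under assumptions: keepTextItems is a hypothetical helper, and it assumes that plain text items carry a string str property while marked-content markers carry a type property instead, which should be checked against the linked api.js lines.

```javascript
// Sketch only: keep plain text items and drop marked-content markers, so
// the list handed to a renderer is flat.
// Assumption: text items have a string `str`; marker items have a `type`
// field and no `str`. `keepTextItems` is a hypothetical helper name.
function keepTextItems(items) {
  return items.filter((item) => typeof item.str === "string");
}

// Hypothetical usage with a PDF.js page proxy (not executed here):
// const content = await page.getTextContent({ includeMarkedContent: false });
// const flatItems = keepTextItems(content.items);
```

With includeMarkedContent left off, the filter should be a no-op; it only matters if the text content was requested with the parameter enabled.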