Avoid hierarchical (tree) structure in the textlayer of pdfjs


I have been using pdf.js for quite a long time, but since a recent update I have been running into issues with the text layer structure. Previously the text layer was flat, but in the latest release it has a hierarchical (tree) structure.
Previous version example (2.8):

<div id="textlayer">
<span>Hello World !!!</span>
<span>I'm confused </span>
</div>

Latest version example (2.14):

<div id="textlayer">
<span>
<br>
<span>Hello World !!!</span>
</span>
<span>
<br>
<br>
<span>I'm confused</span>
</span>
</div>

Is there any way to get a flat structure in the text layer?

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 1
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
Snuffleupagus commented, Aug 5, 2022

https://cdn.jsdelivr.net/npm/pdfjs-dist@2.14.305/web/pdf_viewer.js

That version is no longer supported, please find the latest releases at https://mozilla.github.io/pdf.js/getting_started/#download

Is there any way to achieve this without changing the PDF viewer file?

Unfortunately not, since various other parts of the default viewer use that structure to improve accessibility.
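
Because the default viewer depends on the nested structure, one non-invasive option is to leave the DOM untouched and only read it as if it were flat. The sketch below is not from the issue thread: `collectLeafSpans` is a hypothetical helper that walks the text layer container and returns the innermost spans in document order. It assumes the container holds only `<span>` and `<br>` elements, as in the 2.14 example above.

```javascript
// Hypothetical helper (not part of pdf.js): returns the innermost <span>
// elements under `node`, in document order, without modifying the DOM.
function collectLeafSpans(node, out = []) {
  for (const child of node.children) {
    const before = out.length;
    // Visit descendants first so nested spans are collected before their wrapper.
    collectLeafSpans(child, out);
    // A span is a leaf only if no span was found anywhere inside it.
    if (child.tagName === "SPAN" && out.length === before) {
      out.push(child);
    }
  }
  return out;
}
```

In a browser this could be called as `collectLeafSpans(document.getElementById("textlayer"))`, where the id is taken from the example markup. The wrapper spans and `<br>` elements stay in place, so whatever the viewer relies on for accessibility is unaffected.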

1 reaction
Snuffleupagus commented, Aug 4, 2022

When opening an issue, please provide all of the information requested in https://github.com/mozilla/pdf.js/blob/master/.github/ISSUE_TEMPLATE.md since as-is this issue unfortunately isn’t really possible to easily understand/reproduce. Of particular importance is that you provide the PDF document in question.


It’s possible that you’ll need to use the includeMarkedContent parameter, however it is off by default, see https://github.com/mozilla/pdf.js/blob/8bad06f1580f8235c0ab4038d9211587dd39a9b5/src/display/api.js#L1078-L1085 and https://github.com/mozilla/pdf.js/blob/8bad06f1580f8235c0ab4038d9211587dd39a9b5/src/display/api.js#L1643-L1651
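
For code that fetches the text content itself instead of going through the default viewer, the parameter mentioned above can be passed explicitly. The following is a minimal sketch, assuming `page` is a `PDFPageProxy` obtained from `getPage()`. The helper names `keepTextItems` and `getFlatTextItems` are hypothetical, and whether marked content is what produces the nested spans in a given document is itself an assumption.

```javascript
// Hypothetical helper: text items carry a `str` field, while marked-content
// markers carry a `type` field and no `str`, so this keeps only the text.
function keepTextItems(items) {
  return items.filter((item) => typeof item.str === "string");
}

// Hypothetical helper: `page` is assumed to be a PDFPageProxy.
async function getFlatTextItems(page) {
  // includeMarkedContent is off by default; it is spelled out here for clarity.
  const content = await page.getTextContent({ includeMarkedContent: false });
  // Filtering is a safeguard in case the caller's items include markers anyway.
  return keepTextItems(content.items);
}
```

This only affects the text content that your own code requests. It does not change what the default viewer renders.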
