Performance issue with the getTextContent() method
Configuration:
- Web browser and its version: Google Chrome 78.0.3904.108 / Opera 64.0.3417.92
- Operating system and its version: macOS Mojave 10.14.6
- PDF.js version: 2.0.550 (no bug), 2.2.228 (bug), 2.3.200 (bug)
- Is a browser extension: No
PDF used (216 pages): http://www.montefiore.ulg.ac.be/~boigelot/cours/oop/slides/oop.pdf
Steps to reproduce the problem:
I am using PDF.js (pdfjs-dist) to render all pages in an iframe. I loop sequentially over the pages:
- I use the render() method to render the current page
- I use the getTextContent() method to apply custom text highlights on the text layer
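For readers who want to reproduce the setup, the loop described above might look roughly like the sketch below. It is not the reporter's actual code: `pdfDocument` is assumed to be the object resolved by the PDF.js loading task, `makeCanvas` and `highlight` are hypothetical application callbacks, and the object form of `getViewport({ scale })` is assumed (older releases such as 2.0.550 took the scale as a positional argument).

```javascript
// Sequentially render each page, then fetch its text content.
// `makeCanvas(width, height)` and `highlight(pageNumber, items)` are
// hypothetical callbacks provided by the application, not PDF.js APIs.
async function processAllPages(pdfDocument, makeCanvas, highlight, scale = 1.5) {
  for (let pageNumber = 1; pageNumber <= pdfDocument.numPages; pageNumber++) {
    const page = await pdfDocument.getPage(pageNumber);
    const viewport = page.getViewport({ scale });

    // Render the page into a canvas sized to the viewport.
    const canvas = makeCanvas(viewport.width, viewport.height);
    const canvasContext = canvas.getContext("2d");
    await page.render({ canvasContext, viewport }).promise;

    // Fetch the text items and hand them to the highlighting code.
    const textContent = await page.getTextContent();
    highlight(pageNumber, textContent.items);
  }
}
```

Because each step is awaited, page N+1 is not started until page N has been rendered and its text content retrieved, which matches the "sequential" behaviour described in the report.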
This process was quite fast using version 2.0.550.
However, since version 2.2.228, the getTextContent() method has performance issues:
- For the first 50 pages of the PDF, the method is executed at a normal speed (~100ms)
- For the following pages, the method is executed 10 times slower (~1s-3s)
I measure the execution time as follows:
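let startTime;
...
  .then(() => {
    // Start the timer just before requesting the text content.
    startTime = Date.now();
    return page.getTextContent();
  })
  .then((textContent) => {
    // Log the elapsed time in milliseconds and the page number.
    console.log(Date.now() - startTime, pageNumber);
    ...
  })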
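A slightly more structured way to collect the same measurement is sketched below. The helper names (`timeTextContent`, `averageMs`, `compareHeadAndTail`) are hypothetical, and the default split at 50 pages simply mirrors the observation above; `performance.now()` is used instead of `Date.now()` only because it offers finer resolution.

```javascript
// Time a single getTextContent() call on a PDF.js page object.
async function timeTextContent(page) {
  const start = performance.now();
  const textContent = await page.getTextContent();
  return { textContent, elapsedMs: performance.now() - start };
}

// Arithmetic mean of a list of millisecond samples (NaN for an empty list).
function averageMs(samples) {
  if (samples.length === 0) return NaN;
  return samples.reduce((sum, ms) => sum + ms, 0) / samples.length;
}

// Compare the average time of the first `split` pages with the remaining ones.
function compareHeadAndTail(timings, split = 50) {
  return {
    headMs: averageMs(timings.slice(0, split)),
    tailMs: averageMs(timings.slice(split)),
  };
}
```

Pushing each `elapsedMs` into an array inside the page loop and calling `compareHeadAndTail` at the end would make the slowdown after the first pages visible as a single pair of numbers.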
Do you have any idea what caused this performance problem on Google Chrome/Opera?
Issue Analytics
- Created 4 years ago
- Reactions: 1
- Comments: 6 (1 by maintainers)
Top GitHub Comments
As mentioned in the Wiki, see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#allthepages, rendering all of the pages at once is simply never a good idea!
Closing since this is answered. Rendering all pages is a heavy operation that, depending on the resolution, most browsers won’t handle well. This is a browser limitation; perhaps the experimental SVG back-end could provide a solution for this, but it’s not production-ready yet. Refer to https://github.com/mozilla/pdf.js/projects/2 for the tracking of that feature.
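One way to act on the advice in these comments is to keep only a small window of pages rendered around the page the user is currently viewing. The sketch below is an illustration under assumptions, not an approach taken from PDF.js itself: `pagesToKeep` and `syncRenderedPages` are hypothetical names, and `renderPage` and `releasePage` stand for application callbacks that would render a page (for example with the render() and getTextContent() calls from the report) and free its canvas and text layer.

```javascript
// Page numbers within `buffer` pages of the current one, clamped to the document.
function pagesToKeep(currentPage, numPages, buffer = 2) {
  const first = Math.max(1, currentPage - buffer);
  const last = Math.min(numPages, currentPage + buffer);
  const pages = [];
  for (let n = first; n <= last; n++) {
    pages.push(n);
  }
  return pages;
}

// Bring the set of rendered pages in line with the pages we want to keep.
// `rendered` is a Set of page numbers that are currently rendered.
async function syncRenderedPages(rendered, keep, renderPage, releasePage) {
  const wanted = new Set(keep);
  // Release pages that have fallen outside the window.
  for (const n of [...rendered]) {
    if (!wanted.has(n)) {
      releasePage(n);
      rendered.delete(n);
    }
  }
  // Render pages that entered the window, one at a time.
  for (const n of keep) {
    if (!rendered.has(n)) {
      await renderPage(n);
      rendered.add(n);
    }
  }
}
```

Calling `syncRenderedPages(rendered, pagesToKeep(currentPage, pdfDocument.numPages), renderPage, releasePage)` whenever the visible page changes would keep at most five pages alive with the default buffer, instead of all 216.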