Performance issue with the getTextContent() method

Configuration:

  • Web browser and its version: Google Chrome 78.0.3904.108 / Opera 64.0.3417.92
  • Operating system and its version: macOS Mojave 10.14.6
  • PDF.js version: 2.0.550 (no bug), 2.2.228 (bug), 2.3.200 (bug)
  • Is a browser extension: No

PDF used (216 pages): http://www.montefiore.ulg.ac.be/~boigelot/cours/oop/slides/oop.pdf

Steps to reproduce the problem: I am using PDF.js (pdfjs-dist) to render all pages in an iframe. I loop over the pages sequentially, and for each page:

  • I use the render() method to render the current page
  • I use the getTextContent() method to apply custom text highlights on the text layer
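
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the original report: `processPagesSequentially`, `makeCanvas` and `highlight` are hypothetical names, `pdf` is assumed to be the document object obtained from `pdfjsLib.getDocument(url).promise`, and the object-style `getViewport({ scale })` call is assumed to match the newer releases discussed here (older releases took `scale` as a positional argument).

```javascript
// Sketch only. `pdf` is assumed to be a pdf.js document object;
// `makeCanvas` and `highlight` are hypothetical callbacks supplied by the caller.
async function processPagesSequentially(pdf, { makeCanvas, highlight, scale = 1.0 }) {
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const viewport = page.getViewport({ scale });

    // Step 1: render the current page into a canvas created by the caller.
    const canvas = makeCanvas(viewport.width, viewport.height);
    await page.render({ canvasContext: canvas.getContext("2d"), viewport }).promise;

    // Step 2: fetch the text items, then hand them to the custom highlighter.
    const textContent = await page.getTextContent();
    highlight(pageNumber, textContent);
  }
}
```

Because every step is awaited before the next page starts, at most one render and one text extraction are in flight at any time.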

This process was quite fast using version 2.0.550.

However, since version 2.2.228, the getTextContent() method has performance issues:

  • For the first 50 pages of the PDF, the method runs at normal speed (~100 ms)
  • For the following pages, the method runs roughly 10 to 30 times slower (~1 s to 3 s)

I measure the execution time as follows:

let startTime;
...
.then(() =>
{
  startTime = Date.now();
  return page.getTextContent();
})
.then((textContent) =>
{
  console.log(Date.now() - startTime, pageNumber);
  ...
})
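
A variant of the same measurement, sketched with a small helper so that each call keeps its own start timestamp instead of sharing one `startTime` variable. `timeAsync` is a hypothetical name introduced for illustration, and `performance.now()` is assumed to be available as a global (as it is in current browsers and recent Node.js releases).

```javascript
// Times a single asynchronous call. The start timestamp is local to the call,
// so overlapping measurements cannot overwrite each other.
async function timeAsync(label, fn) {
  const start = performance.now();
  const result = await fn();
  const elapsedMs = performance.now() - start;
  return { label, elapsedMs, result };
}

// Hypothetical usage with a pdf.js page object:
//   const { elapsedMs, result: textContent } =
//     await timeAsync(pageNumber, () => page.getTextContent());
//   console.log(elapsedMs, pageNumber);
```

Note that this measures wall-clock time until the promise resolves, so time spent waiting behind other queued work is included in the figure.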

Do you have any idea what caused this performance problem on Google Chrome/Opera?

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

Snuffleupagus commented, Dec 2, 2019 (1 reaction)

"I created a Fiddle where I render the pages of my PDF one at a time then add them in a div."

As mentioned in the Wiki, see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#allthepages, rendering all of the pages at once is simply never a good idea!

timvandermeij commented, Dec 2, 2019 (0 reactions)

Closing since this is answered. Rendering all pages is a heavy operation that, depending on the resolution, most browsers won’t handle well. This is a browser limitation; perhaps the experimental SVG back-end could provide a solution for this, but it’s not production-ready yet. Refer to https://github.com/mozilla/pdf.js/projects/2 for the tracking of that feature.
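
One common alternative to rendering every page up front is to keep only the pages near the one currently in view rendered. The sketch below shows just the bookkeeping for that idea. It is an illustration under stated assumptions, not part of the PDF.js API: `pagesToKeepRendered`, `diffRenderedPages` and the `radius` parameter are hypothetical, and nothing in this thread confirms that this approach resolves the slowdown reported above.

```javascript
// Returns the 1-based page numbers within `radius` pages of `currentPage`.
function pagesToKeepRendered(currentPage, totalPages, radius = 2) {
  const first = Math.max(1, currentPage - radius);
  const last = Math.min(totalPages, currentPage + radius);
  const pages = [];
  for (let n = first; n <= last; n++) {
    pages.push(n);
  }
  return pages;
}

// Given the set of pages that are already rendered, works out which pages
// still need rendering and which ones can be released.
function diffRenderedPages(rendered, currentPage, totalPages, radius = 2) {
  const wanted = new Set(pagesToKeepRendered(currentPage, totalPages, radius));
  const toRender = [...wanted].filter((n) => !rendered.has(n));
  const toRelease = [...rendered].filter((n) => !wanted.has(n));
  return { toRender, toRelease };
}
```

A caller would run `diffRenderedPages` whenever the visible page changes, render the pages in `toRender`, and remove the canvases for the pages in `toRelease`.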
