Performance issue with the getTextContent() method

Configuration:

Web browser and its version: Google Chrome 78.0.3904.108 / Opera 64.0.3417.92
Operating system and its version: macOS Mojave 10.14.6
PDF.js version: 2.0.550 (no bug), 2.2.228 (bug), 2.3.200 (bug)
Is a browser extension: No

PDF used (216 pages): http://www.montefiore.ulg.ac.be/~boigelot/cours/oop/slides/oop.pdf

Steps to reproduce the problem: I am using PDF.js (pdfjs-dist) to render all pages in an iframe. I loop over the pages sequentially, and for each page:

I use the render() method to render the current page
I use the getTextContent() method to apply custom text highlights on the text layer

This process was quite fast using version 2.0.550.

However, since version 2.2.228, the getTextContent() method has performance issues:

For the first 50 pages of the PDF, the method runs at normal speed (~100 ms per page)
For the remaining pages, the method runs roughly 10 to 30 times slower (~1 s to 3 s per page)

I measure the execution time as follows:

let startTime;
...
.then(() =>
{
  startTime = Date.now();
  return page.getTextContent();
})
.then((textContent) =>
{
  console.log(Date.now() - startTime, pageNumber);
  ...
})

Do you have any idea what caused this performance problem on Google Chrome/Opera?
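
The measurement described above could be sketched as a standalone helper that awaits each page in turn and records how long getTextContent() takes. This is only an illustration, not the reporter's actual code: measureTextContentTimes and findSlowPages are hypothetical names, pdfDocument is assumed to be the document object resolved by the pdf.js loading task, and the injectable clock is an assumption made so the helper can be exercised without a browser.

```javascript
// Sketch: time getTextContent() for every page, one page at a time.
// `pdfDocument` is assumed to expose numPages and getPage(), as a pdf.js document does.
// `now` is an injectable clock (assumption); it defaults to performance.now().
async function measureTextContentTimes(pdfDocument, now = () => performance.now()) {
  const timings = [];
  for (let pageNumber = 1; pageNumber <= pdfDocument.numPages; pageNumber++) {
    const page = await pdfDocument.getPage(pageNumber);
    const start = now();
    const textContent = await page.getTextContent();
    timings.push({
      pageNumber,
      ms: now() - start,
      items: textContent.items.length,
    });
  }
  return timings;
}

// Hypothetical helper: list the pages that took more than `factor` times the median.
function findSlowPages(timings, factor = 5) {
  const sorted = timings.map((t) => t.ms).sort((a, b) => a - b);
  const median = sorted[Math.floor(sorted.length / 2)];
  return timings.filter((t) => t.ms > median * factor).map((t) => t.pageNumber);
}
```

Comparing each page against the median, rather than a fixed threshold, makes it easier to see where the slowdown starts when the same script is run against different PDF.js versions.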

Top GitHub Comments

1 reaction

Snuffleupagus commented, Dec 2, 2019

> I created a Fiddle where I render the pages of my PDF one at a time then add them in a div.

As mentioned in the Wiki, see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#allthepages, rendering all of the pages at once is simply never a good idea!

0 reactions

timvandermeij commented, Dec 2, 2019

Closing since this is answered. Rendering all pages is a heavy operation that, depending on the resolution, most browsers won’t handle well. This is a browser limitation; perhaps the experimental SVG back-end could provide a solution for this, but it’s not production-ready yet. Refer to https://github.com/mozilla/pdf.js/projects/2 for the tracking of that feature.
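
One way to follow the advice above is to render a page only when it is first needed instead of looping over all 216 pages up front. The following is a minimal sketch under stated assumptions, not something the maintainers proposed: createOnDemandRenderer and ensureRendered are hypothetical names, and renderPage is a hypothetical callback that, in a real viewer, would call page.render() and page.getTextContent() for the page it receives.

```javascript
// Sketch: render each page at most once, and only when it is requested.
// `pdfDocument` is assumed to expose numPages and getPage(), as a pdf.js document does.
// `renderPage(page)` is a hypothetical callback supplied by the caller.
function createOnDemandRenderer(pdfDocument, renderPage) {
  const rendered = new Map(); // pageNumber -> promise of the render result

  return function ensureRendered(pageNumber) {
    if (pageNumber < 1 || pageNumber > pdfDocument.numPages) {
      return Promise.reject(new RangeError(`No page ${pageNumber}`));
    }
    if (!rendered.has(pageNumber)) {
      // Store the promise itself so concurrent requests share one render.
      const task = pdfDocument.getPage(pageNumber).then(renderPage);
      rendered.set(pageNumber, task);
    }
    return rendered.get(pageNumber);
  };
}
```

In a browser, ensureRendered(pageNumber) could be called from a scroll handler or an IntersectionObserver callback attached to a placeholder element for each page, so that only the pages near the viewport are ever rendered.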