PDF Generation hanging for documents with many large images

Steps to reproduce

Tell us about your environment:

  • Puppeteer version: 0.13.0
  • Platform / OS version: OSX 10.13.1
  • Node versions: tested with v8.6.0, v8.9.0 & v9.2.0
  • URLs (if applicable): N/A

What steps will reproduce the problem?

The code below isolates a problem discovered while using Puppeteer to generate PDF reports. While Puppeteer has been fantastic so far, I have come across a problem with pages containing a large number of images.

index.html:

<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
    <title>Test App</title>
  </head>
  <body>
    <div id="root"></div>
  </body>
</html>

pdfImages.js:

const puppeteer = require('puppeteer');

const IMAGE_URIS = [
  // A list of 1949 image URIs as strings
  // The images used range in size from 1KB to 1.7MB, with an average size of 200KB.
];

const PDF_OPTIONS = {path: `${__dirname}/output.pdf`, format: 'A4', printBackground: true, landscape: false, margin: {top: '10mm', left: '10mm', right: '10mm', bottom: '15mm'}};
const IMAGE_LOADING_TIMEOUT = 60 * 1000 * 5;

function addImagesToPage(imageUriList) {
  const root = document.getElementById('root');
  imageUriList.forEach((imageUri) => {
    const div = document.createElement('div');
    const img = new Image();
    img.src = imageUri;
    img.style = 'max-width: 20vw; max-height: 20vh;';
    div.appendChild(img);
    root.appendChild(div);
  });
}

function waitForAllImagesToCompleteLoading() {
  const allImagesInDocument = Array.from(document.getElementsByTagName('img'));
  return allImagesInDocument
      .map((img) => img.complete)
      .every((completeStatus) => completeStatus);
}

let browser;
puppeteer.launch({headless: true})
  .then((newBrowser) => {
    browser = newBrowser;
    return browser.newPage();
  })
  .then((page) => {
    return page.goto(`file://${__dirname}/index.html`)
    .then(() => page.evaluate(addImagesToPage, IMAGE_URIS))
    .then(() => page.waitForFunction(waitForAllImagesToCompleteLoading, {timeout: IMAGE_LOADING_TIMEOUT}))
    .then(() => page.pdf(PDF_OPTIONS))
    .then(() => page.close());
  })
  .then(() => browser.close());

I run this using: env DEBUG="puppeteer:*" node --max-old-space-size=16384 pdfImages.js

What is the expected result? Running this code should generate a PDF.

Please note that with 1100 images and --max-old-space-size=8174, the code runs without a problem.

What happens instead? Running the code with the command above causes the code to hang on the PDF generation stage. Please see HungPdfPrint.log for the logs produced when this happens.

The code consistently hangs with any combination of the following:

  • --max-old-space-size=8174 (8GB) or 16384 (16GB), and
  • 1200, 1500 or 1949 images.

When the --max-old-space-size flag is removed, the code crashes with an out-of-heap-memory error. See this log: OutOfMemoryCrash.log

Again, when the number of images is 1100 and --max-old-space-size=8174, the code runs without a problem.

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Reactions: 17
  • Comments: 27

Top GitHub Comments

HugoDF commented, Dec 8, 2017 (6 reactions)

I’m having the same issue when taking PDF snapshots of HTML containing images whose combined size is more than 200MB. With 193MB of images it works, but over 200MB it hangs.
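
To see whether a set of images falls near the roughly 200MB mark reported in this comment, the combined size can be summed before rendering. The following is a minimal sketch, assuming the images are local files. `combinedSizeInMB` is a hypothetical helper name, and the 200MB figure is only this commenter's observation, not a documented Puppeteer limit.

```javascript
const fs = require('fs');

// Hypothetical helper: sums the on-disk size of local image files, in MB.
// Assumes every entry in imagePaths is a readable local file path
// (remote URIs would need to be downloaded or measured another way).
function combinedSizeInMB(imagePaths) {
  const totalBytes = imagePaths
    .map((imagePath) => fs.statSync(imagePath).size)
    .reduce((sum, size) => sum + size, 0);
  return totalBytes / (1024 * 1024);
}
```

Comparing the result for a working image set and a hanging one may help confirm whether total image size, rather than image count, is the relevant factor.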

galvez commented, Oct 23, 2018 (5 reactions)

@Mgonand, the only way to get this to work reliably is to create a batch of PDF files and then join them into a final file (I used pdftk). I use Puppeteer to generate a series of 10-image PDFs and then join them into one final file.
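
The batching workaround described in this comment could be sketched roughly as follows. This is not galvez's actual code: `chunk` and `renderInBatches` are hypothetical helpers, the `part-0001.pdf` naming is an assumption, and `renderBatch` stands in for a caller-supplied function that would wrap the `page.goto`, `page.evaluate`, `page.waitForFunction` and `page.pdf` steps from pdfImages.js for a single batch.

```javascript
// Splits a list into consecutive batches of at most batchSize items.
function chunk(list, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < list.length; i += batchSize) {
    batches.push(list.slice(i, i + batchSize));
  }
  return batches;
}

// Renders one PDF per batch and resolves with the part file paths in order.
// renderBatch(imageUris, outputPath) is a hypothetical callback supplied by
// the caller; it is expected to return a promise that settles once the PDF
// for that batch has been written.
async function renderInBatches(imageUris, batchSize, outputDir, renderBatch) {
  const partPaths = [];
  const batches = chunk(imageUris, batchSize);
  for (let i = 0; i < batches.length; i += 1) {
    // Zero-padded names keep the parts in order when sorted alphabetically.
    const partPath = `${outputDir}/part-${String(i + 1).padStart(4, '0')}.pdf`;
    // Batches are rendered one at a time so only one batch of images
    // is loaded at any moment.
    await renderBatch(batches[i], partPath);
    partPaths.push(partPath);
  }
  return partPaths;
}
```

With a batch size of 10, as in the comment, the resulting parts could then be joined with pdftk, for example `pdftk part-0001.pdf part-0002.pdf cat output output.pdf`.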
