Puppeteer Page.close() not freeing tab memory

I’m currently using Puppeteer for web scraping, with a single browser instance that several Node apps connect to through its WebSocket endpoint. I run it against 50 websites every 15 minutes, with a process queue handling 40 of them concurrently. Over time, memory utilization keeps growing, and the only solution I have found is to restart the browser.

Steps to reproduce

Tell us about your environment:

  • Puppeteer version: tip of the tree
  • Platform / OS version: Ubuntu 16.04

What steps will reproduce the problem?

This is the code I use to launch the Chrome instance:

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    args: ["--remote-debugging-address=0.0.0.0", "--remote-debugging-port=1234"]
  });
  const wsendpoint = await browser.wsEndpoint();
  console.log(wsendpoint);
})();

This is the code I use to connect to a remote chrome instance

const puppeteer = require("puppeteer");
const axios = require("axios");

const extract = async () => {
  const url = process.argv[2];
  const remote_ip = process.argv[3];
  const remote_port = process.argv[4];
  const json = await axios.get(`http://${remote_ip}:${remote_port}/json/version`);
  const wsEndpoint = json.data.webSocketDebuggerUrl;
  const browser = await puppeteer.connect({ browserWSEndpoint: wsEndpoint, ignoreHTTPSErrors: true });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "domcontentloaded", timeout: 300000 });

  // Extract the results from the page
  const links = await page.evaluate(() => {
    // Function to check if a tag is visually visible on the page
    const visible = (styles, DOMRect) => {
      if (styles.getPropertyValue("display") == "none" || styles.getPropertyValue("visibility") == "hidden" || styles.getPropertyValue("opacity") == 0) {
        return false;
      } else if (DOMRect.x < 0 || DOMRect.y < 0 || DOMRect.height == 0 || DOMRect.width == 0) {
        return false;
      }
      return true;
    };

    const pageSize = { width: document.body.scrollWidth, height: document.body.scrollHeight };
    const tags = Array.from(document.querySelectorAll("a"));
    return tags.map(tag => {
      const styles = window.getComputedStyle(tag, null);
      const DOMRect = tag.getBoundingClientRect();
      const image = tag.getElementsByTagName("img");
      return {
        url: tag.href,
        fontSize: parseInt(styles.getPropertyValue("font-size")),
        position: { x: DOMRect.x, y: DOMRect.y, left: DOMRect.left, right: DOMRect.right, top: DOMRect.top, bottom: DOMRect.bottom, width: DOMRect.width, height: DOMRect.height },
        visible: visible(styles, DOMRect),
        image: image.length > 0,
        pageSize: pageSize
      };
    });
  });
  await page.close();
  await browser.disconnect();
  return links;
};

extract()
  .then(links => {
    console.log(JSON.stringify(links));
  })
  .catch(e => {
    console.log(e.message);
    process.exit(1);
  });
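
One thing that may be worth ruling out in the script above: if page.goto() or page.evaluate() throws (for example when the 300000 ms timeout is hit), page.close() is never reached and the tab stays open in the shared browser. Below is a minimal sketch of a guard against that. The helper name withPage is hypothetical and not part of Puppeteer; it only relies on browser.newPage() and page.close(), which the script already uses.

```javascript
// Hypothetical helper (not a Puppeteer API): open a tab, run `task` on it,
// and close the tab whether `task` succeeds or throws.
async function withPage(browser, task) {
  const page = await browser.newPage();
  try {
    return await task(page);
  } finally {
    // Runs on success and on error, so a failed navigation cannot leak a tab.
    await page.close();
  }
}
```

With this in place, the body of extract() could be passed in as the task, for example withPage(browser, page => page.goto(url)). This does not by itself explain memory growth when every tab is closed correctly; it only removes one possible source of leftover tabs.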

What is the expected result?

I would expect the Chrome instance's memory utilization to stay steady, since I close every tab I open using Page.close().

What happens instead?

The Chrome instance's memory utilization grows over time, and I need to restart the whole Chrome process for it to go down again.

It might be a Chromium-related issue, but I’m not sure, so I thought I would start by posting it here.

Thank you.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 17
  • Comments:9

Top GitHub Comments

19 reactions
grantstephens commented, Feb 16, 2018

I found that calling await page.goto('about:blank') before closing the tabs freed everything up.
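
The workaround in this comment could be wrapped in a small helper, roughly as sketched below. The name releaseAndClose is hypothetical and not part of Puppeteer; the sketch only uses page.goto() and page.close(), and it assumes that navigating to about:blank first is what lets Chrome release the previous document, as the comment reports.

```javascript
// Hypothetical helper (not a Puppeteer API): navigate the tab to about:blank
// so the previous document is unloaded, then close the tab.
async function releaseAndClose(page) {
  try {
    // Assumption taken from the comment above: unloading the page first
    // frees its memory before the tab goes away.
    await page.goto("about:blank");
  } finally {
    // Close the tab even if the navigation to about:blank fails.
    await page.close();
  }
}
```

In the reporter's script, this would replace the bare await page.close() call with await releaseAndClose(page).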

1 reaction
kireerik commented, Apr 18, 2018

I think that, as of version 1.3.0, opening a blank page before closing tabs is no longer needed to free up memory.

@grantstephens and @galvez, can you confirm that?
