
[Bug]: HTTPRequests are lost/missing when using Puppeteer with setRequestInterception enabled

Bug description

Here’s the scoop. I’m trying to use Puppeteer v18.0.5 with the bundled Chromium browser against a specific website, running on Node v16.16.0. However, when I enable request interception via page.setRequestInterception(true), all of the HTTPRequests for image resources are lost: my handler is invoked far less often while intercepting than when not intercepting, and the page never fires any requests for images. When I disable interception, the page loads normally. Yes, I know about invoking continue() on every request; I’m already doing that in the page’s request handler.

I’ve also pored over the Puppeteer issues pages and found similar symptoms in some earlier Puppeteer versions, but those were all different issues that have since been resolved. This one seems unique.

I’ve looked through the Puppeteer source code as well as the CDP events for an explanation, but found none.

As an important note for anyone trying to reproduce this: you must be proxied through a server in the London area to load this site successfully, because it has geographic restrictions.
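For reproduction, Chromium can be routed through a proxy with its --proxy-server command-line switch. The sketch below only builds the options object passed to puppeteer.launch(); buildLaunchOptions is a hypothetical helper, and http://london-proxy.example:8080 is a placeholder address, not a real London proxy.

```javascript
// Hypothetical helper: builds the options object for puppeteer.launch(),
// optionally routing all browser traffic through a proxy.
function buildLaunchOptions({ width, height, proxyServer, headless = false }) {
  const args = [`--window-size=${width},${height}`];
  if (proxyServer) {
    // Chromium command-line switch that sends browser traffic via the proxy.
    args.push(`--proxy-server=${proxyServer}`);
  }
  return {
    args,
    defaultViewport: { width, height },
    headless,
  };
}

const launchOptions = buildLaunchOptions({
  width: 1366,
  height: 983,
  proxyServer: 'http://london-proxy.example:8080', // placeholder, not a real server
});
console.log(launchOptions.args);
```

The resulting object can be passed straight to puppeteer.launch(launchOptions) in place of the inline options in the repro script below.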

Here’s my code to reproduce:

const puppeteer = require('puppeteer');

(async () => {
    const options = {
        browserWidth: 1366,
        browserHeight: 983,
        intercepting: false // set to true to reproduce the missing requests
    };

    const browser = await puppeteer.launch(
        {
            args: [`--window-size=${options.browserWidth},${options.browserHeight}`],
            defaultViewport: {width: options.browserWidth, height: options.browserHeight},
            headless: false
        }
    );
    const page = (await browser.pages())[0];
    page.on('request', async (request) => {
        console.log(`Request: ${request.method()} | ${request.url()} | ${request.resourceType()} | ${request._requestId}`);
        if (options.intercepting) await request.continue();
    });
    await page.setRequestInterception(options.intercepting);
    await page.goto('https://vegas.williamhill.com', {waitUntil: 'networkidle2', timeout: 65000});

    // To give a moment to view the page in headful mode before closing browser.
    await new Promise(resolve => setTimeout(resolve, 5000));
    await browser.close();
})();

Here’s what the page looks like with interception disabled: [screenshot: Expected Page Load]

Here’s what the page looks like with interception enabled and all requests continued: [screenshot: Page load while intercepting and continuing all requests]

With request interception disabled, my handler is invoked for 104 different requests, but with interception enabled it is invoked only 22 times. I’m not hitting a navigation timeout: the .goto() call returns before the 65-second timeout every time.
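One way to narrow down which requests go missing is to record every request event in both modes and compare the counts per resource type. This is a minimal sketch: tallyByResourceType and diffTallies are hypothetical helpers, and the records are assumed to be { url, resourceType } objects built from request.url() and request.resourceType() inside the handler. The sample records are illustrative only.

```javascript
// Count recorded requests per resource type, e.g. { image: 40, script: 12 }.
function tallyByResourceType(records) {
  const counts = {};
  for (const { resourceType } of records) {
    counts[resourceType] = (counts[resourceType] || 0) + 1;
  }
  return counts;
}

// For every resource type seen in either run, report how many requests the
// intercepted run is missing relative to the baseline (negative = extra).
function diffTallies(baseline, intercepted) {
  const types = new Set([...Object.keys(baseline), ...Object.keys(intercepted)]);
  const missing = {};
  for (const type of types) {
    const delta = (baseline[type] || 0) - (intercepted[type] || 0);
    if (delta !== 0) missing[type] = delta;
  }
  return missing;
}

// Illustrative records only; real ones would come from the 'request' handler.
const withoutInterception = [
  { url: 'https://example.com/', resourceType: 'document' },
  { url: 'https://example.com/app.js', resourceType: 'script' },
  { url: 'https://example.com/logo.png', resourceType: 'image' },
];
const withInterception = withoutInterception.slice(0, 2);

console.log(diffTallies(
  tallyByResourceType(withoutInterception),
  tallyByResourceType(withInterception),
)); // { image: 1 }
```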

Puppeteer version

18.0.5

Node.js version

16.16.0

npm version

8.11.0

What operating system are you seeing the problem on?

macOS

Relevant log output

No response

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 14

Top GitHub Comments

1 reaction
robbimount commented, Oct 22, 2022

I haven’t done the rollback testing yet; I’ve been a bit busy. But we did run a test where we bypassed Puppeteer’s interception and went straight to the CDP session provided by the target. We sent Fetch.enable and registered a handler for the Fetch.requestPaused event, then continued each request by calling Fetch.continueRequest. When we did this, the page appeared to function correctly: we registered the same number of requests while intercepting as when not intercepting.

So, long story short, I’ll do the testing to see if it works in previous Puppeteer versions and get back to you.
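The CDP workaround described in this comment can be sketched as follows. This is a minimal sketch, not Puppeteer’s API: enableFetchPassthrough and FakeSession are hypothetical names. The session argument is assumed to be a CDP session exposing send(method, params) and on(event, handler), such as the one returned by page.target().createCDPSession() in Puppeteer v18; FakeSession stands in for it so the logic runs without a browser.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: pause every request via the CDP Fetch domain and
// immediately resume it unchanged, bypassing page.setRequestInterception().
async function enableFetchPassthrough(session, onError = console.error) {
  // Register the handler before enabling so no paused request is missed.
  session.on('Fetch.requestPaused', (event) => {
    session
      .send('Fetch.continueRequest', { requestId: event.requestId })
      .catch(onError); // e.g. the request was cancelled by a navigation
  });
  // Without a patterns parameter, Fetch.enable pauses all requests.
  await session.send('Fetch.enable');
}

// Hypothetical stand-in for a CDP session that records outgoing commands.
class FakeSession extends EventEmitter {
  constructor() {
    super();
    this.sent = [];
  }

  async send(method, params = {}) {
    this.sent.push({ method, params });
    return {};
  }
}

(async () => {
  const session = new FakeSession();
  await enableFetchPassthrough(session);
  session.emit('Fetch.requestPaused', { requestId: 'interception-job-1.0' });
  await new Promise((resolve) => setImmediate(resolve));
  console.log(session.sent.map((m) => m.method));
  // [ 'Fetch.enable', 'Fetch.continueRequest' ]
})();
```

Against a real page, this would presumably be called as await enableFetchPassthrough(await page.target().createCDPSession()) before page.goto(), with page.setRequestInterception() left off; Puppeteer’s own interception also relies on the Fetch domain, so mixing the two is probably best avoided.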

1 reaction
jrandolf commented, Oct 7, 2022

Is it possible that interruptions are leading to scripts loading in the wrong order?
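One way to check that hypothesis is to record the URLs of script requests, in the order the request event fires, for both modes and find the first point where the two sequences diverge. This is a minimal sketch: firstScriptDivergence is a hypothetical helper, the records are assumed to be { url, resourceType } objects built from request.url() and request.resourceType(), and request order is only a proxy for execution order. The sample records are illustrative only.

```javascript
// Return the first index at which the script-request order differs between
// two runs, with the URL each run saw there, or null if the orders match.
function firstScriptDivergence(runA, runB) {
  const scriptsA = runA.filter((r) => r.resourceType === 'script').map((r) => r.url);
  const scriptsB = runB.filter((r) => r.resourceType === 'script').map((r) => r.url);
  const length = Math.max(scriptsA.length, scriptsB.length);
  for (let i = 0; i < length; i++) {
    if (scriptsA[i] !== scriptsB[i]) {
      // A run that requested fewer scripts shows up as undefined here.
      return { index: i, a: scriptsA[i], b: scriptsB[i] };
    }
  }
  return null;
}

// Illustrative records only.
const baselineRun = [
  { url: 'https://example.com/', resourceType: 'document' },
  { url: 'https://example.com/vendor.js', resourceType: 'script' },
  { url: 'https://example.com/app.js', resourceType: 'script' },
];
const interceptedRun = [
  { url: 'https://example.com/', resourceType: 'document' },
  { url: 'https://example.com/app.js', resourceType: 'script' },
  { url: 'https://example.com/vendor.js', resourceType: 'script' },
];

// Logs an object with index 0, a = .../vendor.js and b = .../app.js.
console.log(firstScriptDivergence(baselineRun, interceptedRun));
```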
