`setRequestInterception` does not prevent navigation

Steps to reproduce

Tell us about your environment:

  • Puppeteer version: v1.13.0
  • Platform / OS version: MacOS 10.14.3
  • URLs (if applicable): x
  • Node.js version: 11.12

What steps will reproduce the problem?

I am unsure whether this is supposed to be possible, so I’m going to skip the repro until we can confirm this is a bug.

I’m trying to use setRequestInterception(true) to ‘pause’ navigation requests in order to gather information from other failed requests (fetch etc.) and to wait for screenshots (on every navigation) to be taken and saved.

I was under the impression that setRequestInterception(true) would prevent the browser from initiating the page load. Instead, the browser seems to merely get stuck doing so, which causes later introspection calls (response.buffer() etc.) to fail with "Protocol error (Network.getResponseBody): No resource with given identifier found", as the browser no longer appears to be aware of these requests.
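One way to narrow the window in which response.buffer() can fail is to read each body as soon as its response arrives, rather than later in a requestfinished handler. The following is a minimal sketch under stated assumptions, not a confirmed fix: `ResponseLike` and `ResponseBodyCache` are hypothetical names, the cache is keyed by URL for brevity (so repeated requests to the same URL overwrite each other), and a body can still be lost if the navigation commits before the read completes.

```typescript
// Structural stand-in for the parts of Puppeteer's Response used here
// (url() and buffer() exist on the real class; the interface name is made up).
interface ResponseLike {
  url(): string;
  buffer(): Promise<{ toString(): string }>;
}

// Hypothetical helper: starts reading a body the moment its response is seen.
class ResponseBodyCache {
  private bodies = new Map<string, Promise<string | undefined>>();

  // Intended to be wired up as: page.on('response', r => cache.capture(r))
  capture(response: ResponseLike): void {
    const body = response
      .buffer()
      .then(buf => buf.toString())
      // Redirects and already-evicted resources reject; record "no body".
      .catch(() => undefined);
    this.bodies.set(response.url(), body);
  }

  has(url: string): boolean {
    return this.bodies.has(url);
  }

  // Resolves to the captured body, or undefined if none was captured.
  async get(url: string): Promise<string | undefined> {
    return this.bodies.get(url);
  }
}
```

With something like this in place, makePageRequestError could look up the cached body instead of calling response.buffer() after the fact.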

Is there any way to effectively halt any browser navigation (including page.goto) before it unloads its request information?

The relevant code is here:

import { HttpMethod, Page, Request } from 'puppeteer';
import { BrowserLogEntry } from '../setup';
import { StreamBuffer } from './StreamBuffer';

export interface PageRequest {
  readonly url: string;
  readonly method: HttpMethod;
  readonly headers: {
    [key: string]: string;
  };
  readonly data?: string | object;
}

export interface PageResponse {
  readonly statusCode: number;
  readonly headers: {
    [key: string]: string;
  };
  readonly data?: string | object;
}

export class PageRequestError extends Error {
  constructor(
    message: string,
    public readonly request: PageRequest,
    public readonly response?: PageResponse,
  ) {
    super(message);
  }
}

function tryJSONParse(data?: string): undefined | string | object {
  if (!data) {
    return data;
  }
  try {
    return <object>JSON.parse(data);
  } catch {
    return data;
  }
}

function fetchRequestData(request: Request): PageRequest {
  return {
    data: tryJSONParse(request.postData()),
    headers: request.headers(),
    method: request.method(),
    url: request.url(),
  };
}

async function makePageRequestError(request: Request, blocking: boolean): Promise<PageRequestError> {
  const failure = request.failure();
  const response = request.response();
  const reqData = fetchRequestData(request);
  if (failure && failure.errorText.startsWith('net::')) {
    return new PageRequestError(`Request failed with ${failure.errorText}`, reqData);
  }
  return new PageRequestError(
    failure ? failure.errorText : response ? response.status().toString() : 'No Response',
    reqData,
    response ? {
        data: blocking ? tryJSONParse((await response.buffer()).toString()) : 'UNAVAILABLE',
        headers: response.headers(),
        statusCode: response.status(),
    } : undefined);
}

/**
 * Orders a halt of main page navigation.
 */
export interface NavigationInterruptor {
  preventNavigation(source: string): () => void;
}

/**
 * This class is designed to prevent all direct browser navigation until all subsequent requests
 * have completed.
 *
 */
export class NavigationInterruptor implements NavigationInterruptor {
  private locks = new Set<{ source: string } | Request>();
  constructor(private blocking: boolean, private page: Page, private logBuffer: StreamBuffer<BrowserLogEntry>) {

  }

  public async attach() {
    const { blocking, locks, logBuffer, page } = this;
    if (this.blocking) {
      await page.setRequestInterception(true);
      page.on('request', async request => {
        let continued = false;
        try {
          if (request.isNavigationRequest()) {
            await this.hold(request);
            return;
          }
          locks.add(request);
          await request.continue();
          continued = true;
        } catch (error) {
          logBuffer.write({
            error: <Error>error,
            source: 'navblock',
          });
          // Continue at most once: calling continue() on an already handled request throws.
          if (!continued) {
            await request.continue();
          }
        }
      });
    }

    page.on('requestfinished', async request => {
      try {
        const failure = request.failure();
        const response = request.response();
        const statusCode = response && response.status();
        if (!failure && (statusCode && ((statusCode >= 200 && statusCode < 400)) || statusCode === 0)) {
          return;
        }
        logBuffer.write({
          error: await makePageRequestError(request, blocking),
          source: 'requestfinished',
        });
      } catch (error) {
        logBuffer.write({
          error: <Error>error,
          source: 'requestfinished',
        });
      } finally {
        this.finishRequest(request);
      }
    });

    page.on('requestfailed', async request => {
      try {
        logBuffer.write({
          error: await makePageRequestError(request, blocking),
          source: 'requestfailed',
        });
      } catch (error) {
        logBuffer.write({
          error: <Error>error,
          source: 'requestfailed',
        });
      } finally {
        this.finishRequest(request);
      }
    });
  }

  public preventNavigation(source: string): () => void {
    const lock = {
      source,
    };
    this.locks.add(lock);
    return () => {
      this.locks.delete(lock);
    };
  }

  public isHolding(): boolean {
    return this.locks.size > 0;
  }

  private async hold(request: Request, holdTime = Date.now()) {
    const { locks, logBuffer, page } = this;
    if (locks.size === 0) {
      await request.continue();
      return;
    }
    const diff = Date.now() - holdTime;
    if (diff > 5000) {
      locks.clear();
      logBuffer.write({
        error: new Error('Blocked navigation for more than 5000ms'),
        source: 'navblock',
      });
      await request.continue();
      return;
    }
    await page.waitFor(100);
    await this.hold(request, holdTime);
  }

  private finishRequest(request: Request) {
    if (!this.blocking || request.isNavigationRequest()) {
      return;
    }
    this.locks.delete(request);
  }
}
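The hold() method above polls every 100 ms through recursive calls. As an alternative, here is a rough sketch of a lock set that resolves a promise once it empties or a timeout elapses. `LockSet` and `whenEmpty` are hypothetical names introduced for illustration; they are not part of Puppeteer or of the code above.

```typescript
// Hypothetical lock set that notifies waiters instead of being polled.
class LockSet<T> {
  private locks = new Set<T>();
  private waiters: Array<() => void> = [];

  add(lock: T): void {
    this.locks.add(lock);
  }

  delete(lock: T): void {
    this.locks.delete(lock);
    if (this.locks.size === 0) {
      // Wake everyone waiting for the set to drain.
      const pending = this.waiters;
      this.waiters = [];
      pending.forEach(wake => wake());
    }
  }

  get size(): number {
    return this.locks.size;
  }

  // Resolves true once the set is empty, or false if timeoutMs passes first.
  whenEmpty(timeoutMs: number): Promise<boolean> {
    if (this.locks.size === 0) {
      return Promise.resolve(true);
    }
    return new Promise<boolean>(resolve => {
      const timer = setTimeout(() => resolve(false), timeoutMs);
      this.waiters.push(() => {
        clearTimeout(timer);
        resolve(true);
      });
    });
  }
}
```

A navigation handler could then await whenEmpty(5000) once and continue the request afterwards, logging a timeout when the result is false.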

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 12 (3 by maintainers)

Top GitHub Comments

SimonSchick commented, Feb 3, 2020 (4 reactions)

Minor bump, this is still an issue.

vbisbest commented, Apr 8, 2019 (2 reactions)

I believe this is affecting me as well. I am trying to scrape JS links by clicking on elements and capturing the requestCreated to grab the URL. When I abort, it returns an error page to Chromium and I can no longer continue to click on elements. I wish there was a way to abort and cancel the navigate action altogether.
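A workaround sometimes suggested for this situation is to answer the intercepted navigation request with an empty 204 response instead of aborting it, on the assumption that the browser then keeps the current document rather than showing an error page. The sketch below illustrates that idea only; it is not a confirmed fix for this issue. `InterceptedRequestLike` and `handleIntercepted` are hypothetical names, while isNavigationRequest(), respond() and continue() are existing methods on Puppeteer's Request.

```typescript
// Structural stand-in for the subset of Puppeteer's Request used below.
interface InterceptedRequestLike {
  isNavigationRequest(): boolean;
  respond(response: { status: number }): Promise<void>;
  continue(): Promise<void>;
}

type Outcome = 'cancelled' | 'continued';

// Hypothetical handler: cancels navigations while shouldBlock() returns true.
async function handleIntercepted(
  request: InterceptedRequestLike,
  shouldBlock: () => boolean,
): Promise<Outcome> {
  if (request.isNavigationRequest() && shouldBlock()) {
    // Assumption: a 204 (No Content) reply leaves the current page in place.
    await request.respond({ status: 204 });
    return 'cancelled';
  }
  await request.continue();
  return 'continued';
}
```

It could be attached with page.on('request', r => handleIntercepted(r, () => interruptor.isHolding())) after enabling interception. Note that isNavigationRequest() is also true for iframe navigations, so a real handler may want to check the request's frame as well.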
