Attempt file downloading via direct navigation triggers error in Puppeteer

1. Environment

Puppeteer version: puppeteer@5.5.0
Platform / OS version: Arch Linux at kernel version 5.10.3-arch1-1, x86_64
URLs (if applicable): See example below
Node.js version: v15.5.0

2. Quick Description

Chromium in headful mode is known to be capable of downloading files once downloads are enabled through the Browser.setDownloadBehavior CDP method.

The current version of Puppeteer (5.5.0) fails to download the file (see the examples below):

When the file-downloading URL is provided directly for navigation

while it succeeds:

When the same file-downloading URL is visited via an in-page action of clicking on the link
Or when the Page.navigate CDP method is sent directly rather than through Puppeteer’s page.goto

(In all three cases the same URL is being downloaded.)

3. A sample script for illustration

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    userDataDir: '/tmp/tmp_chrome_profile',
  });
  const page = await browser.newPage();
  await page._client.send('Browser.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath: '/tmp/tmpdown/',
  });
  let downloadState = null;
  page._client.on('Page.downloadProgress', ({guid, totalBytes, receivedBytes, state}) => {
    downloadState = state;
  });
  const testType = process.env.TEST_TYPE || '1';
  if (testType == '1') {
    // direct download
    await page.goto('https://coronavirus.data.gov.uk/downloads/easy_read/overview/United%20Kingdom.docx');
  } else if (testType == '2') {
    // indirect download -- navigate to the index page then download via a click
    await page.goto('https://coronavirus.data.gov.uk/');
    await page.waitForSelector('div.govuk-grid-row a[href="/downloads/easy_read/overview/United Kingdom.docx"]');
    await page.click('div.govuk-grid-row a[href="/downloads/easy_read/overview/United Kingdom.docx"]');
  } else if (testType == '3') {
    // no Puppeteer navigation -- use CDP directly instead
    await page._client.send('Page.navigate', {
      url: 'https://coronavirus.data.gov.uk/downloads/easy_read/overview/United%20Kingdom.docx',
    });
  }
  async function checkResult() {
    if (downloadState == 'completed') {
      console.log('download completed');
      await browser.close();
    } else {
      setTimeout(checkResult, 100);
    }
  }
  checkResult();
})();

4. Sample results from the above script

Case 1: When the file-downloading URL is provided directly for navigation – fails within Puppeteer

$ rm -rf /tmp/tmpdown/ && TEST_TYPE=1 node test.js && ls /tmp/tmpdown/
/tmp/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115
                    ? new Error(`${response.errorText} at ${url}`)
                      ^

Error: net::ERR_ABORTED at https://coronavirus.data.gov.uk/downloads/easy_read/overview/United%20Kingdom.docx
    at navigate (/tmp/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115:23)
    at processTicksAndRejections (node:internal/process/task_queues:93:5)
    at async FrameManager.navigateFrame (/tmp/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:90:21)
    at async Frame.goto (/tmp/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:417:16)
    at async Page.goto (/tmp/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:784:16)
    at async /tmp/test.js:21:5

Case 2: When the same file-downloading URL is visited via an in-page action of clicking on the link – succeeds

$ rm -rf /tmp/tmpdown/ && TEST_TYPE=2 node test.js && ls /tmp/tmpdown/
download completed
'United Kingdom.docx'

Case 3: When sending the Page.navigate CDP method directly rather than using Puppeteer’s page.goto – succeeds

$ rm -rf /tmp/tmpdown/ && TEST_TYPE=3 node test.js && ls /tmp/tmpdown/
download completed
'United Kingdom.docx'

Edited: It seems that in the case 3 example, the navigation request sent through the CDP method also returns an error. The difference is that this error does not affect the actual file download. Given below is a sample response from the Page.navigate CDP method:

{"frameId":"8F5624D4BE6BE24F40AB845DD85F616E","loaderId":"312751DE4B880C95ED2A7CFDED219238","errorText":"net::ERR_ABORTED"}
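Since the CDP response above carries the same net::ERR_ABORTED text that page.goto throws in case 1, one possible way to live with the error is to catch it around page.goto and let the download continue. The following is only a sketch under that assumption: isAbortedNavigation and gotoAllowingDownload are hypothetical names, not part of Puppeteer, and the check relies on the error message format seen in the stack trace above (`net::ERR_ABORTED at <url>`). ERR_ABORTED can also be raised for reasons unrelated to downloads, so swallowing it is a judgment call.

```javascript
// Hypothetical helper (not a Puppeteer API): true when the error looks like the
// aborted navigation reported in the stack trace above.
function isAbortedNavigation(err) {
  return err instanceof Error && err.message.startsWith('net::ERR_ABORTED');
}

// Hypothetical wrapper around page.goto. It returns null when the navigation was
// aborted (assumed here to mean a download started) and rethrows anything else.
async function gotoAllowingDownload(page, url) {
  try {
    return await page.goto(url);
  } catch (err) {
    if (isAbortedNavigation(err)) {
      return null;
    }
    throw err;
  }
}
```

In the sample script, the case 1 call would become `await gotoAllowingDownload(page, url)`, with the existing Page.downloadProgress listener still deciding when the download has finished.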

5. Misc

The above sample script uses an arbitrary URL for illustration purposes. The assumption is that the hard-coded URL itself is not the cause, since the same issue can be observed with many other URLs.

I understand that file downloading might not yet be recognized as a supported feature in Puppeteer. However, it would be really nice if the error could somehow be avoided.

Issue Analytics

State:
Created 3 years ago
Reactions: 7
Comments: 6

Top GitHub Comments

3 reactions

andormade commented, Dec 4, 2021

To anyone who found this issue, this is the workaround that I’m using:

async function goto(page, link) {
    return page.evaluate((link) => {
        location.href = link;
    }, link);
}

goto(page, link);
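The workaround above returns as soon as location.href has been assigned, so it does not say when the download has finished. A minimal sketch of one way to wait for that is given below, assuming the same Page.downloadProgress CDP event that the sample script in the issue listens to. waitForDownload is a hypothetical helper, and the 30 second default timeout is an arbitrary choice.

```javascript
// Hypothetical helper (not a Puppeteer API): resolves once a Page.downloadProgress
// event reports 'completed'; rejects on any other final state or on timeout.
// `client` is assumed to expose on/off like the CDP session used in the issue's script.
function waitForDownload(client, timeoutMs = 30000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      client.off('Page.downloadProgress', onProgress);
      reject(new Error(`download did not finish within ${timeoutMs} ms`));
    }, timeoutMs);

    function onProgress({ state }) {
      if (state === 'inProgress') return; // keep waiting for a final state
      clearTimeout(timer);
      client.off('Page.downloadProgress', onProgress);
      if (state === 'completed') {
        resolve(state);
      } else {
        reject(new Error(`download ended in state "${state}"`));
      }
    }

    client.on('Page.downloadProgress', onProgress);
  });
}

// Possible usage with the workaround (start listening before navigating):
//   const finished = waitForDownload(page._client);
//   await goto(page, link);
//   await finished;
```

Registering the listener before triggering the navigation avoids missing an early progress event. If several downloads can run at once, the guid field of the event would have to be checked as well, which this sketch does not do.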

0 reactions

stale[bot] commented, Jul 23, 2022

We are closing this issue. If the issue still persists in the latest version of Puppeteer, please reopen the issue and update the description. We will try our best to accommodate it!