
Attempt file downloading via direct navigation triggers error in Puppeteer


1. Environment

  • Puppeteer version: puppeteer@5.5.0
  • Platform / OS version: Arch Linux at kernel version 5.10.3-arch1-1, x86_64
  • URLs (if applicable): See example below
  • Node.js version: v15.5.0

2. Quick Description

It is known that Chromium in headful mode can download files when the Browser.setDownloadBehavior CDP method is used.

It is observed (see examples below) that the current version of Puppeteer (5.5.0) fails to download the file:

  • When the file-downloading URL is provided directly to page.goto for navigation

while it succeeds:

  • When the same URL is visited via an in-page action, i.e. clicking on the link
  • When the Page.navigate CDP method is sent directly instead of using Puppeteer's page.goto

(in all three cases it is the same URL that is being downloaded)

3. A sample script for illustration

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    userDataDir: '/tmp/tmp_chrome_profile',
  });
  const page = await browser.newPage();
  // page._client is Puppeteer's internal (private) CDP session for this page
  await page._client.send('Browser.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath: '/tmp/tmpdown/',
  });
  let downloadState = null;
  // record the latest download state: 'inProgress', 'completed' or 'canceled'
  page._client.on('Page.downloadProgress', ({state}) => {
    downloadState = state;
  });
  const testType = process.env.TEST_TYPE || '1';
  if (testType === '1') {
    // direct download
    await page.goto('https://coronavirus.data.gov.uk/downloads/easy_read/overview/United%20Kingdom.docx');
  } else if (testType === '2') {
    // indirect download -- navigate to the index page then download via a click
    await page.goto('https://coronavirus.data.gov.uk/');
    await page.waitForSelector('div.govuk-grid-row a[href="/downloads/easy_read/overview/United Kingdom.docx"]');
    await page.click('div.govuk-grid-row a[href="/downloads/easy_read/overview/United Kingdom.docx"]');
  } else if (testType === '3') {
    // no Puppeteer navigation -- use CDP directly instead
    await page._client.send('Page.navigate', {
      url: 'https://coronavirus.data.gov.uk/downloads/easy_read/overview/United%20Kingdom.docx',
    });
  }
  // poll every 100 ms until the download completes (there is no timeout)
  async function checkResult() {
    if (downloadState === 'completed') {
      console.log('download completed');
      await browser.close();
    } else {
      setTimeout(checkResult, 100);
    }
  }
  checkResult();
})();
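The script above polls downloadState every 100 ms with no upper bound, so if the download never completes (or is canceled) the loop never ends. A minimal sketch of a promise-based alternative, assuming the session object supports on/off and emits Page.downloadProgress events carrying guid and state ('inProgress', 'completed' or 'canceled'), as the script's page._client does. The helper name waitForDownload and the 30-second default timeout are hypothetical choices, not part of Puppeteer.

```javascript
// Hypothetical helper (not part of Puppeteer): resolves with the download's guid
// once `session` emits a Page.downloadProgress event whose state is 'completed'.
// Rejects if the download is canceled or nothing completes within timeoutMs.
function waitForDownload(session, timeoutMs = 30000) {
  return new Promise((resolve, reject) => {
    // Stop the timer and detach the listener, whatever the outcome.
    function cleanup() {
      clearTimeout(timer);
      session.off('Page.downloadProgress', onProgress);
    }

    function onProgress({ guid, state }) {
      if (state === 'completed') {
        cleanup();
        resolve(guid);
      } else if (state === 'canceled') {
        cleanup();
        reject(new Error(`download ${guid} was canceled`));
      }
      // 'inProgress' events are ignored: keep waiting.
    }

    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`download did not complete within ${timeoutMs} ms`));
    }, timeoutMs);

    session.on('Page.downloadProgress', onProgress);
  });
}
```

To avoid missing a fast download, the promise would be created before starting the navigation, e.g. `const done = waitForDownload(page._client);` followed by the navigation and then `await done; await browser.close();` in place of checkResult().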

4. Sample results from the above script

Case 1: When the file-downloading URL is provided directly for navigation – fails within Puppeteer

$ rm -rf /tmp/tmpdown/ && TEST_TYPE=1 node test.js && ls /tmp/tmpdown/
/tmp/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115
                    ? new Error(`${response.errorText} at ${url}`)
                      ^

Error: net::ERR_ABORTED at https://coronavirus.data.gov.uk/downloads/easy_read/overview/United%20Kingdom.docx
    at navigate (/tmp/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:115:23)
    at processTicksAndRejections (node:internal/process/task_queues:93:5)
    at async FrameManager.navigateFrame (/tmp/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:90:21)
    at async Frame.goto (/tmp/node_modules/puppeteer/lib/cjs/puppeteer/common/FrameManager.js:417:16)
    at async Page.goto (/tmp/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:784:16)
    at async /tmp/test.js:21:5
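The note under case 3 further down shows that the same net::ERR_ABORTED also accompanies a download that does complete, so one commonly used pattern (not something proposed in this thread) is to catch that specific error from page.goto and keep waiting for the download. A minimal sketch, assuming the error message starts with net::ERR_ABORTED exactly as in the trace above; isAbortedNavigation and gotoAllowingDownload are hypothetical helper names. The trade-off is that a genuinely aborted navigation becomes indistinguishable from a download.

```javascript
// Hypothetical helper: true if `err` is Chromium's "navigation aborted" error,
// which page.goto raised in case 1 when the URL pointed at a file download.
function isAbortedNavigation(err) {
  return err instanceof Error && err.message.startsWith('net::ERR_ABORTED');
}

// Hypothetical wrapper around page.goto that treats net::ERR_ABORTED as an
// expected outcome. Resolves to page.goto's result for a normal page, or to
// null when the navigation was aborted (possibly because it became a download).
async function gotoAllowingDownload(page, url, options) {
  try {
    return await page.goto(url, options);
  } catch (err) {
    if (isAbortedNavigation(err)) {
      return null; // wait for Page.downloadProgress events instead
    }
    throw err; // any other failure (DNS, timeout, ...) is still an error
  }
}
```

In case 1, `await gotoAllowingDownload(page, url)` in place of `await page.goto(url)` would keep the script alive so the polling can observe Page.downloadProgress; whether the download then completes in that case was not tested in the thread.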

Case 2: When the same file-downloading URL is visited via an in-page action of clicking on the link – succeeds

$ rm -rf /tmp/tmpdown/ && TEST_TYPE=2 node test.js && ls /tmp/tmpdown/
download completed
'United Kingdom.docx'

Case 3: When the Page.navigate CDP method is sent directly rather than using Puppeteer’s page.goto – succeeds

$ rm -rf /tmp/tmpdown/ && TEST_TYPE=3 node test.js && ls /tmp/tmpdown/
download completed
'United Kingdom.docx'

Edit: It seems that in the case 3 example, the navigation request sent via the CDP method also returns an error; that error just doesn’t affect the actual file download. Below is a sample response from the Page.navigate CDP method:

{"frameId":"8F5624D4BE6BE24F40AB845DD85F616E","loaderId":"312751DE4B880C95ED2A7CFDED219238","errorText":"net::ERR_ABORTED"}
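The difference between cases 1 and 3 therefore lies in how the errorText field is handled: page.goto turns it into a thrown Error (the FrameManager.js:115 line in the case 1 trace), while a raw Page.navigate call simply returns it. A minimal sketch of a helper that sends Page.navigate and reports errorText instead of throwing, assuming a session object with a send(method, params) method returning a Promise, as Puppeteer 5.5.0's private page._client does. The names navigateRaw and downloadLikely are hypothetical.

```javascript
// Hypothetical helper: navigate via the raw CDP Page.navigate command and
// return errorText as data instead of throwing, mirroring case 3 above.
async function navigateRaw(session, url) {
  const { frameId, loaderId, errorText } = await session.send('Page.navigate', { url });
  return {
    frameId,
    loaderId,
    errorText: errorText || null,
    // net::ERR_ABORTED is what this thread observed when the URL became a
    // download; other aborted navigations produce it too, so this is a hint only.
    downloadLikely: errorText === 'net::ERR_ABORTED',
  };
}
```

For example, `const r = await navigateRaw(page._client, url);` followed by waiting for Page.downloadProgress only when `r.downloadLikely` is true.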

5. Misc

The sample script above uses an arbitrary URL for illustration. The hard-coded URL is presumably not what causes the issue, since the same behavior can be observed with many other URLs.

I understand that file downloading may not yet be officially supported in Puppeteer. However, it would be really nice if the error could somehow be avoided.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 7
  • Comments: 6

Top GitHub Comments

3 reactions
andormade commented, Dec 4, 2021

To anyone who found this issue, this is the workaround that I’m using:

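// Workaround: assign location.href from inside the page instead of calling page.goto.
// page.evaluate resolves once the assignment has run, so Puppeteer does not wait on
// the navigation result or turn the aborted navigation into an error.
async function goto(page, link) {
    return page.evaluate((link) => {
        location.href = link;
    }, link);
}

await goto(page, link); // call from inside an async function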

0 reactions
stale[bot] commented, Jul 23, 2022

We are closing this issue. If the issue still persists in the latest version of Puppeteer, please reopen the issue and update the description. We will try our best to accommodate it!
