page content incorrectly embedded when using file protocol

Hi again. As usual, thanks a lot for maintaining Puppeteer; it’s super useful. 👍

Seems to me that there was a regression in 0.13 🐛 when using the combination of:

  • file:/// protocol
  • request interception enabled, i.e. page.setRequestInterception(true)
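
Since the problem only shows up when both conditions hold, one possible stopgap is to skip request interception for file:// URLs. The following is a minimal sketch under that assumption; `isFileUrl` and `gotoWithOptionalInterception` are hypothetical helper names, not part of Puppeteer, and the approach only helps if interception is not actually needed for local files.

```javascript
// Hypothetical helper (not a Puppeteer API): true when the URL uses the file: protocol.
function isFileUrl(url) {
  return new URL(url).protocol === 'file:';
}

// Hypothetical wrapper: enables interception only for non-file URLs,
// so the file:// + interception combination is never used.
async function gotoWithOptionalInterception(page, url, onRequest) {
  const intercept = !isFileUrl(url);
  await page.setRequestInterception(intercept);
  if (intercept) {
    page.on('request', onRequest);
  }
  return page.goto(url);
}
```

The trade-off is that requests made by local pages go unobserved, so this is a workaround for the symptom rather than a fix.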

Steps to reproduce

Tell us about your environment:

  • Puppeteer version: 0.13.0
  • Platform / OS version: macOS High Sierra; same on Windows
  • URLs (if applicable): Any URL over the file protocol seems to have this problem.

What steps will reproduce the problem?

Please include code that reproduces the issue.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    // waitUntil: 'networkidle2', // see if this performs better than default for timeouts
    ignoreHTTPSErrors: true,
    args: ['--disable-setuid-sandbox', '--no-sandbox']
  });
  const page = await browser.newPage();

  // CRITICAL: _only_ with setRequestInterception = true do we get this problem:
  // the real html is escaped and embedded in a template like below.
  // <html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">
  // <REAL CONTENT HERE>
  // </pre></body></html>
  await page.setRequestInterception(true);
  page.on('request', interceptedRequest => {
    // `url` is a property in Puppeteer 0.13; later releases expose it as a method, url().
    console.log(interceptedRequest.url);
    interceptedRequest.continue();
  });

  // ANY local html file will do
  await page.goto('file:///Users/pocketjoso/penthouse/test/static-server/yeoman.html');

  const content = await page.content();
  console.log('content: ' + content);

  // Wait for the browser to shut down before the script exits.
  await browser.close();
})();

What is the expected result?

The HTML is parsed the same way regardless of protocol (file or http), and regardless of whether setRequestInterception is true or false.

What happens instead?

content: <html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">&lt;!doctype html&gt;
&lt;html class="no-js"&gt;
    &lt;head&gt;
        &lt;meta charset="utf-8"&gt;
        &lt;title&gt;critical css test&lt;/title&gt;
        &lt;meta name="description" content=""&gt;
        &lt;meta name="viewport" content="width=device-width, initial-scale=1"&gt;
        &lt;!-- Place favicon.ico and apple-touch-icon.png in the root directory --&gt;
... etc

In other words, the browser renders the markup as text rather than as HTML. (Screenshot in the original issue.)
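
To check for this symptom programmatically, the output of `page.content()` can be compared against the wrapper shown above. This is only a rough sketch: `looksLikeEscapedEmbed` is a hypothetical name, the prefix is copied from the output in this report and may differ between Chromium versions, and a genuine plain-text file that contains markup would match as well.

```javascript
// Wrapper prefix copied from the output in this report (assumed stable; may vary by Chromium version).
const PLAIN_TEXT_WRAPPER_PREFIX =
  '<html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">';

// Hypothetical helper: true when the serialized page looks like escaped markup
// embedded in Chromium's plain-text <pre> wrapper instead of a parsed document.
function looksLikeEscapedEmbed(content) {
  return content.startsWith(PLAIN_TEXT_WRAPPER_PREFIX) && content.includes('&lt;');
}
```

Calling `looksLikeEscapedEmbed(await page.content())` after `page.goto(...)` in the reproduction script would flag the broken case.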

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Reactions: 1
  • Comments: 13 (5 by maintainers)

Top GitHub Comments

aslushnikov commented, Apr 17, 2018 (5 reactions)

For the record: the upstream fix for this has landed as https://crrev.com/550319. This will be fixed with the upcoming Chromium roll.

pocketjoso commented, Apr 17, 2018 (2 reactions)

That’s great! 👏 Any insight into how this broke in Chromium between versions? Was it just a regression of some sort?

If possible, please ping here on the next Puppeteer release where Chromium is bumped. Then we can hopefully close this and move everyone to the latest Puppeteer. 💯
