Only get domain redirects and possible bug found
Steps to reproduce
- Puppeteer version: 0.13.0
What steps will reproduce the problem?
I’m trying to use the Puppeteer API to collect every domain redirect into an array before taking a screenshot of the final URL, but the code I have so far also picks up requests that are not part of that redirect chain.
For example, if I go to youtube.com, my code correctly records the redirects ‘https://youtube.com/’ and ‘https://www.youtube.com/’, but it also records other URLs such as doubleclick.net.
I only want the redirects that would show up in the URL bar.
I’ve managed to narrow it down with request.resourceType === 'document'. How can I narrow it down further?
Here’s the code:
// Usage: node chrome.js http://youtube.com
const puppeteer = require('puppeteer');
const url = process.argv[2];

(async () => {
  const browser = await puppeteer.launch({headless: true, timeout: 30000, ignoreHTTPSErrors: true});
  const page = await browser.newPage();
  // await page.setRequestInterception(true); // hangs with resourceType
  const urls = [];
  page.on('request', request => {
    // In Puppeteer 0.13.0, request.resourceType and request.url are properties;
    // later releases turned them into methods (request.url()).
    // if (request.resourceType === 'document' || request.resourceType === 'script') {
    if (request.resourceType === 'document') {
      urls.push(request.url);
      // continue() requires request interception to be enabled
      request.continue();
    }
  });
  await page.goto(url, {timeout: 20000, waitUntil: 'load'}); // 'load' is the default
  await page.screenshot({path: 'test.jpg', type: 'jpeg', quality: 80, fullPage: false});
  console.log(urls);
  await browser.close();
})();
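One possible way to keep only the URL-bar redirects, sketched under the assumption of a newer Puppeteer release than 0.13.0 (one in which `request.url()` is a method and `request.redirectChain()` exists), is to drop the `request` listener and read the redirect chain of the top-level navigation from the response that `page.goto` resolves to. Only requests belonging to that navigation end up in the chain, so iframe documents such as doubleclick.net are excluded. This covers server-side (HTTP 3xx) redirects only; a JavaScript or meta-refresh redirect is a separate navigation and will not appear in the chain. `navigationUrls` is a hypothetical helper name.

```javascript
// Usage: node redirects.js http://youtube.com
// Assumes a newer Puppeteer release where request.url() and
// request.redirectChain() are methods.

// Hypothetical helper: turns the response returned by page.goto into the list
// of URLs the top-level navigation went through (server-side redirects only).
function navigationUrls(response) {
  if (!response) return []; // page.goto can resolve to null (e.g. about:blank)
  const finalRequest = response.request();
  // redirectChain() holds the requests that were redirected, oldest first;
  // the final request is not part of it, so append its URL at the end.
  const chain = finalRequest.redirectChain().map(req => req.url());
  return [...chain, finalRequest.url()];
}

async function main(url) {
  // Loaded lazily so the helper above can be used without launching a browser.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // With redirects, page.goto resolves with the response of the last hop.
    const response = await page.goto(url, { timeout: 20000, waitUntil: 'load' });
    console.log(navigationUrls(response));
    await page.screenshot({ path: 'test.jpg', type: 'jpeg', quality: 80, fullPage: false });
  } finally {
    await browser.close();
  }
}

const target = process.argv[2];
if (target) {
  main(target).catch(err => {
    console.error(err);
    process.exitCode = 1;
  });
} else {
  console.error('Usage: node redirects.js <url>');
}
```

Reading the chain after `goto` avoids request interception entirely, so nothing can stall waiting for `continue()`.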
I’ve also found what I believe to be a bug: using await page.setRequestInterception(true); together with the request.resourceType === 'document' check causes the script to hang forever (until the timeout). The bug is apparent with the script above once the setRequestInterception line is uncommented.
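The hang is consistent with how request interception works: once `page.setRequestInterception(true)` is on, every request stays paused until the handler calls `continue()`, `abort()` or `respond()` on it. The script above only continues `document` requests, so images, scripts and stylesheets wait indefinitely, `load` never fires and `page.goto` runs into its timeout. Below is a minimal sketch of a handler that continues every request but records only navigation requests of the main frame, assuming a newer Puppeteer release where `request.isNavigationRequest()` and `request.frame()` exist; `recordNavigations` is a hypothetical name. Each redirect hop is issued as a new request, so every hop of the main-frame navigation should be recorded.

```javascript
// Usage: node intercept.js http://youtube.com
// Assumes a Puppeteer release where request.url(), request.frame() and
// request.isNavigationRequest() are methods.

// Hypothetical helper: builds a 'request' listener that records main-frame
// navigation URLs and always resolves the intercepted request.
function recordNavigations(page, urls) {
  return request => {
    if (request.isNavigationRequest() && request.frame() === page.mainFrame()) {
      urls.push(request.url());
    }
    // With interception enabled, a request that is never continued, aborted
    // or responded to stays paused, which is what stalls the page.
    request.continue();
  };
}

async function main(url) {
  const puppeteer = require('puppeteer'); // loaded lazily
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.setRequestInterception(true);
    const urls = [];
    page.on('request', recordNavigations(page, urls));
    await page.goto(url, { timeout: 20000, waitUntil: 'load' });
    await page.screenshot({ path: 'test.jpg', type: 'jpeg', quality: 80 });
    console.log(urls);
  } finally {
    await browser.close();
  }
}

const target = process.argv[2];
if (target) {
  main(target).catch(err => {
    console.error(err);
    process.exitCode = 1;
  });
} else {
  console.error('Usage: node intercept.js <url>');
}
```

Filtering on the frame rather than on `resourceType` is what separates the URL-bar navigations from iframe documents, since both are of type `document`.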
Issue Analytics
- Created 6 years ago
- Comments: 6 (3 by maintainers)
Top GitHub Comments
@aslushnikov Thanks 😃 That would be incredibly useful to me, as I’ve hit a dead end until I can get all redirects, both client-side and server-side.
I’ve been fiddling around with the framenavigated event to get client-side redirects and with the response event for server-side redirects, but once a client-side redirect happens, the response event stops following the chain, so it doesn’t register any server-side redirects that come after the client-side redirect.
Will this thread be updated or closed once the new feature is available?
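A minimal sketch of combining the two events, assuming a newer Puppeteer release: a `response` listener records 3xx answers to main-frame navigation requests (the server-side hops), and a `framenavigated` listener records each URL the main frame commits to, which includes client-side redirects. Puppeteer emits redirect responses as their own `response` events, so in recent releases the server-side hops that follow a client-side redirect should be recorded too. `attachRedirectRecorder` is a hypothetical helper, not a Puppeteer API.

```javascript
// Usage: node recorder.js http://youtube.com
// Assumes a Puppeteer release where response.status(), response.request(),
// request.isNavigationRequest() and request.frame() are methods.

// Hypothetical helper: records main-frame redirects from both sources into `urls`.
function attachRedirectRecorder(page, urls) {
  // Server-side hops: each 3xx answer to a main-frame navigation request.
  page.on('response', response => {
    const status = response.status();
    const request = response.request();
    if (status >= 300 && status < 400 &&
        request.isNavigationRequest() && request.frame() === page.mainFrame()) {
      urls.push(request.url());
    }
  });
  // Committed main-frame navigations, including client-side redirects
  // (location changes, meta refresh); iframe navigations are ignored.
  page.on('framenavigated', frame => {
    if (frame === page.mainFrame()) {
      urls.push(frame.url());
    }
  });
}

async function main(url) {
  const puppeteer = require('puppeteer'); // loaded lazily
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    const urls = [];
    attachRedirectRecorder(page, urls);
    await page.goto(url, { timeout: 20000, waitUntil: 'load' });
    console.log(urls);
  } finally {
    await browser.close();
  }
}

const target = process.argv[2];
if (target) {
  main(target).catch(err => {
    console.error(err);
    process.exitCode = 1;
  });
} else {
  console.error('Usage: node recorder.js <url>');
}
```

Two caveats: client-side redirects triggered after `load` are only captured if the script waits longer before closing the browser, and same-document URL changes (History API) also fire `framenavigated`, so single-page apps may add extra entries.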
For anyone interested, you can get client-side redirects like this:
@leem32 All future updates will take place in #1579; please subscribe to be notified of future changes.