[Feature request] ability to set localStorage/sessionStorage w/o loading a page from target domain
We run Puppeteer on AWS Lambda to orchestrate multi-step, multi-page crawl sessions for our SaaS, Fluxguard (e.g., log in, go to dashboard, go to page C). Because of Lambda's time limits, among other reasons, each page is handled by its own Lambda execution, in sequence. We save all browser state (cookies, localStorage, webStorage) in an object store for reuse by subsequent page crawls.
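For readers following along, here is a minimal sketch of what "saving browser state" between executions could look like. The function name captureBrowserState and the shape of the returned object are made up for illustration and are not Fluxguard's actual code. It assumes page is a Puppeteer Page that is already on the target site, and it uses page.cookies(), which newer Puppeteer releases may replace with a browser-level call.

```javascript
// Sketch: snapshot cookies plus both Web Storage areas from a loaded page.
// Hypothetical helper; `page` is assumed to be a Puppeteer Page on the target origin.
async function captureBrowserState(page) {
  // Cookies visible to the page's current URL.
  const cookies = await page.cookies();

  // This callback runs inside the page, where localStorage/sessionStorage exist.
  const storage = await page.evaluate(() => {
    const dump = (area) => {
      const out = {};
      for (let i = 0; i < area.length; i++) {
        const key = area.key(i);
        out[key] = area.getItem(key);
      }
      return out;
    };
    return {
      localStorage: dump(localStorage),
      sessionStorage: dump(sessionStorage),
    };
  });

  // Plain JSON-serializable object, ready for an object store.
  return { cookies, ...storage };
}
```

The result can be passed through JSON.stringify and written to whatever object store the next Lambda execution reads from.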
The problem arises when we want to re-use saved local/sessionStorage on subsequent crawls. We cannot set local or session storage w/o first loading a page from the target site via, e.g.:
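// Overwrites localStorage for whatever origin chromePage currently has loaded.
export const setLocalStorage = async (chromePage, newStorage = {}) =>
  await chromePage.evaluate(newStorage => {
    // Runs in the page context, so a page from the target site must already be loaded.
    localStorage.clear();
    for (let key in newStorage) {
      localStorage.setItem(key, newStorage[key]);
    }
  }, newStorage);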
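Since the request covers sessionStorage as well, the same idea can be sketched for either storage area. This is only an illustration: restoreStorage and its area parameter are hypothetical names, and the same limitation applies, i.e. page must already have navigated to the target origin.

```javascript
// Sketch: restore saved entries into localStorage or sessionStorage.
// Hypothetical helper; assumes `page` is a Puppeteer Page already on the target origin.
async function restoreStorage(page, area, entries = {}) {
  if (area !== 'localStorage' && area !== 'sessionStorage') {
    throw new Error(`Unknown storage area: ${area}`);
  }
  // page.evaluate passes the extra arguments through to the in-page callback.
  await page.evaluate((areaName, saved) => {
    const store = window[areaName];
    store.clear();
    for (const [key, value] of Object.entries(saved)) {
      store.setItem(key, value);
    }
  }, area, entries);
}
```

Usage would be along the lines of `await restoreStorage(page, 'sessionStorage', saved.sessionStorage)` after the first navigation.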
We initially loaded the target page twice: first so that we could set storage, and then again, once storage was set, to load the page with the appropriate state. However, this is troublesome: the first load, regardless of whether we disable JavaScript, etc., will often pollute the cookie/storage space with new data. It’s also messy to have to load the page twice.
Currently, we try/catch loading “innocuous” pages of the target site, such as robots.txt and favicon.ico: we then use these to set storage in the right context before loading the target page. This is “fine,” but not ideal, and it introduces its own problems.
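To make the current workaround concrete, here is a rough sketch of it, not our production code. The name primeStorageViaInnocuousPage, the origin parameter, and the default path list are illustrative assumptions; page.goto and page.evaluate are the standard Puppeteer calls.

```javascript
// Sketch of the "innocuous page" workaround: visit a cheap same-origin URL so the
// page has the right origin, then write localStorage before the real page load.
// Hypothetical helper; `origin` (e.g. "https://example.com") and `paths` are illustrative.
async function primeStorageViaInnocuousPage(
  page,
  origin,
  entries = {},
  paths = ['/robots.txt', '/favicon.ico']
) {
  let lastError;
  for (const path of paths) {
    try {
      await page.goto(new URL(path, origin).href);
    } catch (err) {
      lastError = err; // navigation failed; try the next candidate
      continue;
    }
    // Runs in the page context, now scoped to the target origin.
    await page.evaluate((saved) => {
      localStorage.clear();
      for (const [key, value] of Object.entries(saved)) {
        localStorage.setItem(key, value);
      }
    }, entries);
    return path; // report which candidate worked
  }
  throw new Error(`No candidate page loaded from ${origin}`, { cause: lastError });
}
```

The target page is loaded with a normal page.goto afterwards. The fallback loop is where the "own problems" show up: every candidate is still a real request to the target site, and any of them can fail or time out.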
It would be great if Puppeteer could set storage generally or for a specific domain without the need to load a page from that domain first.
Issue Analytics
- Created 5 years ago
- Comments: 7 (3 by maintainers)
Top GitHub Comments
@aslushnikov great recommendation! This approach is working nicely for us. And as you note it’s a lot more bulletproof than hitting favicon.ico or whatever instead. Closing!
Thanks… we will try this approach and report back in this issue!