FR: Use custom cookies for PDF and screenshot generation
Type
- General Question or Discussion
- Propose a brand new feature
- Request modification of existing behavior or design
What is the problem that your feature request solves
I am archiving some websites that show an 18+ age confirmation page (mildly adult content) before letting you in. I exported a cookies.txt file and linked it in my configuration, and the Local Archive passed the confirmation successfully. However, the other archive types (e.g. HTML, PDF and screenshot) do not use the cookie file from my configuration, so all they captured was the 18+ confirmation screen.
Describe the ideal specific solution you’d want, and whether it fits into any broader scope of changes
FYI this is the web page I tried to archive (and other posts under this sub forum). I hope the archiving process can use the custom cookie file I provided in the configuration, so that the PDF, HTML and screenshot archives come out correctly.
What hacks or alternative solutions have you tried to solve the problem?
I’ve looked into the index.json file for each archive, and it would be nice to include a cookies flag in the headless Chrome/Chromium command. Only wget was given my cookie file.
How badly do you want this new feature?
- It’s an urgent deal-breaker, I can’t live without it
- It’s important to add it in the near-mid term future
- It would be nice to have eventually
- I’m willing to contribute to development / fixing this issue
- I like ArchiveBox so far / would recommend it to a friend
P.S. I don’t have coding experience, so please excuse my limited IT knowledge.
Issue Analytics
- State:
- Created 4 years ago
- Comments:11 (6 by maintainers)
Top GitHub Comments
The trick is that the Chrome data dir used for archiving needs to be from a Chrome instance that’s logged into the site. Try opening chromium-browser (or whatever binary is the same Chrome instance you’re using for archiving) and logging into the site, then running archivebox. If you’re doing it on a remote server, you’ll need to rsync your Chrome data dir to the server. You can find it at one of these paths depending on what OS you’re on and what Chrome version you’re using:
Are you using the same Chromium version inside and outside Docker to generate that profile? It must be exactly the same version, architecture, release type, etc. for it to work.
@terxw You can try setting CHROME_HEADLESS=False and checking the GUI that pops up to make sure it’s using it correctly.
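The suggestions in the comments above could be combined into a shell session roughly like the following. This is only a sketch under some assumptions: the profile path /data/chrome-profile, the host name and the example URL are hypothetical placeholders, the browser binary is assumed to be chromium-browser, and the profile is handed to ArchiveBox through its CHROME_USER_DATA_DIR setting, which the comments do not name explicitly. The commands that open a browser or contact a server are left commented out so that nothing runs by accident.

```shell
# Hypothetical profile location; replace it with the real data dir of the
# browser you logged in with.
PROFILE_DIR="/data/chrome-profile"

# 1. Log into the site once with a visible browser that writes to that profile
#    (assumes the archiving binary is chromium-browser):
# chromium-browser --user-data-dir="$PROFILE_DIR"

# 2. If ArchiveBox runs on a remote server, copy the profile there
#    (user@server and the target path are placeholders):
# rsync -a "$PROFILE_DIR/" user@server:/data/chrome-profile/

# 3. Point ArchiveBox at the logged-in profile, and turn off headless mode
#    so the browser window can be inspected while it archives.
export CHROME_USER_DATA_DIR="$PROFILE_DIR"
export CHROME_HEADLESS=False

# 4. Re-run the archive from the ArchiveBox data folder (placeholder URL):
# archivebox add 'https://example.com/age-gated-page'
```

Once the PDF and screenshot outputs show the real page instead of the confirmation screen, CHROME_HEADLESS can be set back to True. As noted above, the profile only works if it was created by exactly the same Chromium build that ArchiveBox uses.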