FR: Use custom cookies for PDF and screenshot generation

Type

General Question or Discussion
Propose a brand new feature
Request modification of existing behavior or design

What is the problem that your feature request solves

I am archiving some websites with mildly adult content that show an 18+ confirmation page to verify your age before entering. I exported a cookies.txt file and linked it in my configuration, and the Local Archive got past the confirmation successfully. However, I noticed that the cookie file from my configuration is not applied to the other archive types (e.g. HTML, PDF and screenshot), so all they captured was the 18+ confirmation screen.

Describe the ideal specific solution you’d want, and whether it fits into any broader scope of changes

FYI this is the web page I tried to archive (and other posts under this sub forum). I hope the archiving process can apply the custom cookie file I provided in the configuration, so that the PDF, HTML and screenshot archives show the actual page.

What hacks or alternative solutions have you tried to solve the problem?

I’ve looked into the index.json file for each archive and found that only wget was given my cookie file. It would be nice to pass the cookies to the headless Chrome/Chromium command as well.
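
Before debugging the Chrome side, it can help to confirm that the exported cookies.txt really contains a cookie for the site in question. The following is a minimal sketch, not part of ArchiveBox: it assumes the file is in the Netscape cookies.txt format and uses `http.cookiejar.MozillaCookieJar` from the Python standard library. The function name `cookies_for_host` is hypothetical, and the domain matching is deliberately simplified.

```python
from http.cookiejar import MozillaCookieJar


def cookies_for_host(cookies_path: str, host: str) -> dict:
    """Return {name: value} for cookies in a Netscape cookies.txt that apply to host.

    Simplified matching (an assumption of this sketch): a cookie applies when
    the host equals the cookie domain or is a subdomain of it.
    """
    jar = MozillaCookieJar()
    # Keep session and expired cookies so the whole export can be inspected.
    jar.load(cookies_path, ignore_discard=True, ignore_expires=True)

    host = host.lower()
    matches = {}
    for cookie in jar:
        cookie_domain = cookie.domain.lstrip(".").lower()
        if host == cookie_domain or host.endswith("." + cookie_domain):
            matches[cookie.name] = cookie.value
    return matches
```

If the returned dictionary is empty for the forum's host name, the export itself is missing the age-confirmation cookie, and no archiving method could use it.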

How badly do you want this new feature?

It’s an urgent deal-breaker, I can’t live without it
It’s important to add it in the near-mid term future
It would be nice to have eventually

I’m willing to contribute to development / fixing this issue
I like ArchiveBox so far / would recommend it to a friend

P.S. I don’t have coding experience, so please excuse my limited IT knowledge.

Issue Analytics

State:
Created 4 years ago
Comments: 11 (6 by maintainers)

Top GitHub Comments

1 reaction

pirate commented, Aug 13, 2019

The trick is that the Chrome data dir used for archiving needs to come from a Chrome instance that’s logged into the site. Try opening chromium-browser (or whichever binary is the same Chrome instance you’re using for archiving), logging into the site, and then running archivebox. If you’re doing it on a remote server, you’ll need to rsync your Chrome data dir to the server.

You can find it at one of these paths depending on what OS you’re on and what Chrome version you’re using:

            # if using chromium
            '~/.config/chromium',                      # linux
            '~/Library/Application Support/Chromium',   # mac
            '~/AppData/Local/Chromium/User Data',   # windows

            # if using normal Google Chrome
            '~/.config/chrome',
            '~/.config/google-chrome',
            '~/Library/Application Support/Google/Chrome',
            '~/AppData/Local/Google/Chrome/User Data',
            '~/.config/google-chrome-stable',

            # if using beta/canary chrome
            '~/.config/google-chrome-beta',
            '~/Library/Application Support/Google/Chrome Canary',
            '~/AppData/Local/Google/Chrome SxS/User Data',
            '~/.config/google-chrome-unstable',
            '~/.config/google-chrome-dev',
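
To see which of these locations actually exists on a given machine, a short check like the one below may help. It is a sketch only: the list is a subset of the paths quoted above, and `find_chrome_user_data_dir` is a hypothetical helper name, not an ArchiveBox function.

```python
from pathlib import Path
from typing import Iterable, Optional

# Subset of the candidate locations quoted above; extend as needed.
CANDIDATE_DIRS = [
    "~/.config/chromium",
    "~/Library/Application Support/Chromium",
    "~/AppData/Local/Chromium/User Data",
    "~/.config/google-chrome",
    "~/Library/Application Support/Google/Chrome",
    "~/AppData/Local/Google/Chrome/User Data",
]


def find_chrome_user_data_dir(
    candidates: Iterable[str] = CANDIDATE_DIRS,
) -> Optional[Path]:
    """Return the first candidate that exists as a directory, or None."""
    for raw in candidates:
        path = Path(raw).expanduser()  # expand the leading "~"
        if path.is_dir():
            return path
    return None
```

The directory it returns is the one to log in with and, for a remote server, the one to copy over. ArchiveBox can be pointed at a profile through its `CHROME_USER_DATA_DIR` setting; check the configuration documentation for your version before relying on this.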

0 reactions

pirate commented, Mar 23, 2022

Are you using the same Chromium version inside and outside Docker to generate that profile? It must be exactly the same version, architecture, release type, etc. for it to work. @terxw You can try setting CHROME_HEADLESS=False and checking the GUI that pops up to make sure it’s using it correctly.
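
One way to check the version requirement is to compare what each binary prints for `--version` inside and outside Docker. The sketch below only compares the four-part version number found in two such output strings; it does not check architecture or release type, and the helper names are hypothetical.

```python
import re
from typing import Optional

# Matches a four-part version such as 98.0.4758.102.
_VERSION_RE = re.compile(r"\b(\d+\.\d+\.\d+\.\d+)\b")


def parse_chrome_version(version_output: str) -> Optional[str]:
    """Extract the first four-part version number from `--version` output."""
    match = _VERSION_RE.search(version_output)
    return match.group(1) if match else None


def same_chrome_version(output_a: str, output_b: str) -> bool:
    """True only if both outputs contain a version and the versions are identical."""
    version_a = parse_chrome_version(output_a)
    version_b = parse_chrome_version(output_b)
    return version_a is not None and version_a == version_b
```

A mismatch here would suggest regenerating the profile with the same build that does the archiving.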
