Bugfix: UnhandledPromiseRejectionWarning with singlefile attempt

Describe the bug

I decided to move forward with a clean archive and keep the old one as a historical snapshot. I added a new list of links to archive, but after 6 or 7 links almost all of the tasks start failing in a cascade, beginning with the singlefile step. The error is TimeoutExpired for a command that uses chromium-browser. ArchiveBox gives me a command to run to see the full output, but that command also fails, with (node:162485) UnhandledPromiseRejectionWarning: SyntaxError: Unexpected number in JSON at position 2

Steps to reproduce

  1. Installed with pip install archivebox.
  2. Changed the config to disable MEDIA and set the timeout to 180 seconds.
  3. Added a list of links with archivebox add ./list.txt.
  4. It starts adding links correctly, but after link 5 or 6 it begins to error out.

Screenshots or log output

Original error

[i] [2020-09-02 17:41:18] ArchiveBox v0.4.21: archivebox update
    > /mnt/volume/.archivebox-output/new-archive


[*] [2020-09-02 17:41:21] Writing 33 links to main index...
    √ /mnt/volume/.archivebox-output/new-archive/index.sqlite3
    √ /mnt/volume/.archivebox-output/new-archive/index.json
    √ /mnt/volume/.archivebox-output/new-archive/index.html

[▶] [2020-09-02 17:41:23] Collecting content for 8 Snapshots in archive...

[√] [2020-09-02 17:44:28] "NYPD used facial recognition to track down Black Lives Matter activist - The Verge"
    https://theverge.com/2020/8/18/21373316/nypd-facial-recognition-black-lives-matter-activist-derrick-ingram
    √ ./archive/1598996327.937664
      > singlefile
        Extractor failed:
            TimeoutExpired Command '['/mnt/volume/.archivebox-output/new-archive/node_modules/archivebox/node_modules/single-file/cli/single-file', '--browser-executable-path=chromium-browser', '--browser-args="["--headless", "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36", "--window-size=1440,2000"]"', 'https://theverge.com/2020/8/18/21373316/nypd-facial-recognition-black-lives-matter-activist-derrick-ingram', '/mnt/volume/.archivebox-output/new-archive/archive/1598996327.937664/singlefile.html']' timed out after 180 seconds
        Run to see full output:
            cd /mnt/volume/.archivebox-output/new-archive/archive/1598996327.937664;
            /mnt/volume/.archivebox-output/new-archive/node_modules/archivebox/node_modules/single-file/cli/single-file --browser-executable-path=chromium-browser "--browser-args="["--headless", "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36", "--window-size=1440,2000"]"" https://theverge.com/2020/8/18/21373316/nypd-facial-recognition-black-lives-matter-activist-derrick-ingram /mnt/volume/.archivebox-output/new-archive/archive/1598996327.937664/singlefile.html

      > pdf
        Extractor failed:
            Exception Failed to chmod: output.pdf does not exist (did the previous step fail?)
        Run to see full output:
            cd /mnt/volume/.archivebox-output/new-archive/archive/1598996327.937664;
            chromium-browser --headless "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36" --window-size=1440,2000 --timeout=180000 --print-to-pdf https://theverge.com/2020/8/18/21373316/nypd-facial-recognition-black-lives-matter-activist-derrick-ingram

When I run the suggested command:

cd /mnt/volume/.archivebox-output/new-archive/archive/1598996327.937664;
/mnt/volume/.archivebox-output/new-archive/node_modules/archivebox/node_modules/single-file/cli/single-file --browser-executable-path=chromium-browser "--browser-args="["--headless", "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36", "--window-size=1440,2000"]"" https://theverge.com/2020/8/18/21373316/nypd-facial-recognition-black-lives-matter-activist-derrick-ingram /mnt/volume/.archivebox-output/new-archive/archive/1598996327.937664/singlefile.html

Output:

(node:162485) UnhandledPromiseRejectionWarning: SyntaxError: Unexpected number in JSON at position 2
    at JSON.parse (<anonymous>)
    at getBrowserOptions (/mnt/volume/.archivebox-output/new-archive/node_modules/archivebox/node_modules/single-file/cli/back-ends/puppeteer.js:63:51)
    at Object.exports.initialize (/mnt/volume/.archivebox-output/new-archive/node_modules/archivebox/node_modules/single-file/cli/back-ends/puppeteer.js:36:35)
    at initialize (/mnt/volume/.archivebox-output/new-archive/node_modules/archivebox/node_modules/single-file/cli/single-file-cli-api.js:46:16)
    at run (/mnt/volume/.archivebox-output/new-archive/node_modules/archivebox/node_modules/single-file/cli/single-file:33:59)
    at Object.<anonymous> (/mnt/volume/.archivebox-output/new-archive/node_modules/archivebox/node_modules/single-file/cli/single-file:30:1)
    at Module._compile (internal/modules/cjs/loader.js:778:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:789:10)
    at Module.load (internal/modules/cjs/loader.js:653:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:593:12)
(node:162485) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 2)
(node:162485) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
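
The stack trace shows single-file calling JSON.parse from getBrowserOptions, so the value of --browser-args has to arrive as valid JSON. A rough way to see what a POSIX shell does with the nested double quotes in the printed command is to tokenize it with Python's shlex. This is only a sketch: the user agent is shortened, and the URL and output path are placeholders rather than the values from the log.

```python
import json
import shlex

# Shortened stand-in for the printed command: same quoting pattern,
# trimmed user agent, placeholder URL and output path.
printed = (
    'single-file --browser-executable-path=chromium-browser '
    '"--browser-args="["--headless", '
    '"--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64)", '
    '"--window-size=1440,2000"]"" '
    'https://example.com out.html'
)

# shlex.split follows POSIX shell quoting rules, so this approximates
# the argument list the single-file process would receive.
argv = shlex.split(printed)
for arg in argv:
    print(repr(arg))

# Everything after "--browser-args=" in the third word is the JSON candidate.
browser_args_value = argv[2].split("=", 1)[1]
print("value handed to the JSON parser:", browser_args_value)

try:
    json.loads(browser_args_value)
    parse_error = None
except json.JSONDecodeError as exc:
    parse_error = exc
print("JSON error:", parse_error)
```

The inner quotes close and reopen the outer ones, so the first space after the comma ends the word and the option value is cut down to [--headless, with no quotes left. Python's error text differs from Node's, but the second hyphen of that fragment sits at index 2, which is at least consistent with the "position 2" in the message above.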

Software versions

  • OS: Ubuntu 20.04.1 LTS
  • ArchiveBox version: 0.4.21
  • Python version: Python 3.7.9
  • Chrome version: Chromium 84.0.4147.105

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

5 reactions
drpfenderson commented, Sep 4, 2020

Well, I essentially did a find/replace/delete for every archivebox/node/python file on my system that could be related. Files had ended up in a number of odd places because of the various installation methods I’ve used for this program over the years. Realizing how much work it would take to disentangle everything, I spun up a new server, attached the archive to it, wiped all the local config/conf files, ran docker-compose run archivebox add, and it all worked!

Thank you all for your infinite patience with me.

3 reactions
gildas-lormeau commented, Sep 4, 2020

Finally, I was able to fix the issue by formatting the --browser-args switch like this (surrounding quotes included) in your example:

"--browser-args=""[""--headless"", ""--no-sandbox"", ""--disable-gpu"", ""--disable-dev-shm-usage"", ""--disable-software-rasterizer"", ""--window-size=1440,2000"", ""--user-agent"="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36""]"""

instead of: "--browser-args="["--headless", "--no-sandbox", "--disable-gpu", "--disable-dev-shm-usage", "--disable-software-rasterizer", "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36", "--window-size=1440,2000"]""
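
For POSIX shells, one way to avoid hand-escaping the nested quotes is to serialize the browser arguments with json.dumps and let shlex.quote wrap each word. The sketch below is not ArchiveBox's or single-file's actual code. The user agent is shortened, the URL and output path are placeholders, and other shells follow different quoting rules.

```python
import json
import shlex

browser_args = [
    "--headless",
    "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64)",  # shortened
    "--window-size=1440,2000",
]

# Serialize the list once as JSON; json.dumps adds the inner double quotes.
argv = [
    "single-file",
    "--browser-executable-path=chromium-browser",
    "--browser-args=" + json.dumps(browser_args),
    "https://example.com",  # placeholder URL
    "out.html",             # placeholder output path
]

# shlex.quote wraps each word so a POSIX shell hands it back unchanged.
printable = " ".join(shlex.quote(arg) for arg in argv)
print(printable)

# Round trip: tokenizing the printable command must reproduce argv, and
# the option value must still parse as the original JSON list.
reparsed = shlex.split(printable)
recovered = json.loads(reparsed[2].split("=", 1)[1])
print(recovered == browser_args)
```

When the command is launched from Python with an argument list rather than a shell string, no shell quoting is involved at all, so quoting like this only matters for the copy-and-paste version of the command.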
