wget Errors on latest master

Describe the bug

wget times out after 30 seconds on the latest build of the master branch. When the same wget command is run outside of ArchiveBox, it works as expected.

Steps to reproduce

Steps to reproduce the behavior:

  1. Use the following ~/.ArchiveBox.conf options:
# Example config file for ArchiveBox: The self-hosted internet archive.
# Copy this file to ~/.ArchiveBox.conf before editing it.
# Config file is in both Python and .env syntax (all strings must be quoted).
# For documentation, see:
#    https://github.com/pirate/ArchiveBox/wiki/Configuration

################################################################################
## General Settings
################################################################################
OUTPUT_PERMISSIONS=644
ONLY_NEW=True
TIMEOUT=30
MEDIA_TIMEOUT=3600
#TEMPLATES_DIR="archivebox/templates"
FOOTER_INFO="Content is hosted for personal archiving purposes only. Contact server owner for any takedown requests."
FETCH_TITLE=True
FETCH_FAVICON=True
FETCH_WGET=True
FETCH_WARC=True
FETCH_PDF=True
FETCH_SCREENSHOT=False
FETCH_DOM=True
FETCH_GIT=True
FETCH_MEDIA=True
SUBMIT_ARCHIVE_DOT_ORG=False
#CHECK_SSL_VALIDITY=True
FETCH_WGET_REQUISITES=True
RESOLUTION="1440,900"
WGET_USER_AGENT="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
HEADLESS_USER_AGENT="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36"
GIT_DOMAINS="github.com,bitbucket.org,gitlab.com"
#COOKIES_FILE="path/to/cookies.txt"
#CHROME_USER_DATA_DIR="~/.config/google-chrome/Default"
USE_COLOR=false
SHOW_PROGRESS=false
  2. Run `echo "https://developer.apple.com/library/archive/technotes/tn2218/_index.html#//apple_ref/doc/uid/DTS40007625" | ./archive`

  3. See error

Screenshots or log output

wget Failed: TimeoutExpired
Command '/usr/local/bin/wget' timed out after 30 seconds
Run to see full output:
    cd /Volumes/home/www/archive/1553194400.182;
    /usr/local/bin/wget --no-verbose --adjust-extension --convert-links --force-directories --backup-converted --span-hosts --no-parent -e robots=off --restrict-file-names=unix --timeout=30 --warc-file=warc/1553194992 --page-requisites "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36" https://developer.apple.com/library/archive/technotes/tn2219/_index.html#//apple_ref/doc/uid/DTS10004624
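
The "TimeoutExpired ... timed out after 30 seconds" message has the shape of Python's `subprocess.TimeoutExpired`. Below is a minimal sketch of how that kind of failure arises, assuming the archiver runs wget through `subprocess.run(..., timeout=TIMEOUT)`; the actual ArchiveBox code is not shown in this issue, and `run_with_timeout` is a hypothetical helper. Note that wget's own `--timeout` flag only bounds network waits (DNS, connect, read), whereas a subprocess timeout bounds the whole command's wall-clock time, including time spent writing to disk.

```python
import subprocess
import sys
import time


def run_with_timeout(cmd, timeout):
    """Run cmd with a wall-clock limit; return (status, elapsed_seconds).

    Hypothetical helper for illustration, not ArchiveBox's implementation.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
        status = f"exit {result.returncode}"
    except subprocess.TimeoutExpired as err:
        # subprocess.run kills the child before re-raising, so nothing is left running.
        status = f"TimeoutExpired: Command {err.cmd!r} timed out after {err.timeout} seconds"
    return status, time.monotonic() - start


if __name__ == "__main__":
    # Stand-in for a slow wget: a child process that sleeps longer than the limit.
    slow_cmd = [sys.executable, "-c", "import time; time.sleep(2)"]
    print(run_with_timeout(slow_cmd, timeout=0.5))
```

Swapping `slow_cmd` for the full wget command from the log and comparing the elapsed time against a plain shell run would show whether the 30-second limit is being hit by the download itself or by something around it.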

Software versions

  • OS: macOS 10.14
  • ArchiveBox version: d798117
  • Python version: Python 3.7.2
  • Wget version: GNU Wget 1.19.5 built on darwin17.5.0.

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

n0ncetonic commented, Mar 21, 2019 (1 reaction)

Update: I checked my settings, and it looks like my NAS was being mounted over SMB 2 by default. I’ve since changed this to SMB 3, which should help with any disk I/O issues caused by network latency.

I’m running ArchiveBox again on the same data set as before, and the issue seems to be resolved: the average archive time is now 15 seconds per link, which is back to a fairly decent speed.
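
A rough way to check whether the output directory itself is the bottleneck is to time small synced writes into it and compare the result against a local disk. This is only a sketch under assumptions: `time_small_writes` is a hypothetical helper, the file count and size are arbitrary, and the target path below is a placeholder for wherever the archive lives (for example, the SMB-mounted volume).

```python
import os
import tempfile
import time


def time_small_writes(directory, count=50, size=4096):
    """Write `count` small files into a scratch folder under `directory`,
    fsync each one, and return the average seconds taken per file."""
    payload = os.urandom(size)
    # The scratch folder and everything in it is removed when the block exits.
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        start = time.perf_counter()
        for i in range(count):
            path = os.path.join(scratch, f"probe_{i}.bin")
            with open(path, "wb") as fh:
                fh.write(payload)
                fh.flush()
                os.fsync(fh.fileno())  # force the data out to the (possibly remote) disk
        elapsed = time.perf_counter() - start
    return elapsed / count


if __name__ == "__main__":
    # Placeholder target: replace with the archive output directory to test the mount.
    target = tempfile.gettempdir()
    print(f"{target}: {time_small_writes(target) * 1000:.2f} ms per file")
```

If the per-file time on the network mount is far above the local figure, slow disk I/O rather than wget is a plausible cause of the timeouts.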

n0ncetonic commented, Mar 21, 2019 (1 reaction)

The issue somehow appears to have resolved itself, although wget does seem to have been severely slowed down by something in the commits between c79e1df and d798117. I’m getting a throughput of about one URL archived every 30 seconds or so.
