Question: archivebox server throws 500 error

I’m using nginx in front of archivebox on my local intranet to access my archives, and it throws a 500 error whenever I load the front page. nginx records no errors, so I assume the failure is inside archivebox. The archivebox server console, running in a screen session, simply reports the 500 error and nothing further.

Is there anything I can do to troubleshoot this? Perhaps increase the verbosity of archivebox server at the console so I can see where the fault exists?

I tried running archivebox init on my archive directory, but it made no difference.

I can dump the archivebox list to html, which is what I’m doing for now, but it’s not ideal since I can’t change the order of links (I prefer the newest links at the top of the list, while the html dump puts the oldest at the top; it’s a pet peeve, but the archivebox server allows me to change that).

Arch Linux, archivebox 0.5.3 via pip

Thanks!

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 3
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

2 reactions
dohlin commented, Jan 15, 2021

I too am seeing the issue the moment my Chrome .html bookmarks file gets written to the database (starting a fresh install). For me, it seems to be only the /public URI that throws the error 500; the admin section seems to work fine (from what I’ve tested). There’s clearly something going on here.

EDIT: I even spun up a brand new Ubuntu 20.04 server to test the setup from scratch, same issue. I tried an old html bookmarks backup file I had lying around from several months back, and again the same issue. Everything works until I start the initial archive.

1 reaction
berezovskyi commented, Jun 6, 2021

I just got a similar error with a URL https://link.foreignaffairs.com/click/60b4025ed373750fd780a8d9/aHR0cHM6Ly93d3cuZm9yZWlnbmFmZmFpcnMuY29tL2ZhX3VzZXIvc2ltcGxlX3JlZy9hdXRvbG9naW4_dG9rZW49YlVKelQzc0lDZ3FjWjB1bVhLQlVWdklxZHN1ajRFaEtWM3FYbE5TUnNtMXFFV3haM3loRmFTN05mWlc1RVRRSUlzOUxhQ1dvJTJCM1RQTnJaUE0lMkJaTlNSSTRNZUFycUZXVTNnNkxRdVJIN21zJTNEJmRlc3RpbmF0aW9uPS9ub2RlLzExMjc0NjcmdXRtX21lZGl1bT1wcm9tb19lbWFpbCZ1dG1fc291cmNlPWxvX2Zsb3dzJnV0bV9jYW1wYWlnbj1yZWdpc3RlcmVkX3VzZXJfd2VsY29tZSZ1dG1fdGVybT1lbWFpbF8xJnV0bV9jb250ZW50PTIwMjEwNTMw/60b4025ee8467f2b795d0d96B1f48eb65:

  File "/app/archivebox/index/schema.py", line 427, in canonical_outputs
    'wget_path': wget_output_path(self),
  File "/app/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/app/archivebox/extractors/wget.py", line 170, in wget_output_path
    if search_dir.exists():
  File "/usr/local/lib/python3.9/pathlib.py", line 1414, in exists
    self.stat()
  File "/usr/local/lib/python3.9/pathlib.py", line 1222, in stat
    return self._accessor.stat(self)
OSError: [Errno 36] File name too long: '/data/archive/1622409932.315706/link.foreignaffairs.com/click/60b4025ed373750fd780a8d9/aHR0cHM6Ly93d3cuZm9yZWlnbmFmZmFpcnMuY29tL2ZhX3VzZXIvc2ltcGxlX3JlZy9hdXRvbG9naW4_dG9rZW49YlVKelQzc0lDZ3FjWjB1bVhLQlVWdklxZHN1ajRFaEtWM3FYbE5TUnNtMXFFV3haM3loRmFTN05mWlc1RVRRSUlzOUxhQ1dvJTJCM1RQTnJaUE0lMkJaTlNSSTRNZUFycUZXVTNnNkxRdVJIN21zJTNEJmRlc3RpbmF0aW9uPS9ub2RlLzExMjc0NjcmdXRtX21lZGl1bT1wcm9tb19lbWFpbCZ1dG1fc291cmNlPWxvX2Zsb3dzJnV0bV9jYW1wYWlnbj1yZWdpc3RlcmVkX3VzZXJfd2VsY29tZSZ1dG1fdGVybT1lbWFpbF8xJnV0bV9jb250ZW50PTIwMjEwNTMw/60b4025ee8467f2b795d0d96B1f48eb65'
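The traceback above suggests that a single path component is longer than the filesystem allows. A minimal sketch of how one might confirm that for a given path, assuming a per-component limit of 255 bytes (common on Linux filesystems, but not stated in this thread); the helper names are hypothetical and not part of ArchiveBox:

```python
from pathlib import PurePosixPath

# Assumption: 255 bytes is the per-component limit (NAME_MAX) on most Linux
# filesystems; the thread does not say which filesystem is in use.
ASSUMED_NAME_MAX = 255


def longest_component(path):
    """Return (component, byte_length) for the longest single path component."""
    # Drop the root "/" so only real directory and file names are compared.
    parts = [p for p in PurePosixPath(path).parts if p != "/"]
    longest = max(parts, key=lambda p: len(p.encode("utf-8")))
    return longest, len(longest.encode("utf-8"))


def exceeds_name_max(path, limit=ASSUMED_NAME_MAX):
    """True if any single component of the path is longer than the limit."""
    return longest_component(path)[1] > limit
```

Calling exceeds_name_max() on the path from the OSError message should show whether the long encoded segment is the component that trips the limit. This only inspects the string and never touches the filesystem.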

Here is how to fix the system without losing the index:

  1. Find the offending URL in the logs (I did not realise at first that the traceback contains the complete URL, minus the scheme).
  2. Copy the SQLite DB to a new file in a folder where you have write permissions (for a Docker install, only the directory one level up will do).
  3. Run sqlite3 %filename% and then the following query: select url, added from core_snapshot order by added desc limit 10; You should now see the full URL.
  4. Run docker exec -it -u archivebox archivebox archivebox remove 'https://link.foreignaffairs.com/click/60b4025ed373750fd780a8d9/aHR0cHM6Ly93d3cuZm9yZWlnbmFmZmFpcnMuY29tL2ZhX3VzZXIvc2ltcGxlX3JlZy9hdXRvbG9naW4_dG9rZW49YlVKelQzc0lDZ3FjWjB1bVhLQlVWdklxZHN1ajRFaEtWM3FYbE5TUnNtMXFFV3haM3loRmFTN05mWlc1RVRRSUlzOUxhQ1dvJTJCM1RQTnJaUE0lMkJaTlNSSTRNZUFycUZXVTNnNkxRdVJIN21zJTNEJmRlc3RpbmF0aW9uPS9ub2RlLzExMjc0NjcmdXRtX21lZGl1bT1wcm9tb19lbWFpbCZ1dG1fc291cmNlPWxvX2Zsb3dzJnV0bV9jYW1wYWlnbj1yZWdpc3RlcmVkX3VzZXJfd2VsY29tZSZ1dG1fdGVybT1lbWFpbF8xJnV0bV9jb250ZW50PTIwMjEwNTMw/60b4025ee8467f2b795d0d96B1f48eb65' where my URL is replaced by the URL that is causing errors on your system.
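Steps 2 and 3 above can also be scripted. The following is a rough sketch rather than an ArchiveBox feature: it runs the same query against a copy of the index and flags URLs whose path contains a segment longer than an assumed 255-byte limit. The function names and the limit are assumptions; the table and column names (core_snapshot, url, added) come from the query in step 3.

```python
import sqlite3
from urllib.parse import urlsplit

# Assumption: 255 bytes per path component, as on most Linux filesystems.
ASSUMED_NAME_MAX = 255


def recent_snapshots(db_path, limit=10):
    """Return (url, added) rows for the most recently added snapshots."""
    conn = sqlite3.connect(db_path)
    try:
        query = "SELECT url, added FROM core_snapshot ORDER BY added DESC LIMIT ?"
        return conn.execute(query, (limit,)).fetchall()
    finally:
        conn.close()


def has_overlong_segment(url, limit=ASSUMED_NAME_MAX):
    """Heuristic: True if any '/'-separated URL path segment exceeds the limit."""
    segments = urlsplit(url).path.split("/")
    return any(len(s.encode("utf-8")) > limit for s in segments)


def suspect_urls(db_path, limit=10):
    """List recent URLs that are likely to produce over-long file names."""
    return [url for url, _added in recent_snapshots(db_path, limit)
            if has_overlong_segment(url)]
```

Point suspect_urls() at the copied database file, not the live index, and pass any URL it returns to the archivebox remove command from step 4. The segment check is only a heuristic: it looks at the URL path and ignores how wget actually maps URLs to file names.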