Feature Request: Ignore Errors Mode

I am importing just under 1 million links supplied by forum users over 7 years. Not all links work and I need the system to skip over links it cannot import.

Type

  • General question or discussion
  • Propose a brand new feature
  • Request modification of existing behavior or design

What is the problem that your feature request solves

I have a list of about 250,000 links that match the pattern %archive.%/%. Because users supplied them, not all of the links are valid. When I tried to import the list, ArchiveBox quickly bailed out with “Invalid IPv6 URL”.

Describe the ideal specific solution you’d want, and whether it fits into any broader scope of changes

archivebox add < /tmp/links.txt --ignore-errors

What hacks or alternative solutions have you tried to solve the problem?

I can’t think of any way to accomplish this short of putting each URL in a separate file and writing a bash script that runs add on each one.
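
One possible workaround, short of a built-in flag, is to pre-filter the list before handing it to archivebox add. The sketch below assumes one URL per line and mirrors only the check visible in the traceback in this issue (urlparse plus the http/https/ftp scheme test). The names is_parseable and partition_links are made up for illustration and are not part of ArchiveBox, and the sample URLs are hypothetical.

```python
from urllib.parse import urlparse

# Schemes copied from the archivable_links frame in the traceback.
ALLOWED_SCHEMES = ("http", "https", "ftp")


def is_parseable(url: str) -> bool:
    """Return True if urlparse accepts the URL and its scheme is allowed."""
    try:
        return urlparse(url).scheme.lower() in ALLOWED_SCHEMES
    except ValueError:
        # urlparse raises ValueError for malformed hosts,
        # e.g. "Invalid IPv6 URL" for unbalanced brackets.
        return False


def partition_links(lines):
    """Split an iterable of lines into (good, bad) URL lists, skipping blanks."""
    good, bad = [], []
    for line in lines:
        url = line.strip()
        if not url:
            continue
        (good if is_parseable(url) else bad).append(url)
    return good, bad


if __name__ == "__main__":
    # Hypothetical sample input; in practice, pass an open file instead.
    sample = [
        "https://archive.example/abc\n",
        "http://[broken/page\n",
        "mailto:someone@archive.example\n",
    ]
    good, bad = partition_links(sample)
    print("keep:", good)
    print("skip:", bad)
```

Writing the good list to a new file and feeding that file to archivebox add would then skip the entries that Python itself cannot parse. ArchiveBox may still reject links for other reasons, so this is a stopgap rather than a replacement for the requested option.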

How badly do you want this new feature?

  • It’s an urgent deal-breaker, I can’t live without it
  • It’s important to add it in the near-mid term future
  • It would be nice to have eventually

  • I’m willing to contribute dev time / money to fix this issue
  • I like ArchiveBox so far / would recommend it to a friend
  • I’ve had a lot of difficulty getting ArchiveBox set up

sudo -u archive archivebox add < /tmp/archives.txt
[i] [2020-08-15 16:32:11] ArchiveBox v0.4.13: archivebox add < /dev/stdin
    > /opt/archive

[+] [2020-08-15 16:32:12] Adding 228732 links to index (crawl depth=0)...
    > Saved verbatim input to sources/1597509132-import.txt
Traceback (most recent call last):                                                           
  File "/usr/local/bin/archivebox", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/archivebox/cli/__init__.py", line 126, in main
    pwd=pwd or OUTPUT_DIR,
  File "/usr/local/lib/python3.7/dist-packages/archivebox/cli/__init__.py", line 62, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
  File "/usr/local/lib/python3.7/dist-packages/archivebox/cli/archivebox_add.py", line 72, in main
    out_dir=pwd or OUTPUT_DIR,
  File "/usr/local/lib/python3.7/dist-packages/archivebox/util.py", line 111, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/archivebox/main.py", line 544, in add
    new_links += parse_links_from_source(write_ahead_log)
  File "/usr/local/lib/python3.7/dist-packages/archivebox/util.py", line 111, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/archivebox/index/__init__.py", line 284, in parse_links_from_source
    new_links = validate_links(raw_links)
  File "/usr/local/lib/python3.7/dist-packages/archivebox/util.py", line 111, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/archivebox/index/__init__.py", line 130, in validate_links
    links = sorted_links(links)      # deterministically sort the links based on timstamp, url
  File "/usr/local/lib/python3.7/dist-packages/archivebox/util.py", line 111, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/archivebox/index/__init__.py", line 175, in sorted_links
    return sorted(links, key=sort_func, reverse=True)
  File "/usr/local/lib/python3.7/dist-packages/archivebox/index/__init__.py", line 142, in archivable_links
    scheme_is_valid = scheme(link.url) in ('http', 'https', 'ftp')
  File "/usr/local/lib/python3.7/dist-packages/archivebox/util.py", line 30, in <lambda>
    scheme = lambda url: urlparse(url).scheme.lower()
  File "/usr/lib/python3.7/urllib/parse.py", line 368, in urlparse
    splitresult = urlsplit(url, scheme, allow_fragments)
  File "/usr/lib/python3.7/urllib/parse.py", line 435, in urlsplit
    raise ValueError("Invalid IPv6 URL")
ValueError: Invalid IPv6 URL

I don’t see any IPv6 links on my list, by the way. I can send that over as well. It looks like it may be a broken IPv6 on the page itself.
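
For what it's worth, the error does not require an actual IPv6 address. In the standard library, urlsplit raises it whenever the host part of a URL contains a `[` without a matching `]`, or the reverse, so a single stray bracket in a user-supplied link is enough. A minimal sketch, using made-up URLs rather than anything from the reporter's list (parse_error is a hypothetical helper name):

```python
from urllib.parse import urlparse


def parse_error(url: str):
    """Return the ValueError message urlparse raises for url, or None if it parses."""
    try:
        urlparse(url)
    except ValueError as exc:
        return str(exc)
    return None


# Hypothetical examples, not taken from the reporter's list.
samples = [
    "https://archive.example/abc123",   # ordinary URL
    "https://[archive.example/abc123",  # stray '[' in the host part
    "https://archive.example]/abc123",  # stray ']' in the host part
    "https://archive.example/a[b",      # bracket in the path only
]

for url in samples:
    print(url, "->", parse_error(url))
```

Running something like this over the input file, line by line, is one way to locate the entry that triggers the crash.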

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
pirate commented, Aug 18, 2020

Ah sorry, forgot to add the docs link for that. It should be fixed now on master

npm install -g 'git+https://github.com/gildas-lormeau/SingleFile.git'
npm install -g 'git+https://github.com/pirate/readability-extractor.git'

1 reaction
cdvv7788 commented, Aug 18, 2020

@jaw-sh https://github.com/pirate/ArchiveBox/blob/master/archivebox/config/__init__.py#L112 Readability now provides instructions on how to install itself. Please create a new issue if you are still experiencing errors related to that extractor after testing with the latest master version.
