Add a way to delete an entry from the index and archive


Occasionally I want to remove a URL from my archive. Currently this is a manual process: find the entry in index.json, note its timestamp, delete the relevant lines, do the same in index.html, and finally run rm -r output/archive/$timestamp.
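
For illustration, the manual steps above could be scripted roughly like this. It is only a sketch under assumptions, not ArchiveBox's own code: it assumes index.json holds a top-level "links" list whose entries have "url" and "timestamp" fields, and that snapshot folders live in output/archive/. The function name remove_entry is hypothetical, and index.html is not touched here, so it would still need to be edited or regenerated separately.

```python
import json
import shutil
from pathlib import Path


def remove_entry(output_dir, url):
    """Drop `url` from index.json and delete its snapshot folder(s).

    Returns the removed timestamps (an empty list if the URL was not indexed).
    """
    output_dir = Path(output_dir)
    index_path = output_dir / "index.json"
    # Assumed layout: {"links": [{"url": ..., "timestamp": ...}, ...]}
    index = json.loads(index_path.read_text())
    links = index.get("links", [])

    removed = [str(link["timestamp"]) for link in links if link.get("url") == url]
    index["links"] = [link for link in links if link.get("url") != url]
    index_path.write_text(json.dumps(index, indent=4))

    for timestamp in removed:
        if timestamp:  # never resolve to the archive root itself
            # ignore_errors: the folder may already have been deleted by hand
            shutil.rmtree(output_dir / "archive" / timestamp, ignore_errors=True)
    return removed
```

Calling remove_entry("output", "https://example.com") would then stand in for the first, second, and last of the manual steps.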

It would be nice if there were a more automated way of doing this. Ideally, I think this would be a final step after archiving, in which the script tries to match each directory name in output/ against a timestamp in index.json. If no match is found, the user is prompted with something like:

1536723384 not found in bookmark index. Delete output directory? (y/n)

This could be a behavior that is enabled only by an optional config option.
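
A rough sketch of the proposed cleanup step might look like the following. It is an illustration under assumptions rather than the project's implementation: it assumes index.json contains a top-level "links" list with a "timestamp" field per entry and that snapshot folders sit in output/archive/. The names find_orphaned_dirs and prompt_and_delete are hypothetical.

```python
import json
import shutil
from pathlib import Path


def find_orphaned_dirs(output_dir):
    """Return archive folders whose names match no timestamp in index.json."""
    output_dir = Path(output_dir)
    # Assumed layout: {"links": [{"url": ..., "timestamp": ...}, ...]}
    index = json.loads((output_dir / "index.json").read_text())
    known = {str(link["timestamp"]) for link in index.get("links", [])}

    archive_dir = output_dir / "archive"
    if not archive_dir.is_dir():
        return []
    return sorted(
        folder for folder in archive_dir.iterdir()
        if folder.is_dir() and folder.name not in known
    )


def prompt_and_delete(orphans, ask=input):
    """Ask before deleting each orphaned folder; return the folders removed."""
    removed = []
    for folder in orphans:
        answer = ask(
            f"{folder.name} not found in bookmark index. "
            "Delete output directory? (y/n) "
        )
        if answer.strip().lower() == "y":
            shutil.rmtree(folder)
            removed.append(folder)
    return removed
```

Passing the prompt function in as ask keeps the deletion logic testable and makes it easy to gate the whole step behind a config option by simply not calling it.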

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 6
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

2 reactions
f0086 commented, Oct 18, 2018

I recently imported my complete Pinboard archive, and there were a lot of bookmarks with dead links in it:

[√] [2018-10-17 23:28:37] Update of 4249 links complete (133.12 min)
    - 15219 entries skipped
    - 714 entries updated
    - 1063 errors

(The script crashed a few times with “Too many open files” errors, so I had to rerun it a couple of times.)

My idea is to run this script once a day with a fresh dump from my Pinboard export (I wrote a little Go program that dumps the whole list from Pinboard). But with those 1063 links with errors, it will take hours (even with small timeouts), and retrying those links is totally useless.

Because those 1063 dead links will always be in the exported list, the archiver will always retry downloading them. It would be nice if there were a flag or environment variable to skip links that previously failed to download. A “cleanup” flag would be even better, but skipping those links would be sufficient for my use case.
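
The requested skip behavior could be sketched along these lines. This is a hypothetical illustration only: the failed-links file, the skip_failed switch, and all function names are assumptions made for the example, not existing ArchiveBox options.

```python
import json
from pathlib import Path


def load_failed(path):
    """Load the set of URLs that failed on earlier runs (empty if no file yet)."""
    path = Path(path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def split_links(urls, failed, skip_failed=True):
    """Partition urls into (to_archive, skipped) based on past failures."""
    to_archive, skipped = [], []
    for url in urls:
        if skip_failed and url in failed:
            skipped.append(url)
        else:
            to_archive.append(url)
    return to_archive, skipped


def record_failures(path, failed, new_failures):
    """Persist the union of old and new failures for the next run."""
    merged = sorted(set(failed) | set(new_failures))
    Path(path).write_text(json.dumps(merged, indent=2))
    return merged
```

With skip_failed=False, every link is retried, which is roughly what a separate, explicit retry pass would do.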

1 reaction
pirate commented, Jul 24, 2020

The new django version has both the ability to remove snapshots from the archive and a separate archivebox update command, independent of archivebox add, so that you can control when to retry previously failed links.

git checkout django
git pull
# or pip install -e . to run it without docker
docker build . -t archivebox
docker run -v $PWD/output:/data archivebox init
docker run -v $PWD/output:/data archivebox add 'https://example.com'
docker run -v $PWD/output:/data archivebox remove --help
docker run -v $PWD/output:/data archivebox remove --delete 'https://example.com'
docker run -v $PWD/output:/data archivebox update

Adding a MAX_URL_ATTEMPTS option will be tracked in this separate issue: https://github.com/pirate/ArchiveBox/issues/109


