Add a way to delete an entry from the index and archive
Occasionally I want to remove a URL from my archive. Currently this is a manual process of finding the entry in `index.json`, pulling out the timestamp, deleting the relevant lines, doing the same for `index.html`, and finally running `rm -r output/archive/$timestamp`.
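For reference, a minimal Python sketch of that manual process, assuming the flat `{"links": [...]}` layout of `index.json` with `url` and `timestamp` fields per entry (an assumption, not a documented contract); `index.html` would still need to be edited or regenerated separately:

```python
#!/usr/bin/env python3
"""Sketch: remove one URL from index.json and delete its snapshot directory."""
import json
import shutil
import sys
from pathlib import Path

OUTPUT_DIR = Path("output")               # archive root (assumption)
INDEX_JSON = OUTPUT_DIR / "index.json"

def remove_url(url: str) -> None:
    index = json.loads(INDEX_JSON.read_text())
    links = index.get("links", [])
    kept = [link for link in links if link.get("url") != url]
    removed = [link for link in links if link.get("url") == url]
    if not removed:
        sys.exit(f"{url} not found in {INDEX_JSON}")
    # Rewrite the JSON index without the matching entries
    index["links"] = kept
    INDEX_JSON.write_text(json.dumps(index, indent=4))
    # Then delete each snapshot dir: the rm -r output/archive/$timestamp step
    for link in removed:
        snapshot_dir = OUTPUT_DIR / "archive" / str(link["timestamp"])
        if snapshot_dir.is_dir():
            shutil.rmtree(snapshot_dir)

if __name__ == "__main__":
    remove_url(sys.argv[1])
```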
It would be nice if there was some slightly more automated way of doing this. Ideally I think this would be done with a final step after archiving, where the script would try to match each directory name in `output/` with a timestamp in `index.json`. If a match isn’t found, the user is prompted with something like:

    1536723384 not found in bookmark index. Delete output directory? (y/n)

This may be a behavior that is only enabled by an optional config option.
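A rough sketch of what that proposed cleanup pass could look like, under the same `index.json` assumptions as above; the `PROMPT_BEFORE_DELETE` constant is a hypothetical stand-in for the optional config option:

```python
#!/usr/bin/env python3
"""Sketch: prompt for output/archive/ directories with no matching index entry."""
import json
import shutil
from pathlib import Path

OUTPUT_DIR = Path("output")       # archive root (assumption)
PROMPT_BEFORE_DELETE = True       # stand-in for the optional config option

def cleanup_orphans() -> None:
    index = json.loads((OUTPUT_DIR / "index.json").read_text())
    known = {str(link["timestamp"]) for link in index.get("links", [])}
    for snapshot_dir in sorted((OUTPUT_DIR / "archive").iterdir()):
        # Skip files and any directory whose name matches an indexed timestamp
        if not snapshot_dir.is_dir() or snapshot_dir.name in known:
            continue
        if PROMPT_BEFORE_DELETE:
            answer = input(f"{snapshot_dir.name} not found in bookmark index. "
                           "Delete output directory? (y/n) ")
            if answer.strip().lower() != "y":
                continue
        shutil.rmtree(snapshot_dir)

if __name__ == "__main__":
    cleanup_orphans()
```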
Issue Analytics

- Created: 5 years ago
- Reactions: 6
- Comments: 10 (10 by maintainers)
Top GitHub Comments
I’ve recently imported my complete Pinboard archive; there were a lot of bookmarks with dead links in it. (The script crashed a few times with “Too many open files” errors, so I had to rerun it a couple of times.)

My idea is to run this script once a day with a fresh dump from my Pinboard export (I’ve written a little Go program which dumps the whole list from Pinboard). But with those 1063 links with errors, it will take hours (even with small timeouts), and it is totally useless to retry those links.

Because those 1063 dead links will always be in the exported list, the archiver will always retry downloading them. It would be nice if there were a flag or environment variable to skip links which previously failed to download. A “cleanup” flag would be even better, but skipping those links would be sufficient for my use case.
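In the absence of such a flag, one way to approximate the skip behavior is to pre-filter the fresh export before handing it to the archiver. A minimal sketch, assuming a Pinboard-style JSON export (a list of objects with an `href` key) and a hand-maintained `dead_urls.txt` of known-dead URLs, one per line; all file names here are hypothetical:

```python
#!/usr/bin/env python3
"""Sketch: drop known-dead URLs from a Pinboard export before archiving."""
import json
from pathlib import Path

def filter_export(export_path: str, dead_list_path: str, out_path: str) -> None:
    dead = set(Path(dead_list_path).read_text().split())   # one URL per line
    bookmarks = json.loads(Path(export_path).read_text())
    # Keep only bookmarks whose URL is not on the dead list
    fresh = [b for b in bookmarks if b.get("href") not in dead]
    Path(out_path).write_text(json.dumps(fresh, indent=2))
    print(f"kept {len(fresh)} of {len(bookmarks)} bookmarks")

if __name__ == "__main__":
    filter_export("pinboard_export.json", "dead_urls.txt", "filtered_export.json")
```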
The new `django` version has both the ability to remove snapshots from the archive, and a separate `archivebox update` command independent from `archivebox add`, so that you can control when to retry previously failed links.

Adding a `MAX_URL_ATTEMPTS` option will be tracked in this separate issue: https://github.com/pirate/ArchiveBox/issues/109