Re-running `archivebox init` loses metadata
This issue extracts a defect reported in a comment on #556.
Describe the bug
Running `archivebox init` is dangerous: data is lost when the DB is recreated.
Steps to reproduce
Run `archivebox init`
Software versions
archivebox/archivebox:latest as of 11/29/2020
Discussion
`archivebox init` should not be dangerous. Did some folders get wiped, or did some entries in the database get lost when you ran it? Can you reliably reproduce it? It would be very helpful if that is the case.
_Originally posted by @cdvv7788 in https://github.com/ArchiveBox/ArchiveBox/issues/556#issuecomment-735478673_
I was mainly referring to the Timestamp and Title fields. I haven't used tags yet, so I haven't tested that. When `archivebox init` re-adds the snapshots to the DB, the timestamp gets overwritten and the title is regenerated. The net result is that I have a few hundred entries that claim to have been added in the same minute and a handful of titles that reverted to “403 Forbidden” because the WARC method failed. Finally, the Files indicators don't seem to be repopulating correctly in the admin panel.
Files indicators:
Timestamps:
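To make the reported behavior concrete, here is a minimal Python sketch of the merge rule the report implies: when a snapshot is re-added, metadata already in the index should take precedence over freshly regenerated values. This is not ArchiveBox's code. `SnapshotRecord` and `merge_on_reimport` are hypothetical names invented for illustration, and the field layout is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SnapshotRecord:
    """Hypothetical stand-in for one index entry (not ArchiveBox's real model)."""
    url: str
    timestamp: str           # when the snapshot was originally added
    title: Optional[str]     # may be None if no title was ever extracted


def merge_on_reimport(
    existing: Optional[SnapshotRecord],
    rescanned: SnapshotRecord,
) -> SnapshotRecord:
    """Keep metadata already stored in the index; use rescanned values only to fill gaps."""
    if existing is None:
        # Nothing stored yet, so the rescanned record is all we have.
        return rescanned
    return replace(
        rescanned,
        timestamp=existing.timestamp,  # never overwrite the original added-time
        title=existing.title if existing.title else rescanned.title,
    )


# Example: a re-scan produced a new timestamp and a bad title.
stored = SnapshotRecord("https://example.com", "1606600000.0", "Example Domain")
rescan = SnapshotRecord("https://example.com", "1606694400.0", "403 Forbidden")
merged = merge_on_reimport(stored, rescan)
```

Under this rule, `merged` keeps the stored timestamp and title, which is the outcome the reporter expected from re-running the command.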
Top GitHub Comments
Done: https://github.com/ArchiveBox/ArchiveBox/commit/b186e98cd2eeb5cb375dedfaa21abcae1abec2be (we no longer push every commit to Docker Hub as `:latest` images, only the full releases).
@pirate we discussed this approach before. It looks like the way to go, so we should do that. If you want to have a build for everything, let's set up another Docker Hub repository.