Re-running `archivebox init` loses metadata

This issue splits out a defect first raised in a comment on #556.

Describe the bug

Running `archivebox init` is dangerous: data is lost when the DB is recreated.

Steps to reproduce

Re-run `archivebox init` on an existing archive.

Software versions

archivebox/archivebox:latest as of 11/29/2020

Discussion

`archivebox init` should not be dangerous. Did some folders get wiped, or did some entries in the database get lost when you ran it? Can you reliably reproduce it? It would be very helpful if that is the case.

_Originally posted by @cdvv7788 in https://github.com/ArchiveBox/ArchiveBox/issues/556#issuecomment-735478673_

I was mainly referring to the Timestamp and Title fields. I haven't used tags yet, so I haven't tested that. When `archivebox init` re-adds the snapshots to the DB, the timestamp gets overwritten and the title is regenerated. The net result is that I have a few hundred entries that claim to have been added in the same minute, and a handful of titles that reverted to “403 Forbidden” because the WARC method failed. Finally, the Files indicators don't seem to be repopulating correctly in the admin panel.

Files indicators: image

Timestamps: image
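The symptom described above (many entries sharing one "added" minute after a re-init) can be checked for with a small query. The following is only a sketch against an in-memory SQLite database: the `snapshot` table and its `url`, `added`, and `title` columns are hypothetical stand-ins chosen for illustration, not ArchiveBox's actual schema, and the threshold is arbitrary.

```python
import sqlite3


def suspicious_minutes(conn, threshold=3):
    """Return (minute, count) pairs where at least `threshold` rows share one minute."""
    return conn.execute(
        """
        SELECT strftime('%Y-%m-%d %H:%M', added) AS minute, COUNT(*) AS n
        FROM snapshot            -- hypothetical table name
        GROUP BY minute
        HAVING COUNT(*) >= ?
        ORDER BY n DESC
        """,
        (threshold,),
    ).fetchall()


def demo():
    # Build a throwaway database with made-up rows: three snapshots that
    # claim the same minute (as after a re-init) and one older, untouched row.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snapshot (url TEXT PRIMARY KEY, added TEXT, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO snapshot VALUES (?, ?, ?)",
        [
            ("https://example.com/a", "2020-11-29 10:15:02", "403 Forbidden"),
            ("https://example.com/b", "2020-11-29 10:15:17", "Page B"),
            ("https://example.com/c", "2020-11-29 10:15:41", "Page C"),
            ("https://example.com/d", "2020-10-01 08:00:00", "Page D"),
        ],
    )
    return suspicious_minutes(conn)


if __name__ == "__main__":
    for minute, count in demo():
        print(f"{count} snapshots claim to have been added at {minute}")
```

Running a check like this against a copy of the database before and after re-initializing would show whether the timestamps were rewritten, assuming the real table and column names are substituted in.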

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 13 (10 by maintainers)

Top GitHub Comments

pirate commented, Dec 11, 2020

Done: https://github.com/ArchiveBox/ArchiveBox/commit/b186e98cd2eeb5cb375dedfaa21abcae1abec2be (we no longer push every commit to Docker Hub as `:latest` images, only the full releases).

cdvv7788 commented, Dec 3, 2020

@pirate we discussed this approach before. It looks like the way to go… we should do that. If you want to have a build for everything, let’s set up another Docker Hub repository.
