Some Indexers need an overwrite_db or last_indexed_time parameter.

I have made a Joplin indexer. However, it needs a parameter for incremental updates when the database is large. I have 8000+ notes in my Joplin database, and the indexer finds 24000+ URLs that can become Visits. Indexing takes 17 minutes on my laptop.

Joplin has an update_time field in the notes table, so I think I can implement incremental indexing (updating) in the indexer.
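To illustrate the idea, here is a minimal sketch of selecting only the notes modified since the last run. It uses a toy in-memory SQLite table rather than a real Joplin database. The column name update_time is taken from the description above, and the id/title/body columns and the millisecond timestamps are assumptions, so check them against your actual Joplin schema before relying on them.

```python
import sqlite3


def notes_changed_since(conn, last_indexed_ms):
    """Return (id, title, body) rows for notes modified after last_indexed_ms."""
    cur = conn.execute(
        "SELECT id, title, body FROM notes "
        "WHERE update_time > ? ORDER BY update_time",
        (last_indexed_ms,),
    )
    return cur.fetchall()


# Toy database standing in for the Joplin one (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notes (id TEXT PRIMARY KEY, title TEXT, body TEXT, update_time INTEGER)"
)
conn.executemany(
    "INSERT INTO notes VALUES (?, ?, ?, ?)",
    [
        ("a", "old note", "https://example.com/old", 1_000),
        ("b", "new note", "https://example.com/new", 3_000),
    ],
)

# Only note "b" was modified after the previous indexing run at t=2000.
changed = notes_changed_since(conn, last_indexed_ms=2_000)
```

With 8000+ notes, only the rows returned by this query would need their links re-extracted on each run.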

However, the Indexer does not receive an overwrite_db parameter when a user passes the --overwrite parameter and wants to restart the indexing from scratch. Alternatively, if the Promnesia framework passed last_indexed_time via iter_all_visits, that would be much more helpful.
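A rough sketch of what this request could look like from the indexer's side. Nothing here is Promnesia's actual API: IndexContext, effective_cutoff and index are hypothetical names made up for illustration, and only the two parameter names come from the request above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class IndexContext:
    """Hypothetical bundle a framework could hand to an indexer."""
    overwrite_db: bool
    last_indexed_time: Optional[float]  # epoch seconds; None if never indexed


def effective_cutoff(ctx: IndexContext) -> float:
    """Timestamp after which source items need to be (re)indexed."""
    if ctx.overwrite_db or ctx.last_indexed_time is None:
        return 0.0  # full reindex requested, or nothing indexed yet
    return ctx.last_indexed_time


def index(ctx: IndexContext, items: Iterable[Tuple[str, float]]) -> Iterator[str]:
    """Yield URLs of items modified after the cutoff. items are (url, modified) pairs."""
    cutoff = effective_cutoff(ctx)
    for url, modified in items:
        if modified > cutoff:
            yield url


items = [("https://example.com/1", 10.0), ("https://example.com/2", 30.0)]
incremental = list(index(IndexContext(overwrite_db=False, last_indexed_time=20.0), items))
full = list(index(IndexContext(overwrite_db=True, last_indexed_time=20.0), items))
```

The point of the sketch is that --overwrite and "never indexed" collapse into the same case for the indexer: a cutoff of zero.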

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
hwiorn commented, Mar 15, 2022

My laptop is a Dell Inspiron 7501 (i7-10750H CPU @ 2.60GHz, 16GB RAM). I don't think this laptop is a slow environment, but some machines, such as RPis and AWS Lightsail (1 core), could be slow.

It’s actually surprising it takes 17 minutes, for 8K notes/24K URLs – do you know how many lines are these? Unless your laptop is really weak, I would expect it to index much faster.

Many notes came from Evernote. I used Joplin as an archiving tool and kept a work journal in it. Some notes are web-clipped, and they seem to contain many useless links. Recently I have been switching from Joplin to org-roam and learning the Zettelkasten method, so I now use Joplin as a kind of Wayback Machine.

Maybe you can log indexing times for individual notes, figure out the one that takes longest and then we can profile it?

The Joplin indexer was a proof of concept and is just an initial version, so I think I can profile the indexing.
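For the per-note timing suggested above, something along these lines could be a starting point. It is only a sketch: time_per_note and extract_urls are hypothetical helpers, and the regex is a stand-in for whatever link extraction the real indexer does.

```python
import re
import time


def extract_urls(body):
    """Stand-in extractor: find http(s) links in the note text."""
    return re.findall(r"https?://\S+", body)


def time_per_note(notes, index_note):
    """Time index_note on each (note_id, body) pair; slowest notes first."""
    timings = []
    for note_id, body in notes:
        start = time.perf_counter()
        visits = list(index_note(body))
        elapsed = time.perf_counter() - start
        timings.append((elapsed, note_id, len(visits)))
    timings.sort(reverse=True)
    return timings


notes = [
    ("a", "see https://example.com/1 and https://example.com/2"),
    ("b", "no links here"),
]
for elapsed, note_id, count in time_per_note(notes, extract_urls):
    print(f"{note_id}: {count} urls in {elapsed:.6f}s")
```

Once the slowest notes are known, running just those through cProfile should show where the time goes.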

It kinda makes sense, but one downside is that it’s possible that some URLs were removed from the note, and they would still be present in promnesia database, because the ‘interface’ of indexers in Promnesia is currently only supporting adding new visits. So it would trigger some phantom visits.

Right. Incremental and partial updates need at least two pieces of metadata:

  • Last sync time
  • Mapping ID between source and target.
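A minimal sketch of how these two pieces of metadata could work together, using plain dicts in place of the real databases. incremental_sync is a hypothetical function, not part of Promnesia. The note ID serves as the mapping key, so replacing a note's URL set also drops URLs that were removed from the note, which avoids the phantom visits mentioned above.

```python
def incremental_sync(target, source, last_sync):
    """Update target in place and return the new last-sync timestamp.

    source: note_id -> (modified_time, set of URLs)   (stand-in for Joplin)
    target: note_id -> set of URLs                    (stand-in for the visits DB)
    """
    # Notes that disappeared from the source: drop all of their visits.
    for note_id in list(target.keys() - source.keys()):
        del target[note_id]

    newest = last_sync
    for note_id, (modified, urls) in source.items():
        if modified > last_sync:
            # Replace rather than add, so removed URLs do not linger.
            target[note_id] = set(urls)
            newest = max(newest, modified)
    return newest


target = {"n1": {"https://a", "https://b"}, "n2": {"https://c"}}
source = {"n1": (5, {"https://a"}), "n3": (7, {"https://d"})}
new_sync = incremental_sync(target, source, last_sync=4)
```

Detecting deleted notes still needs the full list of source IDs on every run, but that is much cheaper than re-extracting links from every note body.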

We might think of changing the interface somehow, but I’d much rather speed up the indexer for simplicity.

Yeah, you are right. I can optimize the indexer further. But I think Promnesia needs incremental update support, both for slow machines and for efficient indexing.

0 reactions
hwiorn commented, Mar 25, 2022

Related: #243
