./manage.py buildwatson extremely slow on 0.5 million rows
In my PostgreSQL DB there are around 438,972 rows that should be tracked by watson. The problem is that the full index build (using the buildwatson management command) is extremely slow.
(cb)clime@vm6879 /srv/www/cb $ time ./manage.py buildwatson
Killed
real 123m22.753s
Here the process was killed, probably because it hit some system limit (the OOM killer, for instance). It had been running for more than two hours and didn't finish.
These are the registration calls I use:
import watson  # module-level registration API of django-watson 1.x

watson.register(Crag, fields=('normalized_name', 'country'))
watson.register(Member.objects.all(), fields=('normalized_name', 'user', 'country'))
watson.register(Event, fields=('normalized_name', 'country'))
watson.register(Route, fields=('normalized_name', 'crag__name', 'crag__normalized_name'))
The majority of the objects are in the Route model (more than 400,000).
I would be very happy if this time could be reduced somehow.
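For what it's worth, most of this time usually goes into committing one search entry at a time. The sketch below is not an official django-watson tool; it assumes the 1.x module-level API (import watson) exposes default_search_engine and that the engine has an update_obj_index() method, so check both against your installed version. It wraps the per-object updates for the biggest model in a single transaction and streams the queryset instead of loading it into memory:

# rebuild_routes_index.py - a minimal sketch, not an official django-watson command.
# Assumes django-watson 1.x exposes watson.default_search_engine and that
# SearchEngine.update_obj_index() exists; verify against your installed version.
import watson
from django.db import transaction

from myapp.models import Route  # 'myapp' is a hypothetical app label

def rebuild_route_index(report_every=1000):
    engine = watson.default_search_engine
    # select_related avoids one extra query per row for the crag__* fields;
    # iterator() streams rows instead of materializing all ~400,000 objects.
    qs = Route.objects.all().select_related('crag')
    with transaction.atomic():  # one commit instead of one per object
        for i, route in enumerate(qs.iterator(), start=1):
            engine.update_obj_index(route)
            if i % report_every == 0:
                print('indexed %d routes' % i)

If buildwatson already runs inside one transaction in your version, the remaining win here comes from select_related() and iterator() rather than from batching commits.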
Issue Analytics
- Created 10 years ago
- Comments: 17 (8 by maintainers)
Top GitHub Comments
@clime, would you mind sharing the PL/pgSQL script you made? I have a similarly sized database that I need to build an index for.
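Until the actual script turns up, here is a rough illustration of the direct-SQL approach such scripts usually take (to be clear, this is not @clime's script): bypass the ORM entirely and INSERT ... SELECT straight into watson's search entry table. The table and column names below are recalled from django-watson 1.x's SearchEntry model and its 'default' engine slug, and myapp_route/myapp_crag are hypothetical, so verify everything against your actual schema first:

# bulk_index_routes.py - illustrative only, not @clime's script; verify every
# table/column name against your django-watson version before running anything.
from django.contrib.contenttypes.models import ContentType
from django.db import connection

from myapp.models import Route  # hypothetical app label

def bulk_index_routes():
    ct_id = ContentType.objects.get_for_model(Route).id
    with connection.cursor() as cursor:
        # Mirrors watson.register(Route, fields=('normalized_name',
        # 'crag__name', 'crag__normalized_name')) as one set-based statement.
        cursor.execute("""
            INSERT INTO watson_searchentry
                (engine_slug, content_type_id, object_id, object_id_int,
                 title, description, content, url, meta_encoded)
            SELECT 'default', %s, r.id::text, r.id,
                   r.normalized_name, '',
                   r.normalized_name || ' ' || c.name || ' ' || c.normalized_name,
                   '', '{}'
            FROM myapp_route r
            JOIN myapp_crag c ON c.id = r.crag_id
        """, [ct_id])

A single INSERT ... SELECT like this finishes in minutes rather than hours, but it skips everything watson's adapters normally do (title truncation, meta serialization, any tsvector maintenance not handled by a database trigger), so treat it as a starting point rather than a drop-in replacement for buildwatson.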
Yeah, these builds don't scale well. On my server machine it has finally finished:
Over 18 hours xD, and the server wasn't under heavy load or anything. On my local machine it is much faster (around 40 minutes on the same data), so disk I/O probably makes the difference. CPU was at 100% the whole time, but I don't believe CPU alone would account for such a gap, and the network is out of the question since the DB runs on the same machine as the application. I am not sure why I am posting this here; probably nothing can be done, but still, 18 hours is a lot, right?
EDIT: I am additionally testing whether there is a difference between the first build and subsequent rebuilds.
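On the disk-I/O theory above: if every indexed row is committed separately, the build is bound by commit fsync latency, which would explain a local machine with a faster (or less safe) disk beating the server even at 100% CPU. One cheap experiment is PostgreSQL's standard synchronous_commit setting, which is reasonably safe here because a crash would only lose the most recent commits of an index that buildwatson can regenerate anyway:

# Quick experiment: does commit fsync latency dominate the build time?
# synchronous_commit is a standard PostgreSQL session setting.
from django.core.management import call_command
from django.db import connection

with connection.cursor() as cursor:
    cursor.execute("SET synchronous_commit TO off")

# Run the rebuild on the same connection/session the setting applies to.
call_command("buildwatson")

If the wall-clock time collapses with this setting, the real fix is batching commits, not faster disks.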