./manage.py buildwatson extremely slow on 0,5 million rows

In my PostgreSQL database there are around 438,972 rows that should be tracked by watson. The problem is that a full index build (using the buildwatson management command) is extremely slow.

(cb)clime@vm6879 /srv/www/cb $ time ./manage.py buildwatson

Killed

real    123m22.753s

Here the process was killed, probably because it hit a system limit. It had been running for more than two hours without finishing.

These are the register calls I use:

  watson.register(Crag, fields=('normalized_name', 'country'))
  watson.register(Member.objects.all(), fields=('normalized_name', 'user', 'country'))
  watson.register(Event, fields=('normalized_name', 'country'))
  watson.register(Route, fields=('normalized_name', 'crag__name', 'crag__normalized_name'))

The vast majority of the objects are in the Route model (more than 400,000).

I would be very happy if the time could be reduced somehow.
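If the "Killed" above was caused by memory exhaustion (an assumption, the output does not say which limit was hit), one general way to keep memory bounded during a long rebuild is to walk the rows in fixed-size batches instead of holding them all at once. The sketch below is plain Python and is not watson's actual implementation; `in_batches` and `index_batch` are hypothetical names used only for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def index_batch(batch: List[int]) -> int:
    """Hypothetical stand-in for the per-batch indexing work.

    Here it only counts the rows; a real version would write search entries.
    """
    return len(batch)


# 438,972 is the row count from the issue; range() stands in for the rows.
total = sum(index_batch(batch) for batch in in_batches(range(438_972), 1000))
print(total)
```

Only one batch of 1000 items is held in memory at a time, so peak memory no longer grows with the total number of rows.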

Issue Analytics

  • State: closed
  • Created: 10 years ago
  • Comments: 17 (8 by maintainers)

Top GitHub Comments

2 reactions
joemarct commented, Jul 18, 2016

@clime, would you mind sharing the PL/pgSQL script you made? I have a similarly sized database that I need to build an index from.

1 reaction
clime commented, Oct 4, 2013

Yeah, they don’t scale well. On my server machine it finally finished:

(cb)clime@vm6879 /srv/www/cb $ time ./manage.py buildwatson

refreshed 439000 search entry(s) in u'default' search engine.
Deleted 0 stale search entry(s) in u'default' search engine.
Refreshed 0 search entry(s) in u'admin' search engine.
Deleted 0 stale search entry(s) in u'admin' search engine.

real    1094m11.385s
user    43m48.102s
sys     0m32.725s

Over 18 hours xD, and the server wasn’t under heavy load or anything. On my local machine it is much faster (around 40 minutes on the same data), so disk I/O probably makes the difference. The CPU was at 100% the whole time, but I don’t believe CPU alone would account for such a gap, and the network is out of the question because the db runs on the same machine as the application. I am not sure why I am posting this here. Probably there is just nothing that can be done, but still, 18 hours is a lot, right?

EDIT: I am additionally testing whether there is a difference between the first build and subsequent rebuilds.
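For reference, the timing figures above can be turned into rough throughput numbers. This is only arithmetic on the values quoted in this comment; `parse_time` is a hypothetical helper, and the 40-minute local run is taken as exactly 2400 seconds for the comparison.

```python
def parse_time(value: str) -> float:
    """Convert a `time`-style duration such as '1094m11.385s' to seconds."""
    minutes, seconds = value.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


real = parse_time("1094m11.385s")  # wall-clock time on the server
user = parse_time("43m48.102s")    # CPU time in user mode
sys_ = parse_time("0m32.725s")     # CPU time in kernel mode
entries = 439_000                  # refreshed search entries

hours = real / 3600
rows_per_second = entries / real
cpu_share = (user + sys_) / real
local_rows_per_second = entries / (40 * 60)  # "around 40 mins" locally

print(f"wall time:        {hours:.1f} h")
print(f"server rate:      {rows_per_second:.1f} entries/s")
print(f"local rate:       {local_rows_per_second:.1f} entries/s")
print(f"CPU share of run: {cpu_share:.1%}")
```

Note that `time` reports CPU time for the `manage.py` process only, not for the PostgreSQL server. The small CPU share suggests the Python process spent most of the run waiting, which is consistent with the database or disk being the bottleneck, though it does not prove it.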
