./manage.py buildwatson extremely slow on 0.5 million rows
In my PostgreSQL DB there are around 438,972 rows that should be tracked by watson. The problem is that the full index build (using the buildwatson management command) is extremely slow.
(cb)clime@vm6879 /srv/www/cb $ time ./manage.py buildwatson
Killed
real 123m22.753s
Here the process was killed, probably because it hit some system limit (the OOM killer, for instance). It had been running for more than two hours and didn't finish.
These are the registration calls I use:
import watson  # module-level registration API of django-watson 1.x

watson.register(Crag, fields=('normalized_name', 'country'))
watson.register(Member.objects.all(), fields=('normalized_name', 'user', 'country'))
watson.register(Event, fields=('normalized_name', 'country'))
watson.register(Route, fields=('normalized_name', 'crag__name', 'crag__normalized_name'))
The majority of the objects are in the Route model (more than 400,000).
I would be very happy if this time could be reduced somehow.
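For what it's worth, most of this time usually goes into committing one search entry at a time. The sketch below is not an official django-watson tool; it assumes the 1.x module-level API (import watson) exposes default_search_engine and that the engine has an update_obj_index() method, so check both against your installed version. It wraps the per-object updates for the biggest model in a single transaction and streams the queryset instead of loading it into memory:

# rebuild_routes_index.py - a minimal sketch, not an official django-watson command.
# Assumes django-watson 1.x exposes watson.default_search_engine and that
# SearchEngine.update_obj_index() exists; verify against your installed version.
import watson
from django.db import transaction

from myapp.models import Route  # 'myapp' is a hypothetical app label

def rebuild_route_index(report_every=1000):
    engine = watson.default_search_engine
    # select_related avoids one extra query per row for the crag__* fields;
    # iterator() streams rows instead of materializing all ~400,000 objects.
    qs = Route.objects.all().select_related('crag')
    with transaction.atomic():  # one commit instead of one per object
        for i, route in enumerate(qs.iterator(), start=1):
            engine.update_obj_index(route)
            if i % report_every == 0:
                print('indexed %d routes' % i)

If buildwatson already runs inside one transaction in your version, the remaining win here comes from select_related() and iterator() rather than from batching commits.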
Issue Analytics
- Created 10 years ago
- Comments: 17 (8 by maintainers)
Top GitHub Comments
@clime, would you mind sharing the PL/pgSQL script you made? I have a similarly sized database that I need to build an index for.
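Until the actual script turns up, here is a rough illustration of the direct-SQL approach such scripts usually take (to be clear, this is not @clime's script): bypass the ORM entirely and INSERT ... SELECT straight into watson's search entry table. The table and column names below are recalled from django-watson 1.x's SearchEntry model and its 'default' engine slug, and myapp_route/myapp_crag are hypothetical, so verify everything against your actual schema first:

# bulk_index_routes.py - illustrative only, not @clime's script; verify every
# table/column name against your django-watson version before running anything.
from django.contrib.contenttypes.models import ContentType
from django.db import connection

from myapp.models import Route  # hypothetical app label

def bulk_index_routes():
    ct_id = ContentType.objects.get_for_model(Route).id
    with connection.cursor() as cursor:
        # Mirrors watson.register(Route, fields=('normalized_name',
        # 'crag__name', 'crag__normalized_name')) as one set-based statement.
        cursor.execute("""
            INSERT INTO watson_searchentry
                (engine_slug, content_type_id, object_id, object_id_int,
                 title, description, content, url, meta_encoded)
            SELECT 'default', %s, r.id::text, r.id,
                   r.normalized_name, '',
                   r.normalized_name || ' ' || c.name || ' ' || c.normalized_name,
                   '', '{}'
            FROM myapp_route r
            JOIN myapp_crag c ON c.id = r.crag_id
        """, [ct_id])

A single INSERT ... SELECT like this finishes in minutes rather than hours, but it skips everything watson's adapters normally do (title truncation, meta serialization, any tsvector maintenance not handled by a database trigger), so treat it as a starting point rather than a drop-in replacement for buildwatson.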
Yeah, these builds don't scale well. On my server machine it has finally finished:
Over 18 hours xD, and the server wasn't under heavy load or anything. On my local machine it is much faster (around 40 minutes on the same data), so disk I/O probably makes the difference. CPU was at 100% the whole time, but I don't believe CPU alone would account for such a gap, and the network is out of the question since the DB runs on the same machine as the application. I am not sure why I am posting this here; probably nothing can be done, but still, 18 hours is a lot, right?
EDIT: I am additionally testing whether there is a difference between the first build and subsequent rebuilds.
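On the disk-I/O theory above: if every indexed row is committed separately, the build is bound by commit fsync latency, which would explain a local machine with a faster (or less safe) disk beating the server even at 100% CPU. One cheap experiment is PostgreSQL's standard synchronous_commit setting, which is reasonably safe here because a crash would only lose the most recent commits of an index that buildwatson can regenerate anyway:

# Quick experiment: does commit fsync latency dominate the build time?
# synchronous_commit is a standard PostgreSQL session setting.
from django.core.management import call_command
from django.db import connection

with connection.cursor() as cursor:
    cursor.execute("SET synchronous_commit TO off")

# Run the rebuild on the same connection/session the setting applies to.
call_command("buildwatson")

If the wall-clock time collapses with this setting, the real fix is batching commits, not faster disks.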