Task polling using sequential scan instead of index scan
Hi, we are using db-scheduler under quite heavy load (millions of due tasks, executing 2.5k tasks per second), and overall the library handles it pretty well 😃. Sometimes, however, throughput drops significantly (from 2.5k to ~100 tasks per second). After investigating, we concluded that the cause lies in the Postgres query planner.
TaskRepository.getDue executes the following SQL:
"select * from " + tableName + " where picked = ? and execution_time <= ? " + unresolvedFilter.andCondition() + " order by execution_time asc"
The query carries no LIMIT, so the Postgres query planner does not know that we need only the first N tasks matching these criteria. It chooses a sequential scan and sorts the whole table by execution_time, which would be the faster plan if we needed all of the rows. When only a subset is needed, it could instead read the rows in order from the presorted index. Sometimes it does use an index scan (the query planner works in mysterious ways 😄).
Query execution times in our case look something like this:
- when sequential scan is used getDue takes around 10 seconds
- when index scan is used getDue takes around 50 milliseconds
Based on https://www.postgresql.org/docs/current/indexes-ordering.html, when this kind of query has an explicit LIMIT clause the planner is far more likely to use an index scan, since it then knows that we need only a small subset of the rows.
With JDBC's Statement.setMaxRows, limiting works differently: the PostgreSQL driver does not append LIMIT to the query, it just stops fetching the remaining rows once it has the desired number.
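To make the difference concrete, here is a minimal Java sketch of the two approaches. It is not db-scheduler's code: the class DueTaskQueries, its method names, and the task_name column are assumptions for illustration, and the unresolvedFilter condition from the original query is left out for brevity. The limit is inlined as a literal (safe here because it is an int) so that the planner sees the actual row count.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of db-scheduler.
final class DueTaskQueries {

    // The query as described in this issue: no LIMIT, so the planner cannot
    // tell that only the first few rows will be consumed.
    static String dueQuery(String tableName) {
        return "select * from " + tableName
            + " where picked = ? and execution_time <= ?"
            + " order by execution_time asc";
    }

    // Same query with an explicit LIMIT appended (PostgreSQL syntax).
    static String dueQueryWithLimit(String tableName, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        return dueQuery(tableName) + " limit " + limit;
    }

    // Client-side cap: setMaxRows only tells the driver to stop after
    // `limit` rows; the SQL sent to the server is unchanged.
    static List<String> fetchWithMaxRows(Connection c, String table, Instant now, int limit)
            throws SQLException {
        try (PreparedStatement ps = c.prepareStatement(dueQuery(table))) {
            ps.setMaxRows(limit);
            return readTaskNames(ps, now);
        }
    }

    // Server-side cap: the LIMIT is part of the SQL text the planner sees.
    static List<String> fetchWithLimit(Connection c, String table, Instant now, int limit)
            throws SQLException {
        try (PreparedStatement ps = c.prepareStatement(dueQueryWithLimit(table, limit))) {
            return readTaskNames(ps, now);
        }
    }

    private static List<String> readTaskNames(PreparedStatement ps, Instant now)
            throws SQLException {
        ps.setBoolean(1, false);                 // picked = false
        ps.setTimestamp(2, Timestamp.from(now)); // execution_time <= now
        List<String> names = new ArrayList<>();
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                names.add(rs.getString("task_name")); // assumed column name
            }
        }
        return names;
    }
}
```

Running EXPLAIN on both SQL strings against a large table is probably the quickest way to check whether the LIMIT variant really switches the plan to an index scan in a given setup.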
Since your library supports multiple SQL databases, it might be impossible to implement a generic way to always enforce this behaviour; AFAIK LIMIT won't work in, e.g., Oracle DB.
What we could do for our case is override JdbcTaskRepository.getDue() and implement it specifically for Postgres.
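One way to keep such a customization database-aware is to pick the row-limiting clause per database. The following is only a sketch under stated assumptions: RowLimitClause is a hypothetical helper, not db-scheduler API, and the product-name strings are assumed to match what the JDBC drivers report through DatabaseMetaData.getDatabaseProductName(). Unknown databases get an empty clause, so the query falls back to its current form.

```java
import java.util.Locale;

// Hypothetical helper, not part of db-scheduler.
final class RowLimitClause {

    // Returns the clause to append after "order by ..." for the given
    // database, or an empty string when no syntax is known for it.
    static String forDatabase(String databaseProductName, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        String product = databaseProductName.toLowerCase(Locale.ROOT);
        if (product.contains("postgresql")
                || product.contains("mysql")
                || product.contains("mariadb")) {
            return " limit " + limit;
        }
        if (product.contains("oracle")) {
            // Row-limiting clause available in Oracle 12c and later.
            return " fetch first " + limit + " rows only";
        }
        if (product.contains("microsoft sql server")) {
            // Requires an ORDER BY, which the due-task query already has.
            return " offset 0 rows fetch next " + limit + " rows only";
        }
        return ""; // unknown database: leave the query unlimited
    }
}
```

For example, RowLimitClause.forDatabase("PostgreSQL", 100) yields " limit 100", which can be appended to the getDue query string.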
I would love to hear other solutions or suggestions. 😃
Top GitHub Comments
I have released your contribution in 9.3. Thanks! https://github.com/kagkarlsson/db-scheduler/releases/tag/9.3
Sure, I will make a PR with the Postgres customization.