Task polling using sequential scan instead of index scan

Hi, we are using db-scheduler under quite heavy load (millions of due tasks, executing 2.5k tasks per second), and overall this library handles it pretty well 😃 Sometimes, though, throughput drops significantly (from 2.5k tasks per second to ~100). After investigating, we came to the conclusion that the cause lies in the Postgres query planner.

TaskRepository.getDue executes the following SQL: "select * from " + tableName + " where picked = ? and execution_time <= ? " + unresolvedFilter.andCondition() + " order by execution_time asc". Because the Postgres query planner does not know that we need only the first N tasks matching these criteria, it decides to use a sequential scan and sorts the whole table by execution_time. That would be faster if we needed all of the rows, but when only a subset is needed it could read from the presorted index instead. Sometimes it does use an index scan (the query planner works in mysterious ways 😄). Query execution times in our case look something like this:

  • when sequential scan is used getDue takes around 10 seconds
  • when index scan is used getDue takes around 50 milliseconds

Based on https://www.postgresql.org/docs/current/indexes-ordering.html, when this kind of query has an explicit LIMIT clause, the planner knows that only a small subset is needed and should use the index scan. Limiting with JDBC's Statement.setMaxRows works differently: the PostgreSQL driver does not append LIMIT to the query; it just stops fetching the remaining data once the desired number of rows has been read. Since your library supports all SQL servers, it might be impossible to implement a generic way to always enforce this behaviour; afaik LIMIT won't work in, e.g., Oracle DB.
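To make the difference concrete, here is a minimal sketch of the two ways of limiting. It is not db-scheduler's code: the class and method names are hypothetical, and binding picked as a boolean and execution_time as a timestamp is an assumption about the table schema. Only the query text is taken from above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.Instant;

// Hypothetical helper, for illustration only.
final class DueQuerySketch {

    // Builds the due-task query quoted in the issue, optionally ending in an explicit LIMIT.
    static String dueQuery(String tableName, String andCondition, boolean explicitLimit) {
        String sql = "select * from " + tableName
                + " where picked = ? and execution_time <= ? "
                + andCondition
                + " order by execution_time asc";
        return explicitLimit ? sql + " limit ?" : sql;
    }

    // Counts the due rows returned when the limit is part of the SQL text,
    // so the planner can see that only the first rows are wanted.
    static int countDueWithLimit(Connection c, String tableName, Instant now, int limit)
            throws SQLException {
        try (PreparedStatement ps = c.prepareStatement(dueQuery(tableName, "", true))) {
            ps.setBoolean(1, false);                 // assumption: picked is a boolean column
            ps.setTimestamp(2, Timestamp.from(now)); // assumption: execution_time is a timestamp
            ps.setInt(3, limit);
            // By contrast, calling ps.setMaxRows(limit) on the query without LIMIT
            // caps the rows handed back by the driver but leaves the SQL text unchanged.
            try (ResultSet rs = ps.executeQuery()) {
                int rows = 0;
                while (rs.next()) {
                    rows++;
                }
                return rows;
            }
        }
    }
}
```

Running EXPLAIN on both query variants against the real table is the way to confirm which plan Postgres actually picks.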

What we could do for our case is override JdbcTaskRepository.getDue() and implement this method to work for Postgres. I would love to hear other solutions or suggestions. 😃
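One possible shape for such a Postgres-only customization is sketched below. Everything in it is hypothetical (the interface, the constants, and the selection by product name are illustrative and not db-scheduler's API); the idea is only that Postgres gets an explicit LIMIT while every other database keeps the unchanged query and relies on Statement.setMaxRows.

```java
import java.util.Locale;

// Hypothetical sketch: choose how the due-task query is limited per database.
final class DueQueryDialectSketch {

    // Hypothetical hook that turns the base query into the SQL actually sent.
    interface DueQueryCustomization {
        String limited(String baseQuery, int limit);
    }

    // Postgres: put the limit into the SQL so the planner can take it into account.
    static final DueQueryCustomization POSTGRES =
            (baseQuery, limit) -> baseQuery + " limit " + limit;

    // Any other database: leave the SQL untouched; the caller keeps using setMaxRows.
    static final DueQueryCustomization GENERIC =
            (baseQuery, limit) -> baseQuery;

    // productName is expected to come from DatabaseMetaData.getDatabaseProductName().
    static DueQueryCustomization forProduct(String productName) {
        boolean postgres = productName != null
                && productName.toLowerCase(Locale.ROOT).contains("postgresql");
        return postgres ? POSTGRES : GENERIC;
    }
}
```

Keeping the generic path as the default means databases that do not accept LIMIT, such as the Oracle case mentioned above, behave exactly as before.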

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 14 (14 by maintainers)

Top GitHub Comments

2 reactions
kagkarlsson commented, Jan 19, 2021

I have released your contribution in 9.3. Thanks! https://github.com/kagkarlsson/db-scheduler/releases/tag/9.3

0 reactions
rafal-kowalski commented, Jan 15, 2021

Sure, I will make a PR with the Postgres customization.


