Start bisecting after getting min/max from first database

See original GitHub issue

Some databases are very slow at computing min(id) and max(id) for a column once a WHERE clause is added. For example, this query takes 50s in Snowflake on a dataset with millions of rows:

Running SQL (Snowflake): SELECT min(id), max(id) FROM TRANSFERS WHERE ('2022-06-04T14:27:44.096619' <= created_at) AND (created_at < '2022-06-14T13:57:44.096658')

In Postgres, the same query takes a few milliseconds.

The two databases will rarely report different min/max values, so instead of waiting for both, we can take the result from whichever database answers first and start bisecting.

When the second, slower database returns its min and max, we compare them with the faster database's values. If they differ, we warn, and could restart the bisection. Alternatively, if the min/max turn out to be extended, we can just start bisecting the newly added ranges at the extremes, so the mismatch is handled gracefully.
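
The "bisect at the extremes" alternative can be sketched as follows. This is only an illustration, not the project's implementation: the helper name `uncovered_ranges` is hypothetical, and it assumes integer ids with inclusive (min, max) bounds.

```python
from typing import List, Tuple

# Inclusive (min_id, max_id). Integer ids are an assumption of this sketch.
Bounds = Tuple[int, int]


def uncovered_ranges(first: Bounds, second: Bounds) -> List[Bounds]:
    """Return the inclusive id ranges of `second` that lie outside `first`.

    `first` is the min/max from the database that answered first (already
    being bisected); `second` is the min/max from the slower database.
    """
    first_min, first_max = first
    second_min, second_max = second
    extra: List[Bounds] = []
    if second_min < first_min:
        # The slower database has rows below the range being bisected.
        extra.append((second_min, first_min - 1))
    if second_max > first_max:
        # The slower database has rows above the range being bisected.
        extra.append((first_max + 1, second_max))
    return extra
```

An empty result means the bisection already in progress covers everything the slower database reported, so nothing extra needs to be scheduled.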

E.g. it should look like this:

Thread1, Time 00:00:00: Postgres: select min(id), max(id) from table
Thread2, Time 00:00:00: Snowflake: select min(id), max(id) from table
Thread1, Time 00:00:01: Postgres returns min=1, max=1000
... start bisecting ...
Thread2, Time 00:00:10: Snowflake returns min=1, max=1000
... continues bisecting because min + max are the same...

This should improve performance substantially on some platforms.
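
The timeline above could be sketched roughly like this with Python's standard `concurrent.futures`. Everything here is illustrative: `bisect_on_first_bounds`, `bound_queries`, and the `bisect` callback are hypothetical names, not the project's API. For simplicity, `bisect` runs in the calling thread, so the slower result is only inspected after the first bisection returns, and a mismatch is handled with the "warn and restart" option.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Optional, Tuple

# Inclusive (min_id, max_id). Integer ids are an assumption of this sketch.
Bounds = Tuple[int, int]


def bisect_on_first_bounds(
    bound_queries: Dict[str, Callable[[], Bounds]],
    bisect: Callable[[int, int], None],
) -> Optional[Bounds]:
    """Start bisecting as soon as any database reports its min/max.

    `bound_queries` maps a database name to a callable that runs its
    min/max query; `bisect(lo, hi)` stands in for the real diffing step.
    Both are placeholders invented for this example.
    """
    current: Optional[Bounds] = None
    with ThreadPoolExecutor(max_workers=len(bound_queries)) as pool:
        # Issue every min/max query at the same time.
        futures = {pool.submit(q): name for name, q in bound_queries.items()}
        for future in as_completed(futures):
            name = futures[future]
            lo, hi = future.result()
            if current is None:
                # First answer: do not wait for the slower databases.
                current = (lo, hi)
                bisect(lo, hi)
            elif (lo, hi) != current:
                logging.warning(
                    "%s returned min/max %s, expected %s", name, (lo, hi), current
                )
                widened = (min(lo, current[0]), max(hi, current[1]))
                if widened != current:
                    # Restart over the widened range.
                    current = widened
                    bisect(*current)
    return current
```

If both databases agree, `bisect` is called once, with the bounds from the faster one. A real implementation would likely run the bisection in worker threads as well, so the slower result can be checked while segments are still being compared.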

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5

Top GitHub Comments

1 reaction
sirupsen commented, Jul 27, 2022

Looks awesome @erezsh

0 reactions
erezsh commented, Jul 27, 2022

@sirupsen Actually turned out to be pretty elegant!
