Start bisecting after getting min/max from first database
Some databases are awfully slow at getting min(id) and max(id) for a column when WHERE is added to the mix. For example, this query in Snowflake takes 50s on a dataset in the millions of rows:
Running SQL (Snowflake): SELECT min(id), max(id) FROM TRANSFERS WHERE ('2022-06-04T14:27:44.096619' <= created_at) AND (created_at < '2022-06-14T13:57:44.096658')
In Postgres, the same query takes a few milliseconds.
The two results will rarely differ, so instead we can take the result from the faster database and start bisecting right away.
When the second, slower database returns its min and max, we compare them with the faster one's. If they differ, we warn and could restart the bisection. Alternatively, if the range has been extended, we can just bisect the newly added extremes, so it's very graceful.
E.g. it should look like this:
Thread1, Time 00:00:00: Postgres: select min(id), max(id) from table
Thread2, Time 00:00:00: Snowflake: select min(id), max(id) from table
Thread1, Time 00:00:01: Postgres returns min=1, max=1000
... start bisecting ...
Thread2, Time 00:00:10: Snowflake returns min=1, max=1000
... continues bisecting because min + max are the same...
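The timeline above could be sketched roughly as follows. This is only an illustration, not the project's actual implementation: `query_min_max`, `extra_ranges`, and `run` are hypothetical names, the databases are simulated with `time.sleep`, and keys are assumed to be integers with inclusive min/max bounds. It takes the "graceful" option of bisecting only the newly added extremes rather than restarting.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def query_min_max(name, delay, key_range):
    """Hypothetical stand-in for `SELECT min(id), max(id) ...`.

    Sleeps to simulate query latency, then returns the (min, max) pair.
    """
    time.sleep(delay)
    return name, key_range


def extra_ranges(first, second):
    """Return the inclusive key ranges seen by the slower database
    that fall outside the range already being bisected."""
    lo1, hi1 = first
    lo2, hi2 = second
    extra = []
    if lo2 < lo1:
        extra.append((lo2, lo1 - 1))
    if hi2 > hi1:
        extra.append((hi1 + 1, hi2))
    return extra


def run(fast_range, slow_range):
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(query_min_max, "postgres", 0.01, fast_range),
            pool.submit(query_min_max, "snowflake", 0.2, slow_range),
        ]
        completed = as_completed(futures)

        # Whichever database answers first defines the initial range.
        first_name, first_range = next(completed).result()
        print(f"{first_name} returned {first_range}: start bisecting")
        # A real implementation would be bisecting in other threads here.

        # The slower answer only matters if it extends the range.
        second_name, second_range = next(completed).result()
        extra = extra_ranges(first_range, second_range)
        if extra:
            print(f"warning: {second_name} extends the range, also bisecting {extra}")
        else:
            print(f"{second_name} returned {second_range}: continue bisecting")
        return first_name, extra


if __name__ == "__main__":
    run((1, 1000), (1, 1000))
    run((1, 1000), (1, 1200))
```

If the slower database reports a narrower range, nothing extra needs to be done, since the range already being bisected covers it.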
This should improve performance substantially on some platforms.
Top GitHub Comments
Looks awesome @erezsh
@sirupsen Actually turned out to be pretty elegant!