Huge time for aggregation in postgresql

Hello,

I’m pretty new to PrestoDB and could not find an answer in the docs or in other issues.

I am running fairly basic queries through Presto, such as select field1, max(field2) from table group by 1, on a table with several million rows.

Some of these queries take a few seconds when run directly on PostgreSQL but several minutes through PrestoDB. Checking the Presto live plan, I see that Presto requests all the data from the table and only then performs the max aggregation itself. Can’t we force Presto to request the aggregated data directly from PostgreSQL?

Thanks! PS: I’m using PrestoDB 0.222 and did not find any related fix in more recent versions.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

highker commented, Sep 24, 2019 (2 reactions)

@sachdevs will work on plan pushdown for JDBC connectors

mbasmanova commented, Sep 24, 2019 (1 reaction)

@thomasLeclaire Your observations are correct. At the moment, Presto is not able to push down complex operations such as aggregations or joins into the data source. Hence, it reads all the data and then aggregates on its own. @highker added infrastructure to enable pushdown of any part of the plan, but we are still missing support from the connectors. Specifically, the Postgres connector needs to be modified to add support for pushing down operations. Let us know if you are interested in working on that.
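
To illustrate the difference described in this comment, here is a minimal sketch of the same aggregation done with and without pushdown. It is not Presto code: it uses Python's built-in sqlite3 module as a stand-in for the PostgreSQL data source, and the table name `t`, the sample rows, and both function names are hypothetical. The point is only to show how many rows cross the boundary between the data source and the engine in each case.

```python
import sqlite3

# Stand-in data source: an in-memory SQLite table (hypothetical name and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field1 TEXT, field2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", 1), ("a", 5), ("b", 3), ("b", 2), ("c", 7)],
)


def max_without_pushdown(conn):
    """Fetch every row, then aggregate on the client side.

    This mirrors the behavior described in the comment: the engine reads
    all the data and computes the aggregation itself.
    """
    result = {}
    rows_transferred = 0
    for field1, field2 in conn.execute("SELECT field1, field2 FROM t"):
        rows_transferred += 1
        if field1 not in result or field2 > result[field1]:
            result[field1] = field2
    return result, rows_transferred


def max_with_pushdown(conn):
    """Let the data source aggregate, so only one row per group is transferred."""
    rows = conn.execute(
        "SELECT field1, MAX(field2) FROM t GROUP BY field1"
    ).fetchall()
    return dict(rows), len(rows)


no_push, moved_no_push = max_without_pushdown(conn)
push, moved_push = max_with_pushdown(conn)

print(no_push == push)            # both approaches give the same answer
print(moved_no_push, moved_push)  # rows transferred in each case
```

Both functions return the same result, but the first transfers every row of the table while the second transfers one row per group. On a table with several million rows and few distinct groups, that transfer cost is a plausible source of the seconds-versus-minutes gap reported in the issue.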


