Allow connectors to participate in query optimization
This is an umbrella issue to track various projects related to allowing connectors to participate in query optimization. The long-term vision is for plugins to provide custom rules that can transform subplans into plugin-specific operations. This requires a set of steps:
- decouple the AST from the current IR (PlanNode tree): https://github.com/trinodb/trino/issues/13184
- get rid of visitor-based optimizers
- revamp the IR and optimizer to support a fully exploratory optimizer
- allow connectors to provide optimizer rules
In the short term, we can introduce special-purpose mechanisms to the SPI and engine to enable the following behaviors:
- push down complex filters
- push down projections (e.g., row/array/map dereference, pre-calculated virtual columns)
- push down aggregations
- push down joins
- expose additional filters (e.g. for row-level authorization)
- expose more complex data organizations (e.g., custom partitioning schemes)
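To make the "push down complex filters" item concrete, here is a minimal sketch of the kind of negotiation such a mechanism implies: the engine offers the conjuncts of a filter, the connector takes the ones it can evaluate, and the engine keeps the rest. Every name in it (`FilterPushdownSketch`, `Predicate`, `PushdownResult`, `pushFilter`) is hypothetical and chosen for illustration; it is not the Trino SPI.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical types for illustration only; none of this is the Trino SPI.
public class FilterPushdownSketch {
    /** One conjunct of a filter, e.g. "region = 'EU'". */
    static final class Predicate {
        final String column;
        final String operator;
        final String value;

        Predicate(String column, String operator, String value) {
            this.column = column;
            this.operator = operator;
            this.value = value;
        }

        @Override
        public String toString() {
            return column + " " + operator + " " + value;
        }
    }

    /** Outcome of the negotiation: what the connector accepted and what the engine keeps. */
    static final class PushdownResult {
        final List<Predicate> pushed = new ArrayList<>();
        final List<Predicate> remaining = new ArrayList<>();
    }

    /** Splits the conjuncts by whether the connector supports their operator. */
    static PushdownResult pushFilter(List<Predicate> conjuncts, Set<String> supportedOperators) {
        PushdownResult result = new PushdownResult();
        for (Predicate predicate : conjuncts) {
            if (supportedOperators.contains(predicate.operator)) {
                result.pushed.add(predicate);
            } else {
                result.remaining.add(predicate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Predicate> filter = List.of(
                new Predicate("region", "=", "'EU'"),
                new Predicate("name", "LIKE", "'A%'"));
        // Assumption: this connector can only evaluate simple comparisons.
        PushdownResult result = pushFilter(filter, Set.of("=", "<", ">"));
        System.out.println("pushed to connector: " + result.pushed);
        System.out.println("kept in engine: " + result.remaining);
    }
}
```

Returning the unaccepted remainder is what keeps the result correct when a connector supports only part of a filter: the engine still applies whatever was not pushed down.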
Document describing the high-level approach: https://github.com/prestosql/presto/wiki/Pushdown-of-complex-operations
Plan:
- https://github.com/trinodb/trino/issues/13184
- Simplify and hide notion of TableLayout: https://github.com/prestosql/presto/pull/363
- Deprecate table layouts: https://github.com/prestosql/presto/pull/420
- Support Constraint-based pushdown to transition from Table Layout: https://github.com/prestosql/presto/pull/541
- Define simplified expression language (some work in progress: https://github.com/prestosql/presto/pull/402)
- Provide TableHandle to PageSource/RecordCursor providers: https://github.com/prestosql/presto/issues/442
- Add pushXXXIntoConnector + Rule pairs
- Limit: https://github.com/prestosql/presto/pull/421
- Filter: https://github.com/prestosql/presto/pull/402, #7994
- Use canonical function name in expression pushdown (e.g. pass substring instead of substr)
- Projection: https://github.com/prestosql/presto/pull/676
- Aggregation (https://github.com/trinodb/trino/issues/6613)
- Join (#6620)
- Sample: https://github.com/prestosql/presto/pull/753
- Top N
- engine & SPI (#4249)
- JDBC connectors https://github.com/prestosql/presto/issues/4769
- Push PARTIAL TopN into TableScan #7028
- Push TopN through Project into TableScan rule #7029
- Elasticsearch https://github.com/prestosql/presto/issues/4803
- Integrate into connectors
- Hive
- Parquet + ORC - Migrate https://github.com/prestosql/presto/pull/187 to new framework
- Optimize Top N queries over partition keys https://github.com/prestosql/presto/issues/3050
- JDBC connectors
- aggregation (https://github.com/trinodb/trino/issues/6613)
- join (#6620)
- …
- Remove TableLayout-related APIs: https://github.com/prestosql/presto/issues/781
- SchemaTableName should not be required for synthetic ConnectorTableHandle returned from ConnectorMetadata.apply methods #6694
- Support connectors advanced pushdown in EXPLAIN IO #6695
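The "pushXXXIntoConnector + Rule pairs" item in the plan can be sketched for the Limit case as follows. The connector side is asked whether it can absorb the limit and answers with an optional new table handle; the engine-side rule then decides whether the Limit node can be dropped. All types here (`TableHandle`, `LimitResult`, `Connector`) and the string rendering of plans are assumptions made for illustration. The shape is only loosely modeled on the `ConnectorMetadata.apply` methods mentioned above and does not reproduce their real signatures.

```java
import java.util.Optional;

// Hypothetical types for illustration only; not the actual engine or SPI classes.
public class LimitPushdownSketch {
    /** Connector-side description of a table, optionally carrying a pushed-down limit. */
    static final class TableHandle {
        final String table;
        final Optional<Long> limit;

        TableHandle(String table, Optional<Long> limit) {
            this.table = table;
            this.limit = limit;
        }

        @Override
        public String toString() {
            return limit.isPresent() ? table + " limit " + limit.get() : table;
        }
    }

    /** What the connector returns when it accepts a limit. */
    static final class LimitResult {
        final TableHandle handle;
        final boolean limitGuaranteed;

        LimitResult(TableHandle handle, boolean limitGuaranteed) {
            this.handle = handle;
            this.limitGuaranteed = limitGuaranteed;
        }
    }

    /** Connector half of the pair: an empty result means "cannot push this down". */
    interface Connector {
        Optional<LimitResult> applyLimit(TableHandle handle, long limit);
    }

    /** Engine half of the pair: rewrites Limit(TableScan) based on the connector's answer. */
    static String pushLimitIntoTableScan(Connector connector, TableHandle handle, long limit) {
        Optional<LimitResult> result = connector.applyLimit(handle, limit);
        if (!result.isPresent()) {
            // Connector declined: leave the plan unchanged.
            return "Limit[" + limit + "](TableScan[" + handle + "])";
        }
        String scan = "TableScan[" + result.get().handle + "]";
        // Keep the Limit node unless the connector guarantees it returns at most `limit` rows.
        return result.get().limitGuaranteed ? scan : "Limit[" + limit + "](" + scan + ")";
    }

    public static void main(String[] args) {
        Connector guaranteed = (handle, limit) ->
                Optional.of(new LimitResult(new TableHandle(handle.table, Optional.of(limit)), true));
        Connector unsupported = (handle, limit) -> Optional.empty();

        TableHandle orders = new TableHandle("orders", Optional.empty());
        System.out.println(pushLimitIntoTableScan(guaranteed, orders, 10));
        System.out.println(pushLimitIntoTableScan(unsupported, orders, 10));
    }
}
```

The same two-part shape (a connector callback that may decline, plus a rule that rewrites the plan only on acceptance) would plausibly carry over to the Filter, Projection, Aggregation, Join, Sample, and Top N entries in the list.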
Issue Analytics
- State:
- Created 5 years ago
- Reactions: 53
- Comments: 32 (17 by maintainers)
Top GitHub Comments
I’ve started collecting my thoughts on how I think we should approach this in this document: https://github.com/prestosql/presto/wiki/Pushdown-of-complex-operations. It’s still incomplete and lacking many details, but the overall direction shouldn’t have to change much.
@josecanciani, absolutely! Once we add support for pushing down ORDER BY into connectors, that should work out of the box. The transformation you suggested happens entirely within the engine:
E.g., for:
you get the following plan: