Allow connectors to participate in query optimization

This is an umbrella issue to track various projects related to allowing connectors to participate in query optimization. The long-term vision is for plugins to provide custom rules that can transform subplans into plugin-specific operations. This requires a set of steps:

  • decouple the AST from the current IR (PlanNode tree): https://github.com/trinodb/trino/issues/13184
  • get rid of visitor-based optimizers
  • revamp the IR and optimizer to support a fully exploratory optimizer
  • allow connectors to provide optimizer rules
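To make the long-term idea concrete, here is a minimal sketch of a connector-supplied rule that rewrites a subplan into a plugin-specific operation. It is a toy model only: the `Scan`, `Filter`, and `ConnectorScan` node types, the `push_filter_into_scan` rule, and the `optimize` driver are all hypothetical names invented for illustration, not Trino's IR or SPI. The driver is a simple apply-until-fixpoint rewriter, which is far simpler than the fully exploratory optimizer the issue describes.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Scan:
    """Hypothetical engine-side plan node: read a whole table."""
    table: str


@dataclass(frozen=True)
class Filter:
    """Hypothetical engine-side plan node: filter the rows of `source`."""
    predicate: str
    source: object


@dataclass(frozen=True)
class ConnectorScan:
    """Hypothetical plugin-specific node: the data source evaluates the predicate."""
    table: str
    remote_predicate: str


# A rule returns a replacement subplan, or None when it does not match.
Rule = Callable[[object], Optional[object]]


def push_filter_into_scan(node: object) -> Optional[object]:
    """Connector-provided rule: Filter(Scan) -> ConnectorScan."""
    if isinstance(node, Filter) and isinstance(node.source, Scan):
        return ConnectorScan(node.source.table, node.predicate)
    return None


def optimize(node: object, rules: Sequence[Rule]) -> object:
    """Rewrite children first, then apply rules to this node until none fires."""
    if isinstance(node, Filter):
        node = Filter(node.predicate, optimize(node.source, rules))
    changed = True
    while changed:
        changed = False
        for rule in rules:
            rewritten = rule(node)
            if rewritten is not None:
                node, changed = rewritten, True
    return node


plan = Filter("orderstatus = 'F'", Scan("orders"))
print(optimize(plan, [push_filter_into_scan]))
```

In this model the engine never needs to know what a `ConnectorScan` means; it only needs to know that the rule produced a valid replacement for the subplan it matched.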

In the short term, we can introduce special-purpose mechanisms to the SPI and engine to enable the following behaviors:

  • push down complex filters
  • push down projections (e.g., row/array/map dereference, pre-calculated virtual columns)
  • push down aggregations
  • push down joins
  • expose additional filters (e.g., for row-level authorization)
  • expose more complex data organizations (e.g., custom partitioning schemes)
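One way to picture such special-purpose mechanisms is as a negotiation: the engine offers an operation to the connector, and the connector either returns a new table handle that absorbs it or declines. The sketch below models that for filters and limits. Every name in it (`TableHandle`, `ToyConnector`, `apply_filter`, `apply_limit`, `SUPPORTED_COLUMNS`) is hypothetical and chosen for illustration; it is not the actual SPI, and the assumption that a filter arrives as a tuple of `(column, expression)` conjuncts is a simplification.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

Conjunct = Tuple[str, str]  # (column name, SQL text); a simplifying assumption


@dataclass(frozen=True)
class TableHandle:
    """Hypothetical opaque handle describing what the connector will do at scan time."""
    table: str
    pushed_conjuncts: Tuple[str, ...] = ()
    limit: Optional[int] = None


class ToyConnector:
    # Columns this toy data source is assumed to be able to filter on remotely.
    SUPPORTED_COLUMNS = {"orderkey", "orderstatus"}

    def apply_filter(
        self, handle: TableHandle, conjuncts: Tuple[Conjunct, ...]
    ) -> Optional[Tuple[TableHandle, Tuple[Conjunct, ...]]]:
        """Accept the conjuncts the source can evaluate; hand the rest back to the engine."""
        accepted = tuple(expr for col, expr in conjuncts if col in self.SUPPORTED_COLUMNS)
        remaining = tuple(c for c in conjuncts if c[0] not in self.SUPPORTED_COLUMNS)
        if not accepted:
            return None  # nothing absorbed; the engine keeps its Filter node
        new_handle = replace(handle, pushed_conjuncts=handle.pushed_conjuncts + accepted)
        return new_handle, remaining

    def apply_limit(self, handle: TableHandle, limit: int) -> Optional[TableHandle]:
        """Absorb a limit unless an equal or tighter one is already in place."""
        if handle.limit is not None and handle.limit <= limit:
            return None
        return replace(handle, limit=limit)


connector = ToyConnector()
handle = TableHandle("orders")
result = connector.apply_filter(
    handle,
    (("orderstatus", "orderstatus = 'F'"), ("comment", "comment LIKE '%x%'")),
)
print(result)
```

Returning `None` when nothing changes gives the engine a clear signal to stop offering the same operation, and returning the unabsorbed conjuncts lets the engine keep evaluating whatever the data source cannot.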

Document describing the high-level approach: https://github.com/prestosql/presto/wiki/Pushdown-of-complex-operations

Plan:

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 53
  • Comments: 32 (17 by maintainers)

Top GitHub Comments

10 reactions
martint commented, Feb 8, 2019

I’ve started collecting my thoughts on how I think we should approach this in this document: https://github.com/prestosql/presto/wiki/Pushdown-of-complex-operations. It’s still incomplete and lacking many details, but the overall direction shouldn’t have to change much.

2 reactions
martint commented, Jun 26, 2020

@josecanciani, absolutely! Once we add support for pushing down ORDER BY into connectors, that should work out of the box. The transformation you suggested happens entirely within the engine:

E.g., for:

SELECT orderkey 
FROM (
    SELECT * FROM orders 
    UNION ALL 
    SELECT * FROM orders) 
ORDER BY orderkey 
LIMIT 10;

you get the following plan:

 Output[orderkey]
 │   Layout: [expr_45:bigint]
 │   orderkey := expr_45
 └─ TopN[10 by (expr_45 ASC_NULLS_LAST)]
    │   Layout: [expr_45:bigint]
    └─ LocalExchange[SINGLE] ()
       │   Layout: [expr_45:bigint]
       └─ RemoteExchange[GATHER]
          │   Layout: [expr_45:bigint]
          ├─ TopNPartial[10 by (orderkey ASC_NULLS_LAST)]       <<<<<<<<<< 
          │  │   Layout: [orderkey:bigint]
          │  └─ TableScan[tpch:orders:sf0.01]
          │         Layout: [orderkey:bigint]
          │         orderkey := tpch:orderkey
          │         tpch:orderstatus
          │             :: [[F], [O], [P]]
          └─ TopNPartial[10 by (orderkey_17 ASC_NULLS_LAST)]    <<<<<<<<<<
             │   Layout: [orderkey_17:bigint]
             └─ TableScan[tpch:orders:sf0.01]
                    Layout: [orderkey_17:bigint]
                    orderkey_17 := tpch:orderkey
                    tpch:orderstatus
                        :: [[F], [O], [P]]
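
The reason the two marked TopNPartial nodes are safe can be shown with a small sketch: taking the top N of each UNION ALL branch and then the top N of the combined partial results gives the same rows as taking the top N of the full union. The function names below are hypothetical, rows are modeled as plain integers, and NULL ordering (ASC_NULLS_LAST in the plan) is ignored for simplicity.

```python
import heapq
from typing import Iterable, List


def top_n_partial(rows: Iterable[int], n: int) -> List[int]:
    """Per-branch step: keep only the n smallest keys of one UNION ALL branch."""
    return heapq.nsmallest(n, rows)


def top_n_over_union(branches: List[List[int]], n: int) -> List[int]:
    """Final step: merge the partial results and take the n smallest overall."""
    partials = [top_n_partial(branch, n) for branch in branches]
    merged = [key for partial in partials for key in partial]
    return heapq.nsmallest(n, merged)


orders = [7, 3, 9, 1, 5]
# Same shape as the query above: orders UNION ALL orders, ORDER BY key, LIMIT 3.
print(top_n_over_union([orders, orders], 3))  # [1, 1, 3]
```

Each branch forwards at most N rows to the exchange, so if a connector could later absorb the per-branch TopNPartial, the transformation above would not need to change.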