
[Feature] [Vectorized] Some Join opt in vec exec engine #7901


Search before asking

  • I have searched the issues and found no similar issues.

Description

  1. Adjust the output columns of an outer join to nullable. For example, in select * from t1 left join t2 on t1.a = t2.a, all slots of t2 should be nullable. This change belongs in the join node.

  2. Adjust the join columns of a non-null-aware join to non-nullable. For example, in select * from t1 join t2 on t1.a = t2.a, if a is a nullable column it should be changed to not nullable, because the = join condition skips NULL values.

  3. Useless columns do not need to be output. A projection is needed here to reduce the columns that are output and speed up the join operator.
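
The first two points can be illustrated with a minimal Python sketch. It is not Doris code: the Slot class, the join_output_slots function, and the join type strings are hypothetical names chosen for illustration, and it assumes every key in eq_keys is compared with a plain, null-rejecting = predicate.

```python
from dataclasses import dataclass, replace
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Slot:
    """Hypothetical stand-in for a slot descriptor: a column plus its nullability."""
    table: str
    name: str
    nullable: bool


def join_output_slots(left: List[Slot], right: List[Slot], join_type: str,
                      eq_keys: Set[Tuple[str, str]]) -> List[Slot]:
    """Derive the nullability of each output slot of a join.

    eq_keys holds the (table, column) pairs used in '=' join conjuncts.
    """
    out = []
    for side, slots in (("left", left), ("right", right)):
        for s in slots:
            nullable = s.nullable
            # Point 2: an inner join on '=' never emits a row whose key is NULL,
            # so the key columns can be marked not nullable.
            if join_type == "inner" and (s.table, s.name) in eq_keys:
                nullable = False
            # Point 1: the non-preserved side of an outer join is padded with
            # NULLs for unmatched rows, so all of its slots become nullable.
            padded = (
                (join_type == "left_outer" and side == "right")
                or (join_type == "right_outer" and side == "left")
                or join_type == "full_outer"
            )
            if padded:
                nullable = True
            out.append(replace(s, nullable=nullable))
    return out


t1 = [Slot("t1", "a", True), Slot("t1", "b", False)]
t2 = [Slot("t2", "a", True), Slot("t2", "c", False)]
keys = {("t1", "a"), ("t2", "a")}

left_join = join_output_slots(t1, t2, "left_outer", keys)
inner_join = join_output_slots(t1, t2, "inner", keys)
```

In the left join, t1.a stays nullable because rows from the preserved side are kept even when their key is NULL. Only the inner join can tighten the key columns.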

Use case

No response

Related issues

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

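1 reaction
Gabriel39 commented, Mar 24, 2022

There is another scenario where I think a similar optimization applies: agg + having filtering. Columns used in the having clause are carried downstream whether or not they are used afterwards. In fact, if they are not used later, they can be pruned right at the agg.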
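
A rough Python sketch of the agg + having idea follows. The function prune_after_having and the column names are hypothetical, not taken from Doris. It separates the columns the aggregation must keep to evaluate the having filter from the columns it actually has to emit to its parent.

```python
from typing import List, Set, Tuple


def prune_after_having(agg_output: List[str], having_cols: Set[str],
                       parent_cols: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (columns needed while filtering, columns emitted upward)."""
    # The HAVING predicate still needs its own columns while it is evaluated.
    needed_for_filter = [c for c in agg_output
                         if c in having_cols or c in parent_cols]
    # Once the filter has run, only columns the parent reads must be passed on.
    emitted = [c for c in agg_output if c in parent_cols]
    return needed_for_filter, emitted


# Models: select k1 from t group by k1 having sum(v) > 10
needed, emitted = prune_after_having(
    agg_output=["k1", "sum_v"],
    having_cols={"sum_v"},
    parent_cols={"k1"},
)
```

Here sum_v is needed only to evaluate the filter, so it does not have to be carried past the aggregation.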

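1 reaction
EmmyMiao87 commented, Mar 24, 2022

Join performance optimization

Reduce unnecessary memory copies

The output schema of a Join Node differs from its input schema. However, when the current Doris Join Node operator builds a result row, it simply concatenates the tuples of its left and right children. In practice, the columns of the result row may be only a subset of the columns of the input rows. This causes a lot of useless memory copying.

Example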

select a.k1 from a, b where a.k1=b.k1;

Input schema: a.k1, b.k1
Output schema: a.k1, b.k1
Output schema after optimization: a.k1
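
The effect of pruning the output schema can be sketched with a toy hash join in Python. This is only an illustration under assumptions, not the Doris operator: the function hash_join_project, the dict-based rows, and the qualified column names are all hypothetical. The point is that only the columns listed in output_cols are copied into each result row, instead of concatenating both input tuples.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[int]]


def hash_join_project(left_rows: List[Row], right_rows: List[Row],
                      left_key: str, right_key: str,
                      output_cols: List[str]) -> List[Row]:
    """Inner hash join that materializes only the requested output columns."""
    # Build phase: index the right side by its join key, skipping NULL keys.
    table: Dict[int, List[Row]] = {}
    for r in right_rows:
        k = r[right_key]
        if k is not None:
            table.setdefault(k, []).append(r)

    # Probe phase: copy only the output columns for each match.
    result = []
    for l in left_rows:
        k = l[left_key]
        if k is None:
            continue  # '=' never matches NULL
        for r in table.get(k, []):
            result.append({c: (l[c] if c in l else r[c]) for c in output_cols})
    return result


a = [{"a.k1": 1}, {"a.k1": 2}, {"a.k1": None}]
b = [{"b.k1": 1}, {"b.k1": 1}, {"b.k1": 3}]

# Models: select a.k1 from a, b where a.k1 = b.k1
rows = hash_join_project(a, b, "a.k1", "b.k1", output_cols=["a.k1"])
```

With output_cols set to a.k1 alone, b.k1 is used for the build and probe but is never copied into a result row.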

MySQL [ssb]> select count(d_datekey) from lineorder inner join date on lo_orderdate = d_datekey;
+--------------------+
| count(`d_datekey`) |
+--------------------+
|          600037902 |
+--------------------+
1 row in set (10.555 sec)

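Profiling with perf shows that the main time-consuming functions are: [image]

  1. replicate: the function that fills in the result values for non-join columns. It accounts for about 10% of the time.

Test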

After PR: #8618

In the query below, the column lo_orderdate can be pruned after the join. The results are as follows:

MySQL [ssb]> select count(d_datekey) from lineorder inner join date on lo_orderdate = d_datekey;
+--------------------+
| count(`d_datekey`) |
+--------------------+
|          600037902 |
+--------------------+
1 row in set (7.286 sec)

MySQL [ssb]> set enable_hash_project=true;
Query OK, 0 rows affected (0.001 sec)

MySQL [ssb]> select count(d_datekey) from lineorder inner join date on lo_orderdate = d_datekey;
+--------------------+
| count(`d_datekey`) |
+--------------------+
|          600037902 |
+--------------------+
1 row in set (5.479 sec)

The test results show that with pruning enabled, performance improves by 10%, which matches expectations.
