Expression optimizer should be able to disable constant folding when the returned value is huge

We recently encountered a coordinator reliability issue. The query contained an expression like the following:

          CASE
              WHEN x <= 10000 THEN SEQUENCE(1, COALESCE(x, 1))
              WHEN x <= 20000 THEN FLATTEN(ARRAY[
                  SEQUENCE(1, 10000),
                  SEQUENCE(10001, x)
              ])
              WHEN x <= 30000 THEN FLATTEN(ARRAY[
                  SEQUENCE(1, 10000),
                  SEQUENCE(10001, 20000),
                  SEQUENCE(20001, x)
              ])
              WHEN x <= 40000 THEN FLATTEN(ARRAY[
                  SEQUENCE(1, 10000),
                  SEQUENCE(10001, 20000),
                  SEQUENCE(20001, 30000),
                  SEQUENCE(30001, x)
              ])
              WHEN x <= 50000 THEN FLATTEN(ARRAY[
                  SEQUENCE(1, 10000),
                  SEQUENCE(10001, 20000),
                  SEQUENCE(20001, 30000),
                  SEQUENCE(30001, 40000),
                  SEQUENCE(40001, x)
              ])
... more WHEN

Presto optimizes a constant expression such as SEQUENCE(1, 10000) into a “magic literal”, which is essentially the serialized bytes of the evaluated result.

However, if the evaluated constant is huge (in this case, there are many arrays of 10K elements each), the plan becomes huge as well.
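To get a feel for the scale, here is a rough back-of-the-envelope sketch in Python. It assumes the CASE pattern above continues unchanged (the k-th WHEN clause holds k - 1 constant sequences) and that each folded element costs at least 8 bytes. Both the 8-byte figure and the helper names are assumptions made for illustration, not Presto's actual serialization format or API.

```python
ELEMENTS_PER_SEQUENCE = 10_000  # each constant SEQUENCE in the query spans 10K values
BYTES_PER_ELEMENT = 8           # assumption: one 64-bit value per element, no encoding overhead


def constant_sequences(branches: int) -> int:
    """Count the constant SEQUENCE calls in a CASE with `branches` WHEN clauses.

    The k-th WHEN clause (1-based) contains k - 1 constant sequences; the last
    SEQUENCE in each clause depends on x and cannot be folded.
    """
    return sum(k - 1 for k in range(1, branches + 1))


def folded_bytes(branches: int) -> int:
    """Rough lower bound on the bytes that folding embeds in the plan."""
    return constant_sequences(branches) * ELEMENTS_PER_SEQUENCE * BYTES_PER_ELEMENT


if __name__ == "__main__":
    for n in (5, 10, 50):
        print(f"{n} branches: {constant_sequences(n)} folded arrays, "
              f">= {folded_bytes(n)} bytes")
```

The number of folded arrays grows quadratically with the number of WHEN clauses, so even a moderately long CASE embeds megabytes of literal data under these assumptions.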

https://github.com/prestodb/presto/issues/8964 further amplifies the issue, and we should definitely fix that. However, I also think there should be some kind of guard so that Presto stops constant folding when the result is too large, keeping the plan at a reasonable size.
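A minimal sketch of what such a guard could look like, written in Python rather than against Presto's code. Everything here is hypothetical: the `Call` and `Literal` classes, the `fold` function, and the `MAX_FOLDED_BYTES` threshold are invented for illustration, and `pickle` merely stands in for whatever serialization the planner uses.

```python
import pickle
from typing import Any, Callable, Union

MAX_FOLDED_BYTES = 1_000  # hypothetical threshold, not an actual Presto setting


class Call:
    """A deterministic function call whose arguments are all constants."""

    def __init__(self, name: str, fn: Callable[..., Any], *args: Any) -> None:
        self.name = name
        self.fn = fn
        self.args = args


class Literal:
    """A folded constant that would be embedded in the plan."""

    def __init__(self, value: Any) -> None:
        self.value = value


def fold(call: Call, max_bytes: int = MAX_FOLDED_BYTES) -> Union[Call, Literal]:
    """Fold `call` into a literal unless its serialized result is too large."""
    value = call.fn(*call.args)
    # pickle is only a stand-in for the planner's own serialization.
    if len(pickle.dumps(value)) > max_bytes:
        return call  # keep the compact call; it is evaluated at execution time
    return Literal(value)


def sequence(start: int, stop: int) -> list:
    """Inclusive integer range, mimicking SEQUENCE(start, stop)."""
    return list(range(start, stop + 1))


if __name__ == "__main__":
    small = fold(Call("sequence", sequence, 1, 3))
    large = fold(Call("sequence", sequence, 1, 10_000))
    print(type(small).__name__, type(large).__name__)
```

Because the guard measures the size of the result rather than looking at the function name, an expression that reduces a large sequence to a single value would still be folded. The sketch still pays the cost of evaluating the expression once; a cheaper variant could estimate the result size up front.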

cc @highker, @rongrong, @hellium01

Issue Analytics

Created 4 years ago
Comments: 9 (8 by maintainers)

Top GitHub Comments

1 reaction

kaikalur commented, Sep 17, 2019

Yeah, we could explicitly cap the array literal size at something small, like 100 or so, and suggest that the user use SEQUENCE if it’s bigger. I would think the optimization opportunities for such large literals are rare enough that it may be OK not to do it all the time.

1 reaction

hellium01 commented, Sep 17, 2019

There might be cases where constant folding is useful later. For example, if we call an apply/reduce function on top of an expanded sequence, the actual result will be small. Some optimizers can also utilize the fact that a field is constant. That said, it looks like SEQUENCE will very rarely be used in such cases, so simply disabling folding for it should be a safe bet.

We currently convert array constants into a RowExpression rather than a “magic literal”, so they are serialized as a block, which is much smaller than before. Another option is to make the serialized form of arrays more concise (which would benefit data exchange as well).