Predicate pushdown may result in query failure

A projection IF (o.orderstatus = 'P', t.x[3], NULL) AS y, combined with a filter o.orderstatus = 'P' AND y IS NOT NULL on top of a join, causes the predicate x[3] IS NOT NULL to be pushed down into t. The pushed-down predicate is evaluated on every record of t, so it triggers an Array subscript out of bounds error if t has records where x has fewer than 3 elements. If these records don’t survive the join, the overall query should succeed, but it fails instead.
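
The difference between the two evaluation orders can be illustrated with a small, self-contained Java model. This is only a sketch under stated assumptions, not Presto code: the class PushdownSketch, the row types TRow and ORow, and all method names are hypothetical, and the "join" is a plain nested loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of the two plans; none of these names exist in Presto.
final class PushdownSketch {
    // Row of table t: join key plus the array column x.
    static final class TRow {
        final long orderkey;
        final int[] x;
        TRow(long orderkey, int[] x) { this.orderkey = orderkey; this.x = x; }
    }

    // Row of orders: join key plus status.
    static final class ORow {
        final long orderkey;
        final String orderstatus;
        ORow(long orderkey, String orderstatus) { this.orderkey = orderkey; this.orderstatus = orderstatus; }
    }

    // Mimics the 1-based SQL subscript, which fails on an out-of-range index.
    static int subscript(int[] array, int index) {
        if (index < 1 || index > array.length) {
            throw new IllegalStateException("Array subscript out of bounds");
        }
        return array[index - 1];
    }

    // Query as written: join first, evaluate x[3] only for rows with orderstatus = 'P'.
    static List<Integer> filterAfterJoin(List<TRow> t, List<ORow> o) {
        List<Integer> result = new ArrayList<>();
        for (TRow tr : t) {
            for (ORow or : o) {
                if (tr.orderkey == or.orderkey && "P".equals(or.orderstatus)) {
                    result.add(subscript(tr.x, 3));
                }
            }
        }
        return result;
    }

    // Plan after pushdown: x[3] is evaluated on every row of t before the join.
    static List<Integer> filterBeforeJoin(List<TRow> t, List<ORow> o) {
        List<TRow> survivors = new ArrayList<>();
        for (TRow tr : t) {
            subscript(tr.x, 3); // fails for rows that the join would have discarded
            survivors.add(tr);
        }
        return filterAfterJoin(survivors, o);
    }

    public static void main(String[] args) {
        List<TRow> t = Arrays.asList(new TRow(1, new int[] {1, 2, 4}), new TRow(2, new int[] {1}));
        List<ORow> o = Arrays.asList(new ORow(1, "P"), new ORow(2, "F"));
        System.out.println(filterAfterJoin(t, o)); // prints [4]
        try {
            filterBeforeJoin(t, o);
        } catch (IllegalStateException e) {
            System.out.println("pushed-down filter failed: " + e.getMessage());
        }
    }
}
```

In this model the row with the one-element array belongs to an order whose status is not 'P', so the query as written never touches x[3] for it, while the pushed-down variant fails on it during the scan.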

To reproduce, add the following to com.facebook.presto.hive.TestHiveIntegrationSmokeTest (or execute queries manually):

    @Test
    public void test()
    {
        assertQuerySucceeds(getQueryRunner().getDefaultSession(),
                "CREATE TABLE test AS " +
                "SELECT\n" +
                "    l.linenumber,\n" +
                "    o.orderkey,\n" +
                "    o.orderstatus,\n" +
                "    IF (o.orderstatus = 'P', ARRAY[1, 2, 4], ARRAY[1]) AS x\n" +
                "FROM lineitem l, orders o\n" +
                "WHERE l.orderkey = o.orderkey");

        assertQuerySucceeds(getQueryRunner().getDefaultSession(),
                "SELECT *\n" +
                "FROM (\n" +
                "    SELECT\n" +
                "        o.orderstatus,\n" +
                "        IF (o.orderstatus = 'P', x[3], NULL) AS y\n" +
                "    FROM test t, orders o\n" +
                "    WHERE t.orderkey = o.orderkey\n" +
                ")\n" +
                "WHERE orderstatus = 'P' AND y IS NOT NULL");
        
        assertUpdate("DROP TABLE test");
    }

The stacktrace:

com.facebook.presto.spi.PrestoException: Array subscript out of bounds
	at com.facebook.presto.operator.scalar.ArraySubscriptOperator.checkIndex(ArraySubscriptOperator.java:166)
	at com.facebook.presto.operator.scalar.ArraySubscriptOperator.longSubscript(ArraySubscriptOperator.java:95)
	at com.facebook.presto.$gen.PageFilter_20190625_013659_979.filter(Unknown Source)
	at com.facebook.presto.$gen.PageFilter_20190625_013659_979.filter(Unknown Source)
	at com.facebook.presto.operator.project.DictionaryAwarePageFilter.filter(DictionaryAwarePageFilter.java:83)
	at com.facebook.presto.operator.project.PageProcessor.createWorkProcessor(PageProcessor.java:115)
	at com.facebook.presto.operator.project.PageProcessor.process(PageProcessor.java:101)
	at com.facebook.presto.operator.ScanFilterAndProjectOperator.processPageSource(ScanFilterAndProjectOperator.java:287)
	at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:231)

com.facebook.presto.sql.planner.optimizations.PredicatePushDown.Rewriter#processInnerJoin already makes sure not to push down non-deterministic predicates. It may need to block pushdown of predicates that may generate an error or wrap these in try.
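
To make the "wrap these in try" idea concrete, here is a minimal Java sketch of a predicate wrapper that converts a runtime failure into a rejected row. It is a toy model under stated assumptions, not Presto's TRY implementation and not a proposed patch: TryPredicateSketch, tryPredicate, scan and thirdElementIsNotNull are hypothetical names, and catching every RuntimeException is broader than what a real engine would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names exist in Presto.
final class TryPredicateSketch {
    // Mimics the 1-based SQL subscript, which fails on an out-of-range index.
    static int subscript(int[] array, int index) {
        if (index < 1 || index > array.length) {
            throw new IllegalStateException("Array subscript out of bounds");
        }
        return array[index - 1];
    }

    // Stand-in for x[3] IS NOT NULL; primitive ints are never null, so only the subscript can fail.
    static boolean thirdElementIsNotNull(int[] x) {
        subscript(x, 3);
        return true;
    }

    // Wraps a predicate so that a runtime failure rejects the row instead of aborting the scan.
    static <T> Predicate<T> tryPredicate(Predicate<T> predicate) {
        return row -> {
            try {
                return predicate.test(row);
            } catch (RuntimeException e) {
                return false; // assumption: a failed evaluation is treated like a NULL filter result
            }
        };
    }

    // Applies a filter to every row, as a table scan with a pushed-down predicate would.
    static <T> List<T> scan(List<T> rows, Predicate<T> filter) {
        List<T> kept = new ArrayList<>();
        for (T row : rows) {
            if (filter.test(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<int[]> rows = Arrays.asList(new int[] {1, 2, 4}, new int[] {1});
        Predicate<int[]> raw = TryPredicateSketch::thirdElementIsNotNull;
        System.out.println(scan(rows, tryPredicate(raw)).size()); // prints 1
    }
}
```

Note that in this toy model a failing row is dropped silently, so a row that would have survived the join and legitimately raised the error is filtered out instead. Whether that trade-off is acceptable, compared with blocking the pushdown altogether, is a design question the issue leaves open.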

CC: @rongrong @highker @wenleix @arhimondr

P.S. The following fix, introduced in 0.221, seems to expose this issue (i.e. queries that used to pass are now failing). It affects queries that use varchar columns; e.g. it would happen if the type of the orderstatus column in the above repro were changed to varchar.

aa6e60648d9d7d8ddedec70e500e89575d177707 is the first bad commit
commit aa6e60648d9d7d8ddedec70e500e89575d177707
Author: Andrii Rosa <andriirosa@fb.com>
Date:   Wed Apr 24 14:21:27 2019 -0400

    Fix equality inference for VARCHAR predicates

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 15 (14 by maintainers)

Top GitHub Comments

3 reactions
mbasmanova commented, Jun 25, 2019

@rongrong @highker I don’t think this issue qualifies as a release blocker because it seems to have existed for a long time. I also don’t think we should revert #12724. I suggest we remove the release-blocker label and work on a fix on a timeline that makes sense.

2 reactions
kaikalur commented, Aug 13, 2019

Simpler self-contained test case:

WITH T1 AS (
    SELECT
        *
    FROM (
        VALUES
            ('P', 1),
            ('N', 2)
    ) T(x, z)
),
test AS (
    SELECT
        *,
        IF (o.x = 'P', ARRAY[1, 2, 4], ARRAY[1]) AS a
    FROM T1 o
)
SELECT
    *
FROM (
    SELECT
        o.x,
        IF (o.x = 'P', t.a[3], NULL) AS y
    FROM test t,
        T1 o
    WHERE
        t.z = o.z
)
WHERE
    x = 'P'
    AND y IS NOT NULL;

