Pushdown doesn't work with nested fields
Elasticsearch & ES-Hadoop 5.0.0-beta1, Spark 2.0.0
curl -XPOST localhost:9200/pushdown/pushdown -d '{"a":"b","c":{"d":"e"}}'

df = sqlContext.read.format("org.elasticsearch.spark.sql").load("pushdown/pushdown")

df.filter(df.a == "b").show()

as expected, generates:

{"query":{"bool":{"must":[{"match_all":{}}],"filter":[{"exists":{"field":"a"}},{"match":{"a":"b"}}]}}}

df.filter(df.c.d == "e").show()

doesn't generate any pushdown:

{"query":{"match_all":{}}}
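A possible workaround sketch while pushdown is missing for nested fields: build the same bool/filter query the connector emits for top-level fields, but with the dotted path, and pass it at load time through the connector's documented `es.query` option so Elasticsearch filters server-side. The helper name `nested_field_filter` is hypothetical, not part of es-hadoop:

```python
import json

def nested_field_filter(field: str, value: str) -> str:
    """Build the bool/filter query shape shown above, but for a dotted
    (nested) field path such as "c.d"."""
    query = {
        "query": {
            "bool": {
                "must": [{"match_all": {}}],
                "filter": [
                    {"exists": {"field": field}},
                    {"match": {field: value}},
                ],
            }
        }
    }
    return json.dumps(query)

# The resulting string could then be supplied when loading the DataFrame, e.g.:
#   sqlContext.read.format("org.elasticsearch.spark.sql") \
#       .option("es.query", nested_field_filter("c.d", "e")) \
#       .load("pushdown/pushdown")
print(nested_field_filter("c.d", "e"))
```

This moves the filtering into Elasticsearch manually; Spark will still apply its own (now redundant) filter on the returned rows.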
Issue Analytics
- State:
- Created: 7 years ago
- Reactions: 3
- Comments: 11 (5 by maintainers)
Top Results From Across the Web
- What's new in Apache Spark 3.0 - predicate pushdown ...: "Let's see how the predicate pushdown for the nested fields works in Apache Spark 3.0. Below you can find the code and the..."
- [#SPARK-17636] Parquet predicate pushdown for nested fields: "There's a PushedFilters for a simple numeric field, but not for a numeric field inside a struct. Not sure if this is a..."
- Querying Parquet file nested column scan whole column even ...: "SPARK-17636 shows that nested field involved in the predicate, will not trigger push down. What I experience is that even when other..."
- Apache Spark 3 and predicate pushdown for nested fields: "Pushdown predicate for nested fields. Check the blog post 'What's new in Apache Spark 3.0 - predicate pushdown support for nested fields'..."
- Faster Queries on Nested Data - Trino: "The work for this improvement is being tracked in this issue. Similar to Hive Connector, connector-level dereference pushdown can be extended to..."
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I believe this also happens for column/field pruning, and it looks like a complicated issue. With the latest version (6.4.0), I see the root field is sent, but never the full nested field path.
If we look at the source code: https://github.com/elastic/elasticsearch-hadoop/blob/master/spark/sql-20/src/main/scala/org/elasticsearch/spark/sql/DefaultSource.scala#L233 this plugin forwards the columns/fields given by Spark to the search query builder, so the problem is not there.
Also, it looks like the Spark team has just done this for the Parquet file format: https://github.com/apache/spark/pull/21320/files
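The pruning behaviour described in this comment can be illustrated with a small sketch (hypothetical helper, not the connector's actual code): Spark hands the connector root column names, so when only `c.d` is needed, the whole `c` object is still requested from Elasticsearch's `_source` filter.

```python
def source_includes(required_columns):
    """Translate the columns Spark requests into an Elasticsearch
    _source include list. As observed in 6.4.0, only root fields
    arrive here (e.g. "c", never "c.d"), so the entire nested
    object is fetched even when one sub-field is used."""
    return {"_source": {"includes": sorted(set(required_columns))}}

# Spark asks for the root columns "a" and "c", not "c.d":
print(source_includes(["a", "c"]))
```

The fix would require Spark to pass dotted paths down to the source, which is what the Parquet-only PR above adds for the file-based code path.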
If I’m reading this correctly, Spark does not allow predicates on nested fields to be pushed down to non-Hadoop backends in DSv1: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala#L110. If that’s the case, we need #1801 before we can do this.
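The DSv1 restriction the comment points at can be mimicked in a few lines (an illustrative sketch, not Spark's actual code): before filters ever reach a non-file-source connector, Spark keeps only predicates whose attribute is a plain top-level column, so a dotted nested-field reference like `c.d` is dropped and the connector receives nothing to push down.

```python
def pushable_filters(filters):
    """Keep only filters on top-level attributes, mirroring the
    behaviour when nested predicate pushdown is not supported:
    any attribute containing a dot (a nested field path) is
    excluded from the set handed to the data source."""
    return [(attr, value) for attr, value in filters if "." not in attr]

# Only the top-level filter survives; ("c.d", "e") is silently dropped,
# which matches the {"query":{"match_all":{}}} seen in the repro.
print(pushable_filters([("a", "b"), ("c.d", "e")]))
```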