Spark: Iceberg Data Source does not support struct literal predicates
From the discussion in https://github.com/apache/iceberg/pull/5113 with @huaxingao, I found this behavior:
For Iceberg table:
select * from table where table.struct_field = struct(10)
org.apache.spark.sql.AnalysisException: cannot resolve '(table.struct_field = struct(10))' due to data type mismatch: differing types in '(table.struct_field = struct(1))' (struct<nested:int> and struct<col1:int>).; line 1 pos 39;
select * from table where table.struct_field in (struct(10))
java.lang.IllegalArgumentException: Cannot create expression literal from org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema: [1]
at org.apache.iceberg.expressions.Literals.from(Literals.java:87)
at org.apache.iceberg.expressions.UnboundPredicate.<init>(UnboundPredicate.java:40)
at org.apache.iceberg.expressions.Expressions.equal(Expressions.java:175)
at org.apache.iceberg.spark.SparkFilters.handleEqual(SparkFilters.java:239)
at org.apache.iceberg.spark.SparkFilters.convert(SparkFilters.java:152)
at org.apache.iceberg.spark.source.SparkScanBuilder.pushFilters(SparkScanBuilder.java:106)
at org.apache.spark.sql.execution.datasources.v2.PushDownUtils$.pushFilters(PushDownUtils.scala:69)
at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownFilters$1.applyOrElse(V2ScanRelationPushDown.scala:60)
at org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown$$anonfun$pushDownFilters$1.applyOrElse(V2ScanRelationPushDown.scala:47)
For non-Iceberg table:
spark.sql("select * from test_struct_non_iceberg where struct_field in(struct(10))").show
+------------+
|struct_field|
+------------+
| {10}|
+------------+
scala> spark.sql("select * from test_struct_non_iceberg where struct_field = struct(10)").show
+------------+
|struct_field|
+------------+
| {10}|
+------------+
Iceberg cannot handle complex predicate filters, since it only collects metrics for primitive columns. So maybe we should not even push down these filters in SparkScanBuilder. There may also be other problems (e.g., the returned schema not matching).
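For context on why the struct literal fails: Iceberg builds expression literals only from primitive values. Below is a minimal, self-contained sketch of that behavior; `LiteralSketch` and `fromValue` are hypothetical stand-ins for illustration, not the real `org.apache.iceberg.expressions.Literals.from` implementation.

```java
import java.util.Map;

// Hypothetical stand-in illustrating the behavior of Literals.from:
// a literal can only be created from a primitive value, so a struct-like
// value (modeled here as a Map) is rejected with IllegalArgumentException,
// matching the exception in the stack trace above.
public class LiteralSketch {

    static Object fromValue(Object value) {
        if (value instanceof Integer || value instanceof Long
            || value instanceof String || value instanceof Double
            || value instanceof Boolean) {
            return value;  // primitive: a literal can be created
        }
        throw new IllegalArgumentException(
            "Cannot create expression literal from " + value.getClass().getName());
    }

    public static void main(String[] args) {
        System.out.println(fromValue(10));       // primitive: fine
        try {
            fromValue(Map.of("nested", 10));     // struct-like: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```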
Issue Analytics
- State: closed
- Created a year ago
- Reactions: 2
- Comments: 10 (5 by maintainers)
Top GitHub Comments
You are right. We shouldn't push down the complex predicate filters. I think we should catch the IllegalArgumentException here, and then the complex predicate filter won't be pushed, similar to this behavior.
I will open a PR to fix this.
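The proposed fix could look roughly like the pattern below. This is a self-contained sketch of the idea only, not the actual SparkScanBuilder code: `Filter`, `convert`, and `pushFilters` are hypothetical stand-ins for the real Spark/Iceberg types. Each filter is converted inside a try/catch; an unconvertible filter is kept for Spark to evaluate after the scan instead of failing the whole query.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the catch-and-skip pushdown pattern; Filter and convert(...) are
// hypothetical stand-ins for org.apache.spark.sql.sources.Filter and
// SparkFilters.convert from the stack trace above.
public class PushFiltersSketch {

    static class Filter {
        final String column;
        final Object value;
        Filter(String column, Object value) { this.column = column; this.value = value; }
    }

    // Stand-in converter: throws for struct-valued literals, mirroring
    // Literals.from rejecting non-primitive values.
    static String convert(Filter f) {
        if (f.value instanceof Object[]) {
            throw new IllegalArgumentException(
                "Cannot create expression literal from " + f.value);
        }
        return f.column + " = " + f.value;
    }

    // Partition filters: convertible ones go into `pushed`; the rest are
    // returned as post-scan filters instead of aborting the query.
    static List<Filter> pushFilters(List<Filter> filters, List<String> pushed) {
        List<Filter> postScan = new ArrayList<>();
        for (Filter f : filters) {
            try {
                pushed.add(convert(f));
            } catch (IllegalArgumentException e) {
                postScan.add(f);  // skip pushdown; Spark evaluates it after the scan
            }
        }
        return postScan;
    }

    public static void main(String[] args) {
        List<Filter> filters = new ArrayList<>();
        filters.add(new Filter("id", 10));                         // primitive: pushable
        filters.add(new Filter("struct_field", new Object[]{10})); // struct: not pushable
        List<String> pushed = new ArrayList<>();
        List<Filter> postScan = pushFilters(filters, pushed);
        System.out.println("pushed=" + pushed + ", postScan=" + postScan.size());
    }
}
```

The key design point is that failing to convert a filter is not an error: data source pushdown is an optimization, and any filter the source cannot accept is simply re-evaluated by Spark on the scan output.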
Closed by #5204 , we can work on the optimizations later.