[FEATURE REQUEST]: Support GroupedMapUdf in Spark-3.0.0
Spark changed the expected contents of an Arrow RecordBatch between 2.4 and 3.0.
Spark 3.0 expects the result of a GroupedMap UDF to be a single column of StructType, rather than one top-level column per DataFrame column.
Spark 2.4
- https://github.com/apache/spark/blob/807e0a484d1de767d1f02bd8a622da6450bdf940/python/pyspark/serializers.py#L219-L261
- https://github.com/apache/spark/blob/807e0a484d1de767d1f02bd8a622da6450bdf940/sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala#L140-L148
Spark 3.0
- https://github.com/apache/spark/blob/3fdfce3120f307147244e5eaf46d61419a723d50/python/pyspark/sql/pandas/serializers.py#L135-L193
- https://github.com/apache/spark/blob/3fdfce3120f307147244e5eaf46d61419a723d50/sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala#L94
- https://github.com/apache/spark/blob/3fdfce3120f307147244e5eaf46d61419a723d50/sql/core/src/main/scala/org/apache/spark/sql/execution/python/PandasGroupUtils.scala#L38-L54
To support these changes we need to use Arrow’s StructArray and StructType. Unfortunately, this is currently unsupported; see ARROW-6972.
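To make the layout difference concrete, here is a minimal sketch in plain Python. It models a record batch as a dict of columns and does not use any Arrow API: the helper names and the struct column name `"_0"` are hypothetical, chosen only for illustration. With pyarrow, the struct column would presumably be built with `pyarrow.StructArray.from_arrays`.

```python
from typing import Dict, List

# Hypothetical stand-in for Arrow data: column name -> list of values.
# These are plain dicts, not pyarrow objects.
Columns = Dict[str, List]


def to_spark24_batch(result: Columns) -> Columns:
    """Spark 2.4 layout: one top-level column per result field."""
    return dict(result)


def to_spark30_batch(result: Columns, struct_name: str = "_0") -> Dict[str, Columns]:
    """Spark 3.0 layout: a single top-level struct column whose children
    are the result fields. The name "_0" is an assumption for illustration."""
    lengths = {len(values) for values in result.values()}
    if len(lengths) > 1:
        raise ValueError("all child columns must have the same length")
    return {struct_name: dict(result)}


def unpack_spark30_batch(batch: Dict[str, Columns]) -> Columns:
    """Mimics the consumer side: take the only column and read its children."""
    if len(batch) != 1:
        raise ValueError("expected exactly one struct column")
    (struct_column,) = batch.values()
    return dict(struct_column)


# Example result of a grouped-map function with two fields.
result = {"id": [1, 1, 2], "v": [0.5, 1.5, 2.0]}
old_batch = to_spark24_batch(result)
new_batch = to_spark30_batch(result)
```

In this model the 2.4 batch exposes `id` and `v` directly, while the 3.0 batch exposes only the struct column and the fields have to be read from its children.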
Issue Analytics
- State:
- Created 3 years ago
- Comments: 5 (5 by maintainers)
Top GitHub Comments
Just adding a note that I’ll be working on this starting tomorrow or Monday 😃 I’ll update this post as I make progress!
I should be able to look into it later this week.