Windowing functions cause errors in lineage harvesting
Spline Team,
While testing some windowing functions in CTEs and temporary tables, we get errors during lineage processing. I've attached my sample notebook. The first two INSERT commands work fine; the final two fail.
- spark-3.1-spline-agent-bundle_2.12-0.6.1
- Databricks Runtime 8.3
spline-test-windowing-function.ipynb.txt
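The attached notebook is not reproduced here, but the reported failure shape is a window function inside a CTE feeding an INSERT. A hypothetical sketch of such a statement (table and column names are assumptions, not taken from the notebook):

```scala
// Hypothetical repro sketch, not the attached notebook: a windowing
// function in a CTE whose result is inserted into a table. Requires a
// Spark runtime with the Spline agent attached to trigger the listener.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

spark.sql("""
  INSERT INTO results_table          -- assumed table name
  WITH ranked AS (
    SELECT id,
           amount,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY amount DESC) AS rn
    FROM source_table                -- assumed table name
  )
  SELECT id, amount FROM ranked WHERE rn = 1
""")
```

The query itself succeeds; it is the Spline listener's post-execution lineage harvesting that throws, as the stack trace below shows.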
21/06/24 03:10:58 ERROR SplineQueryExecutionListener: Unexpected error occurred during lineage processing for application: Databricks Shell #local-1624504140464
java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.WindowExpression cannot be cast to org.apache.spark.sql.catalyst.expressions.NamedExpression
at scala.collection.LinearSeqOptimized.find(LinearSeqOptimized.scala:115)
at scala.collection.LinearSeqOptimized.find$(LinearSeqOptimized.scala:112)
at scala.collection.immutable.List.find(List.scala:89)
at za.co.absa.spline.harvester.builder.WindowNodeBuilder.resolveAttributeChild(WindowNodeBuilder.scala:31)
at za.co.absa.spline.harvester.builder.OperationNodeBuilder$$anon$1$$anonfun$$lessinit$greater$1.apply(OperationNodeBuilder.scala:46)
at za.co.absa.spline.harvester.builder.OperationNodeBuilder$$anon$1$$anonfun$$lessinit$greater$1.apply(OperationNodeBuilder.scala:46)
at za.co.absa.spline.harvester.converter.AttributeConverter.convert(AttributeConverter.scala:41)
at za.co.absa.spline.harvester.builder.OperationNodeBuilder$$anon$1.za$co$absa$commons$lang$CachingConverter$$super$convert(OperationNodeBuilder.scala:44)
at za.co.absa.commons.lang.CachingConverter.$anonfun$convert$1(converters.scala:47)
at scala.collection.mutable.MapLike.getOrElseUpdate(MapLike.scala:209)
at scala.collection.mutable.MapLike.getOrElseUpdate$(MapLike.scala:206)
at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:82)
at za.co.absa.commons.lang.CachingConverter.convert(converters.scala:47)
at za.co.absa.commons.lang.CachingConverter.convert$(converters.scala:44)
at za.co.absa.spline.harvester.builder.OperationNodeBuilder$$anon$1.convert(OperationNodeBuilder.scala:44)
at za.co.absa.spline.harvester.builder.OperationNodeBuilder.$anonfun$outputAttributes$1(OperationNodeBuilder.scala:71)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.immutable.List.map(List.scala:298)
at za.co.absa.spline.harvester.builder.OperationNodeBuilder.outputAttributes(OperationNodeBuilder.scala:71)
at za.co.absa.spline.harvester.builder.OperationNodeBuilder.outputAttributes$(OperationNodeBuilder.scala:70)
at za.co.absa.spline.harvester.builder.GenericNodeBuilder.outputAttributes$lzycompute(GenericNodeBuilder.scala:26)
at za.co.absa.spline.harvester.builder.GenericNodeBuilder.outputAttributes(GenericNodeBuilder.scala:26)
at za.co.absa.spline.harvester.builder.GenericNodeBuilder.build(GenericNodeBuilder.scala:41)
at za.co.absa.spline.harvester.builder.GenericNodeBuilder.build(GenericNodeBuilder.scala:26)
at za.co.absa.spline.harvester.LineageHarvester.$anonfun$harvest$10(LineageHarvester.scala:88)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.immutable.List.map(List.scala:298)
at za.co.absa.spline.harvester.LineageHarvester.$anonfun$harvest$8(LineageHarvester.scala:88)
at scala.Option.flatMap(Option.scala:271)
at za.co.absa.spline.harvester.LineageHarvester.harvest(LineageHarvester.scala:81)
at za.co.absa.spline.harvester.QueryExecutionEventHandler.onSuccess(QueryExecutionEventHandler.scala:42)
at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.$anonfun$onSuccess$2(SplineQueryExecutionListener.scala:40)
at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.$anonfun$onSuccess$2$adapted(SplineQueryExecutionListener.scala:40)
at scala.Option.foreach(Option.scala:407)
at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.$anonfun$onSuccess$1(SplineQueryExecutionListener.scala:40)
at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.withErrorHandling(SplineQueryExecutionListener.scala:49)
at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.onSuccess(SplineQueryExecutionListener.scala:40)
at org.apache.spark.sql.util.ExecutionListenerBus.doPostEvent(QueryExecutionListener.scala:155)
at org.apache.spark.sql.util.ExecutionListenerBus.doPostEvent(QueryExecutionListener.scala:131)
at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
at org.apache.spark.sql.util.ExecutionListenerBus.postToAll(QueryExecutionListener.scala:131)
at org.apache.spark.sql.util.ExecutionListenerBus.onOtherEvent(QueryExecutionListener.scala:135)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:84)
at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1523)
at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
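The cast at `WindowNodeBuilder.resolveAttributeChild` fails because a `WindowExpression` is not a `NamedExpression` in Catalyst's expression hierarchy, yet the result of a `find` over the window operator's expressions is cast unconditionally. A toy, self-contained Scala model (class names are borrowed from Spark for readability, but these are simplified stand-ins, not the real Catalyst classes, and the actual fix in Spline may differ) illustrates why the cast breaks and how a type-directed `collectFirst` would avoid it:

```scala
// Simplified stand-ins for Catalyst's expression hierarchy. In real Spark,
// WindowExpression extends Expression but NOT NamedExpression.
sealed trait Expression { def children: Seq[Expression] }
trait NamedExpression extends Expression { def name: String }

case class AttributeReference(name: String) extends NamedExpression {
  def children: Seq[Expression] = Nil
}
case class WindowExpression(child: Expression) extends Expression {
  def children: Seq[Expression] = Seq(child)
}

object Demo extends App {
  val exprs: Seq[Expression] =
    Seq(WindowExpression(AttributeReference("salary")),
        AttributeReference("id"))

  // What the failing code effectively does: find() can return a bare
  // WindowExpression, and the unconditional cast then throws
  // ClassCastException, as seen in the stack trace above.
  // exprs.find(_ => true).map(_.asInstanceOf[NamedExpression])  // throws

  // Safer: select by type, so expressions that are not NamedExpression
  // (e.g. WindowExpression) are skipped rather than cast.
  val named = exprs.collectFirst { case ne: NamedExpression => ne }
  println(named.map(_.name)) // prints Some(id)
}
```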
Issue Analytics
- State:
- Created 2 years ago
- Comments: 27 (16 by maintainers)
Top GitHub Comments
@wajda Databricks team replied through Azure support that they have opened an internal ticket for the engineering team to make the Spline library work with DBR. I hope this is good news for the Spline Dev team, and I will keep you posted with updates from Databricks.
Thank you all, guys. I'm closing the ticket as resolved then.
Not sure I get it. It also shows a target destination, both a short name and a full path. Anyway, that's an unrelated topic; please create another ticket for it. (For UI matters there is the spline-ui repo.)