
Windowing functions cause errors in lineage harvesting

See original GitHub issue

Spline Team,

While testing some windowing functions in CTEs and temporary tables, we get errors during lineage processing. I’ve attached my sample notebook. The first two insert commands work fine; the final two fail.

  • spark-3.1-spline-agent-bundle_2.12-0.6.1
  • Databricks Runtime 8.3

spline-test-windowing-function.ipynb.txt
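The attached notebook is not reproduced here, but the general shape of a statement that triggers this failure is a window function computed inside a CTE whose result is inserted into a table. The table and column names below are hypothetical, not taken from the notebook:

```sql
-- Hypothetical repro: a window function inside a CTE feeding an INSERT.
-- Names are illustrative only; the real statements are in the notebook.
INSERT INTO target_tbl
WITH ranked AS (
  SELECT id,
         ROW_NUMBER() OVER (PARTITION BY grp ORDER BY id) AS rn
  FROM source_tbl
)
SELECT id, rn FROM ranked;
```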

21/06/24 03:10:58 ERROR SplineQueryExecutionListener: Unexpected error occurred during lineage processing for application: Databricks Shell #local-1624504140464
java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.WindowExpression cannot be cast to org.apache.spark.sql.catalyst.expressions.NamedExpression
	at scala.collection.LinearSeqOptimized.find(LinearSeqOptimized.scala:115)
	at scala.collection.LinearSeqOptimized.find$(LinearSeqOptimized.scala:112)
	at scala.collection.immutable.List.find(List.scala:89)
	at za.co.absa.spline.harvester.builder.WindowNodeBuilder.resolveAttributeChild(WindowNodeBuilder.scala:31)
	at za.co.absa.spline.harvester.builder.OperationNodeBuilder$$anon$1$$anonfun$$lessinit$greater$1.apply(OperationNodeBuilder.scala:46)
	at za.co.absa.spline.harvester.builder.OperationNodeBuilder$$anon$1$$anonfun$$lessinit$greater$1.apply(OperationNodeBuilder.scala:46)
	at za.co.absa.spline.harvester.converter.AttributeConverter.convert(AttributeConverter.scala:41)
	at za.co.absa.spline.harvester.builder.OperationNodeBuilder$$anon$1.za$co$absa$commons$lang$CachingConverter$$super$convert(OperationNodeBuilder.scala:44)
	at za.co.absa.commons.lang.CachingConverter.$anonfun$convert$1(converters.scala:47)
	at scala.collection.mutable.MapLike.getOrElseUpdate(MapLike.scala:209)
	at scala.collection.mutable.MapLike.getOrElseUpdate$(MapLike.scala:206)
	at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:82)
	at za.co.absa.commons.lang.CachingConverter.convert(converters.scala:47)
	at za.co.absa.commons.lang.CachingConverter.convert$(converters.scala:44)
	at za.co.absa.spline.harvester.builder.OperationNodeBuilder$$anon$1.convert(OperationNodeBuilder.scala:44)
	at za.co.absa.spline.harvester.builder.OperationNodeBuilder.$anonfun$outputAttributes$1(OperationNodeBuilder.scala:71)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at scala.collection.TraversableLike.map(TraversableLike.scala:238)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
	at scala.collection.immutable.List.map(List.scala:298)
	at za.co.absa.spline.harvester.builder.OperationNodeBuilder.outputAttributes(OperationNodeBuilder.scala:71)
	at za.co.absa.spline.harvester.builder.OperationNodeBuilder.outputAttributes$(OperationNodeBuilder.scala:70)
	at za.co.absa.spline.harvester.builder.GenericNodeBuilder.outputAttributes$lzycompute(GenericNodeBuilder.scala:26)
	at za.co.absa.spline.harvester.builder.GenericNodeBuilder.outputAttributes(GenericNodeBuilder.scala:26)
	at za.co.absa.spline.harvester.builder.GenericNodeBuilder.build(GenericNodeBuilder.scala:41)
	at za.co.absa.spline.harvester.builder.GenericNodeBuilder.build(GenericNodeBuilder.scala:26)
	at za.co.absa.spline.harvester.LineageHarvester.$anonfun$harvest$10(LineageHarvester.scala:88)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at scala.collection.TraversableLike.map(TraversableLike.scala:238)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
	at scala.collection.immutable.List.map(List.scala:298)
	at za.co.absa.spline.harvester.LineageHarvester.$anonfun$harvest$8(LineageHarvester.scala:88)
	at scala.Option.flatMap(Option.scala:271)
	at za.co.absa.spline.harvester.LineageHarvester.harvest(LineageHarvester.scala:81)
	at za.co.absa.spline.harvester.QueryExecutionEventHandler.onSuccess(QueryExecutionEventHandler.scala:42)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.$anonfun$onSuccess$2(SplineQueryExecutionListener.scala:40)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.$anonfun$onSuccess$2$adapted(SplineQueryExecutionListener.scala:40)
	at scala.Option.foreach(Option.scala:407)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.$anonfun$onSuccess$1(SplineQueryExecutionListener.scala:40)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.withErrorHandling(SplineQueryExecutionListener.scala:49)
	at za.co.absa.spline.harvester.listener.SplineQueryExecutionListener.onSuccess(SplineQueryExecutionListener.scala:40)
	at org.apache.spark.sql.util.ExecutionListenerBus.doPostEvent(QueryExecutionListener.scala:155)
	at org.apache.spark.sql.util.ExecutionListenerBus.doPostEvent(QueryExecutionListener.scala:131)
	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
	at org.apache.spark.sql.util.ExecutionListenerBus.postToAll(QueryExecutionListener.scala:131)
	at org.apache.spark.sql.util.ExecutionListenerBus.onOtherEvent(QueryExecutionListener.scala:135)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:84)
	at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
	at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
	at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
	at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:105)
	at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:105)
	at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:100)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:96)
	at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1523)
	at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:96)
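The trace points at `WindowNodeBuilder.resolveAttributeChild`, where a `find` over the window operator's expressions casts each element to `NamedExpression` and throws when it encounters a bare `WindowExpression`. The following self-contained mock (stand-in types, not the real Catalyst classes or the actual Spline fix) illustrates the failure mode and a pattern-matching alternative that skips non-named expressions instead of casting:

```scala
// Stand-in hierarchy mimicking Catalyst expressions; NOT the real
// org.apache.spark.sql.catalyst.expressions classes.
sealed trait Expression
trait NamedExpression extends Expression { def name: String }
case class AttributeReference(name: String) extends NamedExpression
case class Alias(child: Expression, name: String) extends NamedExpression
case class WindowExpression(function: String) extends Expression

object LineageDemo {
  // Unsafe resolution: assumes every expression is named, mirroring the
  // cast that blows up inside resolveAttributeChild.
  def resolveUnsafe(exprs: List[Expression], name: String): Option[NamedExpression] =
    exprs.map(_.asInstanceOf[NamedExpression]).find(_.name == name)

  // Safe resolution: pattern-match and ignore expressions (such as a bare
  // WindowExpression not wrapped in an Alias) that carry no name.
  def resolveSafe(exprs: List[Expression], name: String): Option[NamedExpression] =
    exprs.collectFirst { case ne: NamedExpression if ne.name == name => ne }

  def main(args: Array[String]): Unit = {
    val exprs: List[Expression] =
      List(AttributeReference("id"), WindowExpression("row_number"))

    // The unsafe variant throws ClassCastException on the WindowExpression.
    val failed =
      try { resolveUnsafe(exprs, "id"); false }
      catch { case _: ClassCastException => true }
    println(s"unsafe cast failed: $failed")

    // The safe variant resolves the named attribute and skips the rest.
    println(s"safe resolution: ${resolveSafe(exprs, "id")}")
  }
}
```

This is only a sketch of the class-cast mechanism; the real fix in the Spline agent may handle window expressions quite differently.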

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 27 (16 by maintainers)

Top GitHub Comments

3 reactions
GolamRashed commented, Sep 6, 2021

@wajda Databricks team replied through Azure support that they have opened an internal ticket for the engineering team to make the Spline library work with DBR. I hope this is good news for the Spline Dev team, and I will keep you posted with updates from Databricks.

0 reactions
wajda commented, Aug 19, 2021

Thank you all, I’m closing the ticket as resolved then.

> Just a request regarding the Spline UI. Is it possible to add an option to name each lineage capture event after the table/file that is written? Right now, it just says “Databricks Shell”.

Not sure I get it. The UI also shows the target destination, both as a short name and a full path. Anyway, that’s an unrelated topic; please create another ticket for it. (For UI matters there is the spline-ui repo.)

Read more comments on GitHub
