.collect() call hangs indefinitely when spline lineage tracing is enabled
Describe the bug
Calling .collect() on a Dataset obtained via JDBC hangs indefinitely. The same code works fine when Spline lineage tracking is turned off. It also works fine on Spline 0.5.3, but not on Spline 0.5.5.
Versions
- Scala 2.11
- Spark 2.4.6
- Spline 0.5.5 (the hang doesn’t occur on Spline 0.5.3)
Components State
- ArangoDB running without errors
- ArangoDB spline database initialized
- Rest Gateway running and
  - connects to ArangoDB
  - there are no errors in logs
- Spline UI running and
  - connects to Rest Gateway consumer
  - there are no errors in logs
To Reproduce
Steps to reproduce the behavior OR commands run:
- Create a relational table with, say, two columns (Postgres is used below, but the issue occurs on any database)
- Add some dummy rows
- Write some very basic Java Spark code to load the table into a Dataset and call .collect() on it
import java.util.Properties;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import za.co.absa.spline.harvester.SparkLineageInitializer;

SparkSession spark = SparkSession
    .builder()
    .appName("Java Spark SQL basic example")
    .master("local")
    .getOrCreate();
// Turn on Spline lineage tracking for this session
SparkLineageInitializer.enableLineageTracking(spark);
String dbConnectionUrl = "jdbc:postgresql://postgresserver/spark_labs";
Properties prop = new Properties();
prop.setProperty("driver", "org.postgresql.Driver");
prop.setProperty("user", "****");
prop.setProperty("password", "****");
// Load the table over JDBC
Dataset<Row> dsp = spark.read().jdbc(dbConnectionUrl, "<tablename>", prop);
// Hangs here when Spline is turned on; works when Spline is turned off
Object rows = dsp.collect();
Expected behaviour
The rows should be returned fairly quickly without hanging. While the example above is in Java, the same hang occurs when similar code is written in Scala.
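While investigating a hang like this, it can help to bound the blocking call so the program reports a timeout instead of waiting forever. The following is only a minimal sketch using the standard java.util.concurrent API. The class name BoundedCall and the method callWithTimeout are hypothetical helpers made up for illustration; they are not part of Spark or Spline, and this does not fix the underlying problem.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical diagnostic helper; not part of Spark or Spline. */
final class BoundedCall {
    private BoundedCall() {}

    /**
     * Runs the action on a separate daemon thread and waits at most timeoutMillis.
     * Returns the action's result, or throws IllegalStateException on timeout.
     */
    static <T> T callWithTimeout(Callable<T> action, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread worker = new Thread(runnable, "bounded-call");
            worker.setDaemon(true); // a stuck worker must not keep the JVM alive
            return worker;
        });
        Future<T> future = executor.submit(action);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: only interrupts the worker thread
            throw new IllegalStateException(
                "call did not finish within " + timeoutMillis + " ms", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause()); // surface the action's own failure
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With the reproduction code above, the call could be wrapped as BoundedCall.callWithTimeout(() -> dsp.collect(), 60_000), where the 60-second limit is an arbitrary choice. Interrupting the worker thread is not guaranteed to stop whatever is actually stuck, so this is useful mainly for detecting the hang and triggering further diagnostics.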
Desktop
- OS: Windows Server 2016
- Java 8
- Java 8
Additional context
The same example doesn’t hang in Spline 0.5.3.
Issue Analytics
- State:
- Created 3 years ago
- Comments:10 (6 by maintainers)
Top GitHub Comments
I verified the following with the 0.5.6 agent (running against a 0.5.5 Spline REST server):
With this line in the log4j properties, there is no more hanging:
log4j.logger.za.co.absa.spline.harvester=debug
With this line, it continues to hang (as expected in 0.5.6):
log4j.logger.za.co.absa.spline.harvester=trace
So, basically, unless someone wants trace level in 0.5.6, the fix works. Thanks!
Here is the thread dump I see in jvisualvm (ObjectStructureDumper does seem to be involved as you suspect)
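For anyone who cannot attach jvisualvm, a comparable dump can be captured from inside the JVM with the standard java.lang.management API. This is a rough sketch, not the tooling used in this thread; the class name ThreadDumpSketch is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper that renders a plain-text dump of all live threads. */
final class ThreadDumpSketch {
    private ThreadDumpSketch() {}

    static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip monitor and synchronizer details, which not every JVM supports
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Calling ThreadDumpSketch.dumpAllThreads() from a separate thread while .collect() is stuck, and searching the output for ObjectStructureDumper frames, should show whether the same code path is involved. The jstack tool shipped with the JDK gives similar information from outside the process.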