ElasticSearch :: No lineage is captured


Environment

  • Apache Tomcat 9.0.58
  • Delta Lake 1.0.0
  • Elasticsearch-7.13.4
  • hadoop-3.2.2
  • Java-11.0.10
  • spark-3.1.2-bin-hadoop3.2
  • spark-3.1-spline-agent-bundle_2.12-0.7.3
  • spline-web-ui-0.7.3
  • spline-rest-server-0.7.5

Logs

  • stderr_from_yarn.log
  • localhost_access_log.2022-02-21.txt

Content of spark-defaults.conf

spark.master                                        spark://h8:7077,h5:7077
spark.eventLog.enabled                              true
spark.eventLog.dir                                  hdfs://masters/spark/eventLog
spark.serializer                                    org.apache.spark.serializer.KryoSerializer
spark.delta.logStore.class                          org.apache.spark.sql.delta.storage.HDFSLogStore
spark.executor.extraClassPath                       /home/hadoop/SW/extra-libs/*
spark.driver.extraClassPath                         /home/hadoop/SW/extra-libs/*
spark.hive.metastore.uris                           thrift://192.168.21.8:9083
spark.sql.warehouse.dir                             hdfs://masters/
spark.sql.queryExecutionListeners                   za.co.absa.spline.harvester.listener.SplineQueryExecutionListener
spark.spline.producer.url                           http://h8:9090/spline-rest/producer

Spline init type: codeless

Question

Dear team, I have run into a very tricky problem. I was running Spline as a Java application, and the Spline Spark agent initialized successfully. I created a job that reads data stored in Delta Lake format from HDFS and writes it to Elasticsearch, then submitted it to YARN. Everything looked good so far: Spline initialized successfully and the data was written to Elasticsearch. However, no lineage data was shown in the Spline web UI. I haven’t been able to figure out why. Could you please help me find out what is wrong and what I missed? @cerveada @wajda

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 20 (10 by maintainers)

Top GitHub Comments

1 reaction
Taurus-Le commented, Mar 10, 2022

Dear @wajda, I found that I had made an extremely stupid mistake. After plenty of experiments, I finally figured out what the real problem was. First, elasticsearch-hadoop might not be compatible with my Spark or Scala version, so I replaced it with elasticsearch-spark-30_2.12-7.13.4. And here comes the most important part: I forgot to place the jar on the paths set in spark.driver.extraClassPath and spark.executor.extraClassPath. That’s why ElasticsearchPlugin didn’t recognize the output source. Sorry for my stupid mistake. I think this issue can be closed now. Thank you again for your support.
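
For anyone hitting the same symptom, the check described above can be sketched as a small script. This is only an illustration, not part of Spline or Spark: the helper names (`parse_spark_defaults`, `jars_on_classpath`, `missing_connector`) are made up here, and the script assumes that a connector jar can be recognized by the file name prefix `elasticsearch-spark-`. It reads the text of spark-defaults.conf, expands the `dir/*` wildcard entries, and reports which of the two classpath settings has no matching jar.

```python
import glob
import os
from typing import Dict, List


def parse_spark_defaults(text: str) -> Dict[str, str]:
    """Parse spark-defaults.conf text: one 'key<whitespace>value' pair per line."""
    conf: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def jars_on_classpath(classpath: str) -> List[str]:
    """List jar files reachable from a classpath string on the local filesystem."""
    jars: List[str] = []
    for entry in classpath.split(os.pathsep):
        if entry.endswith("*"):
            # 'dir/*' is the JVM wildcard for every jar directly inside 'dir'
            jars.extend(glob.glob(os.path.join(entry[:-1], "*.jar")))
        elif entry.endswith(".jar") and os.path.isfile(entry):
            jars.append(entry)
    return jars


def missing_connector(conf: Dict[str, str],
                      prefix: str = "elasticsearch-spark-") -> List[str]:
    """Return the classpath settings under which no jar starting with `prefix` exists."""
    missing: List[str] = []
    for key in ("spark.driver.extraClassPath", "spark.executor.extraClassPath"):
        names = [os.path.basename(j) for j in jars_on_classpath(conf.get(key, ""))]
        if not any(name.startswith(prefix) for name in names):
            missing.append(key)
    return missing
```

Calling `missing_connector(parse_spark_defaults(open(path).read()))`, with `path` pointing at your spark-defaults.conf, returns an empty list when a matching jar is found for both settings. The check only inspects the filesystem of the machine it runs on, so it would need to be repeated on each node that hosts a driver or executor.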

1 reaction
Taurus-Le commented, Mar 2, 2022

Hi @wajda, I’ve been working on this, but I’m not sure if I can make it work. One piece of information I can share is that embedded-elasticsearch is no longer maintained and does not support ES 7.x. Its GitHub page recommends Testcontainers instead. I’m still learning how to use it.
