ElasticSearch :: No lineage is captured
Environment
- Apache Tomcat 9.0.58
- Delta Lake 1.0.0
- Elasticsearch 7.13.4
- Hadoop 3.2.2
- Java 11.0.10
- spark-3.1.2-bin-hadoop3.2
- spark-3.1-spline-agent-bundle_2.12-0.7.3
- spline-web-ui-0.7.3
- spline-rest-server-0.7.5
Logs: stderr_from_yarn.log, localhost_access_log.2022-02-21.txt
Content of spark-defaults.conf:
spark.master spark://h8:7077,h5:7077
spark.eventLog.enabled true
spark.eventLog.dir hdfs://masters/spark/eventLog
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.delta.logStore.class org.apache.spark.sql.delta.storage.HDFSLogStore
spark.executor.extraClassPath /home/hadoop/SW/extra-libs/*
spark.driver.extraClassPath /home/hadoop/SW/extra-libs/*
spark.hive.metastore.uris thrift://192.168.21.8:9083
spark.sql.warehouse.dir hdfs://masters/
spark.sql.queryExecutionListeners za.co.absa.spline.harvester.listener.SplineQueryExecutionListener
spark.spline.producer.url http://h8:9090/spline-rest/producer
Spline init type: codeless
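With codeless init, the agent is switched on purely by the two Spline properties in spark-defaults.conf (spark.sql.queryExecutionListeners and spark.spline.producer.url), so a typo there is a cheap thing to rule out first. Below is a minimal Python sketch of such a check. The helper names parse_spark_defaults and missing_spline_settings are hypothetical, and the parser only handles the whitespace-separated "key value" form used in this file; properties files can also use "=" or ":" as separators, which this sketch does not handle.

```python
from typing import Dict, List, Optional

# Settings that codeless Spline init depends on in this issue's setup
# (key -> required entry, or None if any non-empty value is acceptable).
REQUIRED: Dict[str, Optional[str]] = {
    "spark.sql.queryExecutionListeners":
        "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener",
    "spark.spline.producer.url": None,
}


def parse_spark_defaults(text: str) -> Dict[str, str]:
    """Parse whitespace-separated 'key value' lines.

    Blank lines and '#' comments are skipped; '=' or ':' separators
    are NOT handled by this sketch.
    """
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


def missing_spline_settings(props: Dict[str, str]) -> List[str]:
    """Return one message per required key that is absent or that does
    not contain the expected entry."""
    problems: List[str] = []
    for key, expected in REQUIRED.items():
        value = props.get(key, "")
        if not value:
            problems.append(f"{key} is not set")
        elif expected is not None:
            # queryExecutionListeners takes a comma-separated class list
            entries = [v.strip() for v in value.split(",")]
            if expected not in entries:
                problems.append(f"{key} does not include {expected}")
    return problems
```

An empty result only shows that the listener is configured. Lineage can still be missing when the agent is running but does not recognize the write target.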
Question
Dear team, I just ran into a very tricky problem. I am running Spline as a Java application, and the Spline Spark agent initialized successfully. I created a job that reads Delta Lake data from HDFS and writes it into Elasticsearch, and submitted it to YARN. Everything looked good so far: Spline initialized successfully and the data was written into Elasticsearch. However, no lineage data was shown in the Spline web UI, and so far I haven't been able to figure out why. Could you please help me find out what's wrong or what I missed? @cerveada @wajda
Issue Analytics
- Created 2 years ago
- Comments: 20 (10 by maintainers)
Top GitHub Comments
Dear @wajda, I found that I had made an extremely stupid mistake. After plenty of experiments, I finally figured out what the real problem was. First, elasticsearch-hadoop might not be compatible with my setup, probably because of the Spark or Scala version, so I replaced it with elasticsearch-spark-30_2.12-7.13.4. And here comes the most important part: I forgot to place that jar under spark.driver.extraClassPath and spark.executor.extraClassPath. That's why ElasticsearchPlugin didn't recognize the output source. Sorry for my mistake. I think this issue can be closed now. Thank you again for your support.

Hi @wajda, I've been working on this, but I'm not sure I can make it. One piece of information I can share: embedded-elasticsearch is no longer maintained and does not support ES 7.x. Its GitHub page recommends Testcontainers instead. I'm still learning how to use it.
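The fix above came down to a jar that was not on the driver and executor classpaths. A quick local sanity check for that kind of mistake is sketched below in Python. It assumes you pass in the classpath strings from spark-defaults.conf (for example /home/hadoop/SW/extra-libs/*) and match the connector by a file-name prefix such as elasticsearch-spark-30_2.12; the helper name jars_on_classpath is hypothetical. It only inspects the machine it runs on, so the executor classpath would have to be checked on every worker node as well.

```python
import glob
import os
from typing import List


def jars_on_classpath(classpath: str, prefix: str) -> List[str]:
    """Return jar files on a Java-style classpath whose file name starts
    with `prefix`.

    Entries are split on os.pathsep (':' on Linux). An entry ending in
    '*' is expanded to every *.jar file in that directory, mirroring the
    JVM's wildcard classpath entries; other entries are taken as-is.
    """
    found: List[str] = []
    for entry in classpath.split(os.pathsep):
        entry = entry.strip()
        if not entry:
            continue
        if entry.endswith("*"):
            candidates = glob.glob(os.path.join(entry[:-1], "*.jar"))
        else:
            candidates = [entry]
        for path in candidates:
            name = os.path.basename(path)
            if name.startswith(prefix) and name.endswith(".jar") and os.path.isfile(path):
                found.append(path)
    return sorted(found)
```

For example, jars_on_classpath("/home/hadoop/SW/extra-libs/*", "elasticsearch-spark-30_2.12") returning an empty list on a node would point to the same problem described in this thread.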