Duplicated attribute IDs
UPDATE: A temporary workaround - https://github.com/absaoss/spline-spark-agent/issues/272#issuecomment-895947366
The issue was found while investigating AbsaOSS/spline#925, and it is the cause of that issue.
See the JSON sample in https://github.com/AbsaOSS/spline/issues/925#issuecomment-874263960
Minimal code to replicate the issue:
package za.co.absa.spline

import org.apache.spark.SparkConf
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object SplineDuplicatedIds {

  val extraConf: Iterable[(String, String)] = List(
    ("spark.sql.queryExecutionListeners", "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener"),
    ("spark.spline.lineageDispatcher", "console"),
    ("spark.spline.mode", "REQUIRED")
  )

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
      .setAll(extraConf)
      .setMaster("local[*]")

    val ss = SparkSession.builder().config(sparkConf).getOrCreate()
    import ss.implicits._

    val df = Seq(("adcde1_12938597", 162))
      .toDF("unique_id", "total_commission")

    val result: DataFrame = df
      .withColumn("commission", col("total_commission"))
      .drop("total_commission")

    val firstValidTransactions = result.withColumnRenamed("commission", "foo")
    val joined = result.join(firstValidTransactions, usingColumns = Seq("unique_id"))

    joined
      .write
      .mode(SaveMode.Overwrite)
      .option("path", "tmp/spline_test_bi_duplicates")
      .saveAsTable("spline_test_bi_duplicated")
  }
}
This happens on Spark 3.1 and 2.4, and probably on all other versions as well.
Issue Analytics
- State:
- Created 2 years ago
- Comments: 21 (10 by maintainers)
Top GitHub Comments
@cerveada thanks. I found the issue, I mean, I could solve it, but I do not understand why 😂 . At least I can work and hopefully give you guys a code that recreates the issue 🙌 .
@cerveada quick update from my side 😃 If I add localCheckpoint here
The duplicated ids are solved. Maybe this helps with your debugging 👍
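The comment above does not show where the localCheckpoint call was placed, so the following is only a sketch of one possible placement in the reproduction code, not the commenter's actual change. It assumes the call goes on `result` before the self-join; the object name `LocalCheckpointSketch` and the helper `buildJoined` are hypothetical, and the Spline listener configuration is left out to keep the snippet short. Whether this exact placement removes the duplicated attribute IDs is not confirmed here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical sketch: names and the checkpoint placement are assumptions.
object LocalCheckpointSketch {

  def buildJoined(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val df = Seq(("adcde1_12938597", 162))
      .toDF("unique_id", "total_commission")

    val result: DataFrame = df
      .withColumn("commission", col("total_commission"))
      .drop("total_commission")
      // ASSUMPTION: placement guessed, the original comment does not show it.
      // Dataset.localCheckpoint() materializes the data and truncates the
      // logical plan of the returned Dataset.
      .localCheckpoint()

    val firstValidTransactions = result.withColumnRenamed("commission", "foo")

    // Same self-join as in the reproduction code.
    result.join(firstValidTransactions, usingColumns = Seq("unique_id"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    buildJoined(spark).show()
    spark.stop()
  }
}
```

To check the effect on lineage, the Spline settings from the reproduction code (`extraConf`) would need to be added to the session configuration and the console output compared with and without the checkpoint.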