Duplicated attribute IDs

Issue Description

UPDATE: A temporary workaround - https://github.com/absaoss/spline-spark-agent/issues/272#issuecomment-895947366

The issue was found in, and is the cause of, AbsaOSS/spline#925

See JSON sample in https://github.com/AbsaOSS/spline/issues/925#issuecomment-874263960

Minimal code to replicate the issue:

package za.co.absa.spline

import org.apache.spark.SparkConf
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object SplineDuplicatedIds {

  val extraConf: Iterable[(String, String)] = List(
    ("spark.sql.queryExecutionListeners", "za.co.absa.spline.harvester.listener.SplineQueryExecutionListener"),
    ("spark.spline.lineageDispatcher", "console"),
    ("spark.spline.mode", "REQUIRED")
  )

  def main(args: Array[String]): Unit = {

    val sparkConf = new SparkConf()
      .setAll(extraConf)
      .setMaster("local[*]")

    val ss = SparkSession.builder().config(sparkConf).getOrCreate()
    import ss.implicits._

    val df = Seq(("adcde1_12938597", 162))
      .toDF("unique_id", "total_commission")

    val result: DataFrame = df
      .withColumn("commission", col("total_commission"))
      .drop("total_commission")

    val firstValidTransactions = result.withColumnRenamed("commission", "foo")

    val joined = result.join(firstValidTransactions, usingColumns = Seq("unique_id"))

    joined
      .write
      .mode(SaveMode.Overwrite)
      .option("path", "tmp/spline_test_bi_duplicates")
      .saveAsTable("spline_test_bi_duplicated")
  }
}

This happens on Spark 3.1 and 2.4, and probably on all other versions as well.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 21 (10 by maintainers)

Top GitHub Comments

1 reaction
kuhnen commented, Aug 9, 2021

@cerveada thanks. I found the issue. I mean, I could solve it, but I do not understand why 😂. At least I can keep working, and hopefully I can give you guys some code that recreates the issue 🙌.

0 reactions
kuhnen commented, Aug 10, 2021

@cerveada quick update from my side 😃 If I add localCheckpoint here:

  val firstValidTransactions = df
    .select(uniqueIdColName, transactionStatusColName, createdColName)
    .where(col(transactionStatusColName).isInCollection(Set("Paid", "Validated")))
    .withColumnRenamed(createdColName, firstValidStatusDateColName)
    .drop(transactionStatusColName)
    .localCheckpoint() // cuts the plan of this DataFrame before it is joined

the duplicated IDs are resolved. Maybe this helps with your debugging 👍
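For reference, here is a minimal sketch of the same workaround applied to the repro from the issue description. It assumes the fix carries over to that smaller example, which the thread does not confirm. The object name `LocalCheckpointWorkaroundSketch` and the helper `buildJoined` are made up for illustration, and the Spline listener settings from the repro are left out so the sketch runs without the agent on the classpath; they would need to be added back to inspect the captured lineage. `Dataset.localCheckpoint()` is a standard Spark API that truncates the logical plan of the Dataset, which is presumably why the two sides of the self-join stop sharing attribute IDs.

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical name, used for illustration only.
object LocalCheckpointWorkaroundSketch {

  // Builds the same self-join as the repro, but checkpoints the right-hand side first.
  def buildJoined(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val df = Seq(("adcde1_12938597", 162))
      .toDF("unique_id", "total_commission")

    val result = df
      .withColumn("commission", col("total_commission"))
      .drop("total_commission")

    // localCheckpoint() is eager by default and returns a Dataset whose
    // logical plan no longer refers back to the plan of `result`.
    val firstValidTransactions = result
      .withColumnRenamed("commission", "foo")
      .localCheckpoint()

    result.join(firstValidTransactions, usingColumns = Seq("unique_id"))
  }

  def main(args: Array[String]): Unit = {
    // Spline settings from the repro are omitted here (assumption: added back when testing lineage).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("local-checkpoint-workaround-sketch")
      .getOrCreate()

    try buildJoined(spark).show()
    finally spark.stop()
  }
}
```

Note that `localCheckpoint()` materializes the data on the executors, so it adds a computation step and is less fault tolerant than a reliable checkpoint. That matches the thread's description of it as a temporary workaround.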
