
[QUESTION] About the implementation of Spark MERGE INTO

See original GitHub issue

Hi team, I have a question about using MERGE INTO with Spark SQL. For example:

merge into delete_error_test target
using (select 'wlq_new3' as name, 1 as id, 29 as age, '20210101' as dt) source
on target.id = source.id
when matched and target.age = 28 then update set age = source.age, dt = source.dt
when not matched then insert (id, age, name, dt)
values (source.id, source.age, source.name, '20210102')

The matched condition's column references the target table, and this throws a "cannot resolve" error. Looking at the code below, I'm not sure whether the restriction exists to support MOR tables. If so, would `condition.map(resolveExpressionFrom(resolvedSource, target)(_))` be acceptable for COW tables? Are there any other considerations?

val resolvedCondition = condition.map(resolveExpressionFrom(resolvedSource)(_))
// ...
private def resolveExpressionFrom(left: LogicalPlan, right: Option[LogicalPlan] = None)
                                 (expression: Expression): Expression = {
  // Fake a project for the expression based on the source plan.
  val fakeProject = if (right.isDefined) {
    Project(Seq(Alias(expression, "_c0")()),
      sparkAdapter.createJoin(left, right.get, Inner))
  } else {
    Project(Seq(Alias(expression, "_c0")()), left)
  }
  // Resolve the fake project.
  val resolvedProject =
    analyzer.ResolveReferences.apply(fakeProject).asInstanceOf[Project]
  val unResolvedAttrs = resolvedProject.projectList.head.collect {
    case attr: UnresolvedAttribute => attr
  }
  if (unResolvedAttrs.nonEmpty) {
    throw new AnalysisException(s"Cannot resolve ${unResolvedAttrs.mkString(",")} in " +
      s"${expression.sql}, the input columns are: [${fakeProject.child.output.mkString(", ")}]")
  }
  // Fetch the resolved expression from the fake project.
  resolvedProject.projectList.head.asInstanceOf[Alias].child
}
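To make the failure mode concrete, here is a minimal, self-contained sketch in plain Scala. The `Plan` and `unresolved` names are illustrative, not Spark's actual API: resolution is modeled as checking an expression's column references against the output columns of the plan(s) it is resolved from, the way the fake Project above does. A condition that references only the target table cannot resolve when just the source plan is supplied, but resolves fine against a source-target join.

```scala
// Illustrative model of reference resolution (not Spark's real API).
case class Plan(name: String, output: Set[String])

// Collect the references that cannot be resolved from the given plans' outputs.
def unresolved(refs: Set[String], plans: Plan*): Set[String] =
  refs -- plans.flatMap(_.output)

val source = Plan("source", Set("source.id", "source.age", "source.name", "source.dt"))
val target = Plan("target", Set("target.id", "target.age", "target.name", "target.dt"))

// The matched condition `target.age = 28` references only the target table.
val conditionRefs = Set("target.age")

// Resolving from the source plan alone fails, mirroring the "cannot resolve" error.
println(unresolved(conditionRefs, source))          // Set(target.age)
// Resolving from both plans (a source-target join) succeeds.
println(unresolved(conditionRefs, source, target))  // Set()
```

Passing the target plan as the second argument, as suggested above, corresponds to the two-plan case in this sketch.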

I also need to use multiple update actions. Is there any particular reason for the limit below?

assert(updateActions.size <= 1, s"Only support one updateAction currently, current update action count is: ${updateActions.size}")

Hoping to get my confusion cleared up.

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
fengjian428 commented, Jul 5, 2022

Regarding updateActions: created https://issues.apache.org/jira/browse/HUDI-4361 to track multiple update actions @yihua

0 reactions
codope commented, Aug 2, 2022

Would like to enquire about some special cases. For multiple update actions, how would overlapping matched actions be evaluated?

For example:

when matched and target.age = -1 then update set name = source.name, age = source.age
when matched and target.age > -10 then update set name = source.name

I am not really sure how traditional SQL parsers/engines handle these situations.
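For what it's worth, engines that do support multiple WHEN MATCHED clauses (the SQL standard's MERGE, and e.g. Delta Lake) typically evaluate the clauses in the order written and apply only the first clause whose condition holds for a given row. Here is a minimal plain-Scala sketch of that first-match-wins rule; all names (`Row`, `MatchedClause`, `applyFirstMatch`) are illustrative, not any engine's API:

```scala
// First-match-wins over ordered WHEN MATCHED clauses, simulated in plain Scala.
case class Row(name: String, age: Int)
case class MatchedClause(cond: Row => Boolean, update: Row => Row)

// Apply only the first clause whose condition holds; otherwise leave the row as-is.
def applyFirstMatch(row: Row, clauses: Seq[MatchedClause]): Row =
  clauses.find(_.cond(row)).map(_.update(row)).getOrElse(row)

val source = Row("wlq_new3", 29)
val clauses = Seq(
  // when matched and target.age = -1 then update set name = source.name, age = source.age
  MatchedClause(_.age == -1, _ => Row(source.name, source.age)),
  // when matched and target.age > -10 then update set name = source.name
  MatchedClause(_.age > -10, r => Row(source.name, r.age))
)

// age = -1 satisfies both conditions, but only the first clause is applied.
println(applyFirstMatch(Row("old", -1), clauses)) // Row(wlq_new3,29)
// age = 5 satisfies only the second clause, so only the name is updated.
println(applyFirstMatch(Row("old", 5), clauses))  // Row(wlq_new3,5)
```

Under this rule the two overlapping clauses above are unambiguous: a row with age = -1 gets the full update from the first clause and never reaches the second.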

@voonhous @fengjian428 Let’s take this implementation discussion to the JIRA HUDI-4361. We should prioritize this improvement in the next release (0.13.0). Since the issue itself has been triaged, I am going to close it.

