[QUESTION] About the implementation of Spark MERGE INTO
Hi team, I have some doubts about using MERGE INTO with Spark SQL.
1. For example:
merge into delete_error_test target
using (select 'wlq_new3' as name, 1 as id, 29 as age, '20210101' as dt) source
on target.id = source.id
when matched and target.age = 28 then update set age = source.age, dt = source.dt
when not matched then insert (id, age, name, dt)
values (source.id, source.age, source.name, '20210102')
When the matched condition references a column of the target table, it throws "cannot resolve". Looking at the code below, I am not sure whether resolving the condition against the source plan only is meant to account for MOR tables. If so, then for COW tables something like condition.map(resolveExpressionFrom(resolvedSource, Some(target))(_)) might be OK. Are there any other considerations?
val resolvedCondition = condition.map(resolveExpressionFrom(resolvedSource)(_))
// ...
private def resolveExpressionFrom(left: LogicalPlan, right: Option[LogicalPlan] = None)
                                 (expression: Expression): Expression = {
  // Fake a project for the expression based on the source plan.
  val fakeProject = if (right.isDefined) {
    Project(Seq(Alias(expression, "_c0")()),
      sparkAdapter.createJoin(left, right.get, Inner))
  } else {
    Project(Seq(Alias(expression, "_c0")()),
      left)
  }
  // Resolve the fake project
  val resolvedProject =
    analyzer.ResolveReferences.apply(fakeProject).asInstanceOf[Project]
  val unResolvedAttrs = resolvedProject.projectList.head.collect {
    case attr: UnresolvedAttribute => attr
  }
  if (unResolvedAttrs.nonEmpty) {
    throw new AnalysisException(s"Cannot resolve ${unResolvedAttrs.mkString(",")} in " +
      s"${expression.sql}, the input columns is: [${fakeProject.child.output.mkString(", ")}]")
  }
  // Fetch the resolved expression from the fake project.
  resolvedProject.projectList.head.asInstanceOf[Alias].child
}
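To illustrate why resolving only against the source plan fails for a target-qualified column, here is a minimal self-contained sketch of the idea. These are toy types, not Catalyst: ResolutionSketch, ToyPlan, and resolveAgainst are invented for illustration only. Resolving "target.age" against the source plan alone fails, while resolving against both plans (analogous to the join built when right is defined) succeeds.

```scala
// Toy model of the "fake project" resolution trick (not Catalyst).
// ToyPlan stands in for a LogicalPlan: a relation name plus its output columns.
object ResolutionSketch {
  case class ToyPlan(name: String, columns: Set[String])

  // Mimics ResolveReferences: succeeds only if the qualified column
  // appears in the combined output of the given plans.
  def resolveAgainst(expr: String, plans: ToyPlan*): Either[String, String] = {
    val qualified = plans.flatMap(p => p.columns.map(c => s"${p.name}.$c")).toSet
    if (qualified.contains(expr)) Right(expr)
    else Left(s"cannot resolve $expr; input columns: [${qualified.toSeq.sorted.mkString(", ")}]")
  }

  def main(args: Array[String]): Unit = {
    val source = ToyPlan("source", Set("id", "age", "name", "dt"))
    val target = ToyPlan("target", Set("id", "age", "name", "dt"))
    // Source-only resolution fails for a target-qualified column.
    println(resolveAgainst("target.age", source))
    // Resolution against source and target (the "join" case) succeeds.
    println(resolveAgainst("target.age", source, target))
  }
}
```

This is only meant to show why passing the target plan as the second argument makes the matched condition resolvable; the real fix would go through Catalyst's analyzer as in the snippet above.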
2. I also need to use multiple update actions. Is there any particular reason for the limit below?
assert(updateActions.size <= 1, s"Only support one updateAction currently, current update action count is: ${updateActions.size}")
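For context on what supporting multiple update actions would mean, here is a sketch of standard SQL MERGE semantics, where the first matched clause whose condition holds is the one applied. Row and UpdateAction here are toy types invented for illustration, not Hudi's implementation.

```scala
// Sketch: dispatching among several "when matched ... then update" clauses.
// Standard MERGE semantics: clauses are tried in order; the first whose
// condition holds wins, and at most one update is applied per row.
object MultiUpdateSketch {
  type Row = Map[String, Int]
  case class UpdateAction(condition: Row => Boolean, update: Row => Row)

  def applyFirstMatch(row: Row, actions: Seq[UpdateAction]): Row =
    actions.find(_.condition(row)).map(_.update(row)).getOrElse(row)

  def main(args: Array[String]): Unit = {
    val actions = Seq(
      UpdateAction(r => r("age") == 28, r => r.updated("age", 29)),
      UpdateAction(r => r("age") > 30, r => r.updated("age", 0))
    )
    // First clause matches: age becomes 29.
    println(applyFirstMatch(Map("id" -> 1, "age" -> 28), actions))
    // Only the second clause matches: age becomes 0.
    println(applyFirstMatch(Map("id" -> 2, "age" -> 40), actions))
    // No clause matches: row is unchanged.
    println(applyFirstMatch(Map("id" -> 3, "age" -> 30), actions))
  }
}
```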
Hoping for some clarification.
Issue Analytics
- Created a year ago
- Comments: 8 (8 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Created https://issues.apache.org/jira/browse/HUDI-4361 for multiple update actions @yihua
@voonhous @fengjian428 Let’s take this implementation discussion to the JIRA HUDI-4361. We should prioritize this improvement in the next release (0.13.0). Since the issue itself has been triaged, I am going to close it.