[QUESTION] About the implementation of Spark MERGE INTO
Hi team, I have some doubts about using MERGE INTO with Spark SQL.
1. For example:
merge into delete_error_test target
using (select 'wlq_new3' as name, 1 as id, 29 as age, '20210101' as dt) source
on target.id = source.id
when matched and target.age = 28 then update set age = source.age, dt = source.dt
when not matched then insert (id, age, name, dt)
values (source.id, source.age, source.name, '20210102')
When the matched condition references a column of the target table, it throws "cannot resolve". Looking at the code below, I am not sure whether resolving the condition against the source plan only is meant to account for MOR tables. If so, then for COW tables something like condition.map(resolveExpressionFrom(resolvedSource, Some(target))(_)) might be OK. Are there any other considerations?
val resolvedCondition = condition.map(resolveExpressionFrom(resolvedSource)(_))
// ...
private def resolveExpressionFrom(left: LogicalPlan, right: Option[LogicalPlan] = None)
                                 (expression: Expression): Expression = {
  // Fake a project for the expression based on the source plan.
  val fakeProject = if (right.isDefined) {
    Project(Seq(Alias(expression, "_c0")()),
      sparkAdapter.createJoin(left, right.get, Inner))
  } else {
    Project(Seq(Alias(expression, "_c0")()),
      left)
  }
  // Resolve the fake project
  val resolvedProject =
    analyzer.ResolveReferences.apply(fakeProject).asInstanceOf[Project]
  val unResolvedAttrs = resolvedProject.projectList.head.collect {
    case attr: UnresolvedAttribute => attr
  }
  if (unResolvedAttrs.nonEmpty) {
    throw new AnalysisException(s"Cannot resolve ${unResolvedAttrs.mkString(",")} in " +
      s"${expression.sql}, the input columns is: [${fakeProject.child.output.mkString(", ")}]")
  }
  // Fetch the resolved expression from the fake project.
  resolvedProject.projectList.head.asInstanceOf[Alias].child
}
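To illustrate why resolving only against the source plan fails for a target-qualified column, here is a minimal self-contained sketch of the idea. These are toy types, not Catalyst: ResolutionSketch, ToyPlan, and resolveAgainst are invented for illustration only. Resolving "target.age" against the source plan alone fails, while resolving against both plans (analogous to the join built when right is defined) succeeds.

```scala
// Toy model of the "fake project" resolution trick (not Catalyst).
// ToyPlan stands in for a LogicalPlan: a relation name plus its output columns.
object ResolutionSketch {
  case class ToyPlan(name: String, columns: Set[String])

  // Mimics ResolveReferences: succeeds only if the qualified column
  // appears in the combined output of the given plans.
  def resolveAgainst(expr: String, plans: ToyPlan*): Either[String, String] = {
    val qualified = plans.flatMap(p => p.columns.map(c => s"${p.name}.$c")).toSet
    if (qualified.contains(expr)) Right(expr)
    else Left(s"cannot resolve $expr; input columns: [${qualified.toSeq.sorted.mkString(", ")}]")
  }

  def main(args: Array[String]): Unit = {
    val source = ToyPlan("source", Set("id", "age", "name", "dt"))
    val target = ToyPlan("target", Set("id", "age", "name", "dt"))
    // Source-only resolution fails for a target-qualified column.
    println(resolveAgainst("target.age", source))
    // Resolution against source and target (the "join" case) succeeds.
    println(resolveAgainst("target.age", source, target))
  }
}
```

This is only meant to show why passing the target plan as the second argument makes the matched condition resolvable; the real fix would go through Catalyst's analyzer as in the snippet above.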
2. I also need to use multiple update actions. Is there any particular reason for the limit below?
assert(updateActions.size <= 1, s"Only support one updateAction currently, current update action count is: ${updateActions.size}")
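For context on what supporting multiple update actions would mean, here is a sketch of standard SQL MERGE semantics, where the first matched clause whose condition holds is the one applied. Row and UpdateAction here are toy types invented for illustration, not Hudi's implementation.

```scala
// Sketch: dispatching among several "when matched ... then update" clauses.
// Standard MERGE semantics: clauses are tried in order; the first whose
// condition holds wins, and at most one update is applied per row.
object MultiUpdateSketch {
  type Row = Map[String, Int]
  case class UpdateAction(condition: Row => Boolean, update: Row => Row)

  def applyFirstMatch(row: Row, actions: Seq[UpdateAction]): Row =
    actions.find(_.condition(row)).map(_.update(row)).getOrElse(row)

  def main(args: Array[String]): Unit = {
    val actions = Seq(
      UpdateAction(r => r("age") == 28, r => r.updated("age", 29)),
      UpdateAction(r => r("age") > 30, r => r.updated("age", 0))
    )
    // First clause matches: age becomes 29.
    println(applyFirstMatch(Map("id" -> 1, "age" -> 28), actions))
    // Only the second clause matches: age becomes 0.
    println(applyFirstMatch(Map("id" -> 2, "age" -> 40), actions))
    // No clause matches: row is unchanged.
    println(applyFirstMatch(Map("id" -> 3, "age" -> 30), actions))
  }
}
```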
Hoping for some clarification.
Issue Analytics
- Created a year ago
- Comments: 8 (8 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Created https://issues.apache.org/jira/browse/HUDI-4361 for multiple update actions @yihua
@voonhous @fengjian428 Let’s take this implementation discussion to the JIRA HUDI-4361. We should prioritize this improvement in the next release (0.13.0). Since the issue itself has been triaged, I am going to close it.