Improve SQL syntax for MERGE

See original GitHub issue

Existing SQL syntax (see here and here) for MERGE could be improved by adding an alternative to ON <merge_condition>.

Main assumption

In common cases, the target and source tables use the same column names as keys in <merge_condition>. For example:

ON target.id = source.id
ON target.name = source.name AND target.surname = source.surname

It would be more convenient to write ON COLUMNS (id) and ON COLUMNS (name, surname), or similar, instead. JOIN already takes the same approach: its join_criteria syntax is ON boolean_expression | USING ( column_name [ , ... ] ).

Improvement proposal

Syntax

MERGE INTO target_table_identifier [AS target_alias]
USING source_table_identifier [<time_travel_version>] [AS source_alias]
ON { <merge_condition> | COLUMNS ( column_name [ , ... ] ) }
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ]

Example

MERGE INTO target
USING source
ON COLUMNS (name, surname)
WHEN MATCHED THEN
    UPDATE SET *
WHEN NOT MATCHED THEN
    INSERT *
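
Until something like ON COLUMNS exists, the same effect can be approximated by generating the explicit condition from the key list. The following is only a minimal sketch in Python: the helper names merge_condition and merge_statement are hypothetical (not part of Delta Lake or Spark), and it assumes the key columns carry the same names in both tables and need no identifier quoting.

```python
def merge_condition(columns, target="target", source="source"):
    """Expand a list of shared key columns into an explicit MERGE condition.

    Hypothetical helper: assumes every column exists under the same name in
    both tables and that the names need no quoting or escaping.
    """
    if not columns:
        raise ValueError("at least one key column is required")
    return " AND ".join(f"{target}.{col} = {source}.{col}" for col in columns)


def merge_statement(columns, target="target", source="source"):
    """Build the upsert from the example above using today's ON syntax."""
    condition = merge_condition(columns, target, source)
    return (
        f"MERGE INTO {target}\n"
        f"USING {source}\n"
        f"ON {condition}\n"
        "WHEN MATCHED THEN\n"
        "    UPDATE SET *\n"
        "WHEN NOT MATCHED THEN\n"
        "    INSERT *"
    )


if __name__ == "__main__":
    # Prints the explicit form of: ON COLUMNS (name, surname)
    print(merge_statement(["name", "surname"]))
```

The generated text uses only the existing ON <merge_condition> form, so it shows what ON COLUMNS (name, surname) would presumably expand to under the proposal.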

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
tdas commented, Aug 10, 2021

I doubt this syntax improvement will be supported by Spark. MERGE has a clear SQL standard, and I am not sure if something like USING is supported in that. We have been able to convince Spark to support UPDATE SET * and INSERT * (completely outside the SQL standard) only because of the major usability improvement that it gives. But for something smaller like this, I doubt we will be able to convince the Spark community to adopt this non-standard approach.

0 reactions
dnskr commented, Aug 24, 2021

Closing in favor of the ticket created in Spark JIRA https://issues.apache.org/jira/browse/SPARK-36472
