Improve SQL syntax for MERGE

The existing SQL syntax (see here and here) for MERGE could be improved by adding an alternative to ON <merge_condition>.

Main assumption

In the common case, the target and source tables use the same column names for the keys in <merge_condition>, for example ON target.id = source.id or ON target.name = source.name AND target.surname = source.surname. It would be more convenient to write ON COLUMNS (id) or ON COLUMNS (name, surname), or something similar, instead. JOIN already takes the same approach: its join_criteria syntax is ON boolean_expression | USING ( column_name [ , ... ] ).

Improvement proposal

Syntax

MERGE INTO target_table_identifier [AS target_alias]
USING source_table_identifier [<time_travel_version>] [AS source_alias]
ON { <merge_condition> | COLUMNS ( column_name [ , ... ] ) }
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ]

Example

MERGE INTO target
USING source
ON COLUMNS (name, surname)
WHEN MATCHED THEN
    UPDATE SET *
WHEN NOT MATCHED THEN
    INSERT *
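
Until something like ON COLUMNS exists, the same effect can be had by generating the explicit condition from a list of key columns. The following is only a minimal sketch of that expansion, not part of the proposal or of any Spark or Delta Lake API: the helper names merge_condition and merge_statement are hypothetical, the default aliases target and source are taken from the example above, and identifiers are not quoted or validated.

```python
def merge_condition(columns, target="target", source="source"):
    """Expand a list of key columns into an explicit ON condition.

    Hypothetical helper: assumes both tables use the same column names
    and that the names need no quoting or escaping.
    """
    if not columns:
        raise ValueError("at least one key column is required")
    return " AND ".join(f"{target}.{c} = {source}.{c}" for c in columns)


def merge_statement(columns, target="target", source="source"):
    """Build the explicit-condition form of the ON COLUMNS example above."""
    return (
        f"MERGE INTO {target}\n"
        f"USING {source}\n"
        f"ON {merge_condition(columns, target, source)}\n"
        "WHEN MATCHED THEN\n"
        "    UPDATE SET *\n"
        "WHEN NOT MATCHED THEN\n"
        "    INSERT *"
    )


# Prints the MERGE statement that ON COLUMNS (name, surname) would stand for.
print(merge_statement(["name", "surname"]))
```

The generated text is the existing MERGE syntax with the condition written out, so it shows exactly what the shorter form would save the user from typing.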

Issue Analytics

Created 2 years ago
Reactions: 1
Comments: 5 (4 by maintainers)

Top GitHub Comments

tdas commented, Aug 10, 2021 (1 reaction)

I doubt this syntax improvement will be supported by Spark. MERGE has a clear SQL standard, and I am not sure if something like USING is supported in that. We have been able to convince Spark to support UPDATE SET * and INSERT * (completely outside the SQL standard) only because of the major usability improvement that it gives. But for something smaller like this, I doubt we will be able to convince the Spark community to adopt this non-standard approach.

dnskr commented, Aug 24, 2021 (0 reactions)

Closing in favor of the ticket created in Spark JIRA https://issues.apache.org/jira/browse/SPARK-36472
