Spark SQL MERGE INTO fails with Error: Error running query: java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions: List(1, 0) (state=,code=0)

CREATE TABLE IF NOT EXISTS cdp.test_merge_001(offline_channel STRING COMMENT '_pk',unique_key STRING COMMENT '_ck',open_id STRING COMMENT '',mobile string COMMENT '_ck',hobby STRING COMMENT '',activity_time STRING COMMENT '' )
 USING iceberg;

CREATE TABLE IF NOT EXISTS cdp.test_merge_002(offline_channel STRING COMMENT '_pk',unique_key STRING COMMENT '_ck',open_id STRING COMMENT '',mobile string COMMENT '_ck',hobby STRING COMMENT '',activity_time STRING COMMENT '' )
 USING iceberg;

This is table test_merge_002:

This is table test_merge_001:

But running this SQL produces the error:

MERGE INTO cdp.test_merge_002 tt1
USING (SELECT * FROM cdp.test_merge_001) tt2
ON (tt1.unique_key = tt2.unique_key AND tt1.mobile = tt2.mobile)
WHEN MATCHED THEN UPDATE SET
  tt1.offline_channel = tt2.offline_channel,
  tt1.unique_key = tt2.unique_key,
  tt1.open_id = tt2.open_id,
  tt1.mobile = tt2.mobile,
  tt1.hobby = tt2.hobby,
  tt1.activity_time = tt2.activity_time
WHEN NOT MATCHED THEN INSERT *

Top GitHub Comments

RussellSpitzer commented, May 8, 2021

I think this rule is fixing it for Spark

https://github.com/apache/spark/blob/8f0fef18438aa8fb07f5ed885ffad1339992f102/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L128

KarlManong commented, Jul 15, 2021

How were you running on that build? My first guess would be that that version wasn’t actually present at runtime

@RussellSpitzer I rebuilt a table with exactly the same statement (using Trino), and everything worked well. The only difference is that the old table has some data.

Logs from the failed run: s-bigdata-402-5918f407-84557c7aa8cf66dc-driver-spark-kubernetes-driver-log.txt

Logs from the successful run: s-bigdata-402-21809172-57074b7aa8d5269c-driver-spark-kubernetes-driver-log.txt

The SQL: create table.txt merge.txt

I ran the failing SQL on spark-thriftserver, and it worked. Maybe the old application had some problem.

Top Results From Across the Web

[GitHub] [iceberg] KarlManong commented on issue #2533: spark ...

... sql MERGE INTO There is an error Error: Error running query: java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions: ...

Can't zip RDDs with unequal numbers of partitions: List(2, 1 ...

It is a bug in AQE, clearly, for the version of Spark you are running. Set AQE out. zip works with RDD partitions...

Can't zip RDDs with unequal numbers of partitions ... - Re

(See it here - http://pastebin.dqd.cz/RAhm/) After I've increased spark.sql.autoBroadcastJoinThreshold to 300000 from 100000 it went through ...

Solving 5 Mysterious Spark Errors | by yhoztak - Medium

This error usually happens when two dataframes, and you apply udf on some columns to transfer, aggregate, rejoining to add as new fields...

MNIST example cannot run because of RDD.zip() #100 - GitHub

The problem is that the zip operation assumes that the number of partitions AND the number of elements within each partition will be...
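
The partition requirement mentioned in that last result can be illustrated with a toy model in plain Python. This is only a sketch and not Spark code: an "RDD" is modelled as a list of partitions, and `zip_partitions` is a hypothetical helper whose error messages are written to resemble the one in this issue. It does not reproduce the MERGE INTO failure itself, only the kind of check that rejects a `List(1, 0)` pairing.

```python
# Toy model of a partition-wise zip. NOT Spark code: an "RDD" here is
# just a list of partitions, and each partition is a list of elements.

def zip_partitions(left, right):
    """Pair up elements partition by partition (hypothetical helper)."""
    if len(left) != len(right):
        # Wording chosen to resemble the error reported in this issue.
        raise ValueError(
            "Can't zip RDDs with unequal numbers of partitions: "
            f"{[len(left), len(right)]}"
        )
    zipped = []
    for l_part, r_part in zip(left, right):
        if len(l_part) != len(r_part):
            # Second assumption of zip: equal element counts per partition.
            raise ValueError(
                "Can only zip RDDs with same number of elements in each partition"
            )
        zipped.append(list(zip(l_part, r_part)))
    return zipped


one_partition = [["a", "b"]]  # 1 partition holding two elements
no_partitions = []            # 0 partitions

try:
    zip_partitions(one_partition, no_partitions)
except ValueError as err:
    message = str(err)  # records the partition counts [1, 0]
```

In this model, the counts 1 and 0 in the message correspond to the `List(1, 0)` in the reported error: one side of the zip has a single partition and the other has none.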