
[SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

See original GitHub issue

I'm loading data from DMS and I don't want any partitions (I did not specify hoodie.datasource.hive_sync.partition_fields, since the docs say it can be left as the empty default).

/home/ec2-user/spark_home/bin/spark-submit \
  --conf "spark.hadoop.fs.s3a.proxy.host=redact" \
  --conf "spark.hadoop.fs.s3a.proxy.port=redact" \
  --conf "spark.driver.extraClassPath=/home/ec2-user/json-20090211.jar" \
  --conf "spark.executor.extraClassPath=/home/ec2-user/json-20090211.jar" \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  --jars "/home/ec2-user/spark-avro_2.11-2.4.6.jar" \
  --master spark://redact:7077 \
  --deploy-mode client \
  /home/ec2-user/hudi-utilities-bundle_2.11-0.5.3-1.jar \
  --table-type COPY_ON_WRITE \
  --source-ordering-field TimeCreated \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --enable-hive-sync \
  --hoodie-conf hoodie.datasource.hive_sync.database=redact \
  --hoodie-conf hoodie.datasource.hive_sync.table=dmstest_multpk4 \
  --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor \
  --hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false \
  --target-base-path s3a://redact/my2/multpk4 \
  --target-table dmstest_multpk4 \
  --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer \
  --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
  --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
  --hoodie-conf hoodie.datasource.write.recordkey.field=version_no,group_company \
  --hoodie-conf hoodie.datasource.write.partitionpath.field=sys_user \
  --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tblhere \
  > multpk4.log
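
Note the inconsistency in the flags above: the write side partitions the table by sys_user (hoodie.datasource.write.partitionpath.field=sys_user), but nothing tells Hive sync about that column, so the synced Hive table is created without a PARTITIONED BY clause. If partitioning by sys_user is actually wanted, a minimal sketch of consistent flags (my suggestion, not from the thread; hoodie.datasource.hive_sync.partition_fields is the property I believe controls this in 0.5.x, so verify against your bundle):

# Keep partitioning by sys_user and declare it to Hive sync as well,
# so the Hive table gets a matching PARTITIONED BY clause.
--hoodie-conf hoodie.datasource.write.partitionpath.field=sys_user
--hoodie-conf hoodie.datasource.hive_sync.partition_fields=sys_user
--hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor

The original run then fails during Hive sync:
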
2020-08-12 11:31:11,186 [main] INFO  org.apache.hudi.client.AbstractHoodieWriteClient - Committed 20200812112840
2020-08-12 11:31:11,189 [main] INFO  org.apache.hudi.utilities.deltastreamer.DeltaSync - Commit 20200812112840 successful!
2020-08-12 11:31:11,194 [main] INFO  org.apache.hudi.utilities.deltastreamer.DeltaSync - Syncing target hoodie table with hive table(dmstest_multpk4). Hive metastore URL :jdbc:hive2://localhost:10000, basePath :s3a://redact/my2/multpk4
2020-08-12 11:31:11,960 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200812112840__commit__COMPLETED]]
2020-08-12 11:31:14,264 [main] INFO  org.apache.hudi.hive.HiveSyncTool - Trying to sync hoodie table dmstest_multpk4 with base path s3a://redact/my2/multpk4 of type COPY_ON_WRITE
2020-08-12 11:31:14,707 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - Reading schema from s3a://redact/my2/multpk4/mpark2/7ed7627c-6110-4d42-9df2-f3a6afe877df-0_187-25-15737_20200812112840.parquet
2020-08-12 11:31:15,330 [main] INFO  org.apache.hudi.hive.HiveSyncTool - Hive table dmstest_multpk4 is not found. Creating it
2020-08-12 11:31:15,337 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - Creating table with CREATE EXTERNAL TABLE  IF NOT EXISTS `redact`.`dmstest_multpk4`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, `org_parent` int, `percent_holding` double, `group_company` string, `grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, `cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://redact/my2/multpk4'
2020-08-12 11:31:15,411 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - Time taken to start SessionState and create Driver: 74 ms
2020-08-12 11:31:15,444 [main] INFO  hive.ql.parse.ParseDriver - Parsing command: CREATE EXTERNAL TABLE  IF NOT EXISTS `redact`.`dmstest_multpk4`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, `org_parent` int, `percent_holding` double, `group_company` string, `grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, `cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://redact/my2/multpk4'
2020-08-12 11:31:16,131 [main] INFO  hive.ql.parse.ParseDriver - Parse Completed
2020-08-12 11:31:16,568 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - Time taken to execute [CREATE EXTERNAL TABLE  IF NOT EXISTS `redact`.`dmstest_multpk4`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, `org_parent` int, `percent_holding` double, `group_company` string, `grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, `cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://redact/my2/multpk4']: 1157 ms
2020-08-12 11:31:16,574 [main] INFO  org.apache.hudi.hive.HiveSyncTool - Schema sync complete. Syncing partitions for dmstest_multpk4
2020-08-12 11:31:16,574 [main] INFO  org.apache.hudi.hive.HiveSyncTool - Last commit time synced was found to be null
2020-08-12 11:31:16,575 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - Last commit time synced is not known, listing all partitions in s3a://redact/my2/multpk4,FS :S3AFileSystem{uri=s3a://redact, workingDir=s3a://redact/user/ec2-user, inputPolicy=normal, partSize=104857600, enableMultiObjectsDelete=true, maxKeys=5000, readAhead=65536, blockSize=33554432, multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='AES256', blockFactory=org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlockFactory@62765aec, boundedExecutor=BlockingThreadPoolExecutorService{SemaphoredDelegatingExecutor{permitCount=2405, available=2405, waiting=0}, activeCount=0}, unboundedExecutor=java.util.concurrent.ThreadPoolExecutor@6f5bd362[Running, pool size = 6, active threads = 0, queued tasks = 0, completed tasks = 6], statistics {761530 bytes read, 320081 bytes written, 712 read ops, 0 large read ops, 31 write ops}, metrics {{Context=S3AFileSystem} {FileSystemId=db54a51b-e05e-4b3c-9140-240762a0c03d-redact} {fsURI=s3a://redact/redact/sparkevents} {files_created=5} {files_copied=0} {files_copied_bytes=0} {files_deleted=271} {fake_directories_deleted=0} {directories_created=6} {directories_deleted=0} {ignored_errors=4} {op_copy_from_local_file=0} {op_exists=53} {op_get_file_status=415} {op_glob_status=0} {op_is_directory=38} {op_is_file=0} {op_list_files=271} {op_list_located_status=0} {op_list_status=19} {op_mkdirs=5} {op_rename=0} {object_copy_requests=0} {object_delete_requests=5} {object_list_requests=680} {object_continue_list_requests=0} {object_metadata_requests=805} {object_multipart_aborted=0} {object_put_bytes=320081} {object_put_requests=10} {object_put_requests_completed=10} {stream_write_failures=0} {stream_write_block_uploads=0} {stream_write_block_uploads_committed=0} {stream_write_block_uploads_aborted=0} {stream_write_total_time=0} {stream_write_total_data=320081} {object_put_requests_active=0} {object_put_bytes_pending=0} {stream_write_block_uploads_active=0} {stream_write_block_uploads_pending=4} {stream_write_block_uploads_data_pending=0} {stream_read_fully_operations=0} {stream_opened=22} {stream_bytes_skipped_on_seek=0} {stream_closed=22} {stream_bytes_backwards_on_seek=437965} {stream_bytes_read=761530} {stream_read_operations_incomplete=107} {stream_bytes_discarded_in_abort=0} {stream_close_operations=22} {stream_read_operations=3020} {stream_aborted=0} {stream_forward_seek_operations=0} {stream_backward_seek_operations=1} {stream_seek_operations=1} {stream_bytes_read_in_close=8} {stream_read_exceptions=0} }}
2020-08-12 11:31:34,438 [main] INFO  org.apache.hudi.hive.HiveSyncTool - Storage partitions scan complete. Found 271
2020-08-12 11:31:34,476 [main] INFO  org.apache.hudi.hive.HiveSyncTool - New Partitions [AAB, redactlist]
2020-08-12 11:31:34,476 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - Adding partitions 271 to table dmstest_multpk4
2020-08-12 11:31:34,477 [main] ERROR org.apache.hudi.hive.HiveSyncTool - Got runtime exception when hive syncing
org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table dmstest_multpk4
        at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:187)
        at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:126)
        at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:87)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:460)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:402)
        at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:235)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:123)
        at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:380)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values [AAB]. Check partition strategy.
        at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
        at org.apache.hudi.hive.HoodieHiveClient.getPartitionClause(HoodieHiveClient.java:182)
        at org.apache.hudi.hive.HoodieHiveClient.constructAddPartitions(HoodieHiveClient.java:166)
        at org.apache.hudi.hive.HoodieHiveClient.addPartitionsToTable(HoodieHiveClient.java:141)
        at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:182)
        ... 19 more
2020-08-12 11:31:34,513 [main] INFO  org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer - Shut down deltastreamer
2020-08-12 11:31:34,535 [main] INFO  org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Shutting down all executors
Listing the table base path shows the partition folders that were written to storage:

aws s3 ls s3://redact/my2/multpk4/
                           PRE .hoodie/
                           PRE AAB/
                           PRE CC/
                           PRE DD/
                           ...etc
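
What the exception means, reading the log above: hoodie.datasource.hive_sync.partition_fields was left empty, so the CREATE EXTERNAL TABLE statement Hive sync issued has no PARTITIONED BY clause and the Hive table has zero partition keys. The storage scan nevertheless found 271 one-level partition folders (AAB, CC, DD, ...) written because of hoodie.datasource.write.partitionpath.field=sys_user. When HoodieHiveClient.getPartitionClause pairs the table's partition keys with the values parsed from each folder, the counts disagree; hence the error. Roughly:

# Hive side:    partition key parts = []     (no PARTITIONED BY clause in the DDL above)
# Storage side: partition values    = [AAB]  (one value per sys_user folder)
# 0 keys vs 1 value -> IllegalArgumentException: "Partition key parts []
# does not match with partition values [AAB]. Check partition strategy."
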

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

1 reaction
satishkotha commented, Aug 25, 2020

If a single column as the key works for you, you can also try:

hoodie.datasource.write.keygenerator.class=com.uber.hoodie.NonpartitionedKeyGenerator
hoodie.datasource.hive_sync.partition_extractor_class=com.uber.hoodie.hive.NonPartitionedExtractor
hoodie.datasource.write.recordkey.field=(new column that is unique)
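
Note that the command above uses the Apache Hudi 0.5.3 bundle, where these classes live under the org.apache.hudi namespace rather than the legacy com.uber.hoodie one. Assuming the standard 0.5.x class locations (worth verifying against the bundle jar), the equivalent settings would be:

# Non-partitioned setup with the Apache package names; the non-partitioned
# key generator expects a single record key column, per the comment above.
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor
hoodie.datasource.write.recordkey.field=(new column that is unique)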

0 reactions
bvaradar commented, Sep 14, 2020

Closing this issue.

