[SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values
I'm loading data from DMS and I don't want any partitions (I did not specify hoodie.datasource.hive_sync.partition_fields, since the docs say it can be left empty by default).
/home/ec2-user/spark_home/bin/spark-submit \
  --conf "spark.hadoop.fs.s3a.proxy.host=redact" \
  --conf "spark.hadoop.fs.s3a.proxy.port=redact" \
  --conf "spark.driver.extraClassPath=/home/ec2-user/json-20090211.jar" \
  --conf "spark.executor.extraClassPath=/home/ec2-user/json-20090211.jar" \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  --jars "/home/ec2-user/spark-avro_2.11-2.4.6.jar" \
  --master spark://redact:7077 \
  --deploy-mode client \
  /home/ec2-user/hudi-utilities-bundle_2.11-0.5.3-1.jar \
  --table-type COPY_ON_WRITE \
  --source-ordering-field TimeCreated \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --enable-hive-sync \
  --hoodie-conf hoodie.datasource.hive_sync.database=redact \
  --hoodie-conf hoodie.datasource.hive_sync.table=dmstest_multpk4 \
  --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor \
  --hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false \
  --target-base-path s3a://redact/my2/multpk4 \
  --target-table dmstest_multpk4 \
  --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer \
  --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
  --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
  --hoodie-conf hoodie.datasource.write.recordkey.field=version_no,group_company \
  --hoodie-conf hoodie.datasource.write.partitionpath.field=sys_user \
  --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tblhere \
  > multpk4.log
2020-08-12 11:31:11,186 [main] INFO org.apache.hudi.client.AbstractHoodieWriteClient - Committed 20200812112840
2020-08-12 11:31:11,189 [main] INFO org.apache.hudi.utilities.deltastreamer.DeltaSync - Commit 20200812112840 successful!
2020-08-12 11:31:11,194 [main] INFO org.apache.hudi.utilities.deltastreamer.DeltaSync - Syncing target hoodie table with hive table(dmstest_multpk4). Hive metastore URL :jdbc:hive2://localhost:10000, basePath :s3a://redact/my2/multpk4
2020-08-12 11:31:11,960 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200812112840__commit__COMPLETED]]
2020-08-12 11:31:14,264 [main] INFO org.apache.hudi.hive.HiveSyncTool - Trying to sync hoodie table dmstest_multpk4 with base path s3a://redact/my2/multpk4 of type COPY_ON_WRITE
2020-08-12 11:31:14,707 [main] INFO org.apache.hudi.hive.HoodieHiveClient - Reading schema from s3a://redact/my2/multpk4/mpark2/7ed7627c-6110-4d42-9df2-f3a6afe877df-0_187-25-15737_20200812112840.parquet
2020-08-12 11:31:15,330 [main] INFO org.apache.hudi.hive.HiveSyncTool - Hive table dmstest_multpk4 is not found. Creating it
2020-08-12 11:31:15,337 [main] INFO org.apache.hudi.hive.HoodieHiveClient - Creating table with CREATE EXTERNAL TABLE IF NOT EXISTS `redact`.`dmstest_multpk4`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, `org_parent` int, `percent_holding` double, `group_company` string, `grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, `cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://redact/my2/multpk4'
2020-08-12 11:31:15,411 [main] INFO org.apache.hudi.hive.HoodieHiveClient - Time taken to start SessionState and create Driver: 74 ms
2020-08-12 11:31:15,444 [main] INFO hive.ql.parse.ParseDriver - Parsing command: CREATE EXTERNAL TABLE IF NOT EXISTS `redact`.`dmstest_multpk4`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, `org_parent` int, `percent_holding` double, `group_company` string, `grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, `cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://redact/my2/multpk4'
2020-08-12 11:31:16,131 [main] INFO hive.ql.parse.ParseDriver - Parse Completed
2020-08-12 11:31:16,568 [main] INFO org.apache.hudi.hive.HoodieHiveClient - Time taken to execute [CREATE EXTERNAL TABLE IF NOT EXISTS `redact`.`dmstest_multpk4`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, `org_parent` int, `percent_holding` double, `group_company` string, `grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, `cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://redact/my2/multpk4']: 1157 ms
2020-08-12 11:31:16,574 [main] INFO org.apache.hudi.hive.HiveSyncTool - Schema sync complete. Syncing partitions for dmstest_multpk4
2020-08-12 11:31:16,574 [main] INFO org.apache.hudi.hive.HiveSyncTool - Last commit time synced was found to be null
2020-08-12 11:31:16,575 [main] INFO org.apache.hudi.hive.HoodieHiveClient - Last commit time synced is not known, listing all partitions in s3a://redact/my2/multpk4,FS :S3AFileSystem{uri=s3a://redact, workingDir=s3a://redact/user/ec2-user, inputPolicy=normal, partSize=104857600, enableMultiObjectsDelete=true, maxKeys=5000, readAhead=65536, blockSize=33554432, multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='AES256', blockFactory=org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlockFactory@62765aec, boundedExecutor=BlockingThreadPoolExecutorService{SemaphoredDelegatingExecutor{permitCount=2405, available=2405, waiting=0}, activeCount=0}, unboundedExecutor=java.util.concurrent.ThreadPoolExecutor@6f5bd362[Running, pool size = 6, active threads = 0, queued tasks = 0, completed tasks = 6], statistics {761530 bytes read, 320081 bytes written, 712 read ops, 0 large read ops, 31 write ops}, metrics {{Context=S3AFileSystem} {FileSystemId=db54a51b-e05e-4b3c-9140-240762a0c03d-redact} {fsURI=s3a://redact/redact/sparkevents} {files_created=5} {files_copied=0} {files_copied_bytes=0} {files_deleted=271} {fake_directories_deleted=0} {directories_created=6} {directories_deleted=0} {ignored_errors=4} {op_copy_from_local_file=0} {op_exists=53} {op_get_file_status=415} {op_glob_status=0} {op_is_directory=38} {op_is_file=0} {op_list_files=271} {op_list_located_status=0} {op_list_status=19} {op_mkdirs=5} {op_rename=0} {object_copy_requests=0} {object_delete_requests=5} {object_list_requests=680} {object_continue_list_requests=0} {object_metadata_requests=805} {object_multipart_aborted=0} {object_put_bytes=320081} {object_put_requests=10} {object_put_requests_completed=10} {stream_write_failures=0} {stream_write_block_uploads=0} {stream_write_block_uploads_committed=0} {stream_write_block_uploads_aborted=0} {stream_write_total_time=0} {stream_write_total_data=320081} {object_put_requests_active=0} {object_put_bytes_pending=0} {stream_write_block_uploads_active=0} {stream_write_block_uploads_pending=4} {stream_write_block_uploads_data_pending=0} {stream_read_fully_operations=0} {stream_opened=22} {stream_bytes_skipped_on_seek=0} {stream_closed=22} {stream_bytes_backwards_on_seek=437965} {stream_bytes_read=761530} {stream_read_operations_incomplete=107} {stream_bytes_discarded_in_abort=0} {stream_close_operations=22} {stream_read_operations=3020} {stream_aborted=0} {stream_forward_seek_operations=0} {stream_backward_seek_operations=1} {stream_seek_operations=1} {stream_bytes_read_in_close=8} {stream_read_exceptions=0} }}
2020-08-12 11:31:34,438 [main] INFO org.apache.hudi.hive.HiveSyncTool - Storage partitions scan complete. Found 271
2020-08-12 11:31:34,476 [main] INFO org.apache.hudi.hive.HiveSyncTool - New Partitions [AAB, redactlist]
2020-08-12 11:31:34,476 [main] INFO org.apache.hudi.hive.HoodieHiveClient - Adding partitions 271 to table dmstest_multpk4
2020-08-12 11:31:34,477 [main] ERROR org.apache.hudi.hive.HiveSyncTool - Got runtime exception when hive syncing
org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table dmstest_multpk4
at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:187)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:126)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:87)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:460)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:402)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:235)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:123)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:380)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values [AAB]. Check partition strategy.
at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
at org.apache.hudi.hive.HoodieHiveClient.getPartitionClause(HoodieHiveClient.java:182)
at org.apache.hudi.hive.HoodieHiveClient.constructAddPartitions(HoodieHiveClient.java:166)
at org.apache.hudi.hive.HoodieHiveClient.addPartitionsToTable(HoodieHiveClient.java:141)
at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:182)
... 19 more
2020-08-12 11:31:34,513 [main] INFO org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer - Shut down deltastreamer
2020-08-12 11:31:34,535 [main] INFO org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Shutting down all executors
aws s3 ls s3://redact/my2/multpk4/
PRE .hoodie/
PRE AAB/
PRE CC/
PRE DD/
...etc
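The listing appears consistent with the error above: hoodie.datasource.write.partitionpath.field=sys_user writes one top-level folder per sys_user value (AAB, CC, ...), while hoodie.datasource.hive_sync.partition_fields was left empty, so Hive sync ends up comparing an empty partition key list against one-element path values such as [AAB]. If partitioning by sys_user is actually wanted, a minimal, untested sketch of the extra flag for the spark-submit command above (the Hive table already created without partitions in this run would likely need to be dropped first):

# Sketch: declare the partition column to Hive sync so the key parts list
# matches the one-level partition folders found in storage.
--hoodie-conf hoodie.datasource.hive_sync.partition_fields=sys_user \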
Top GitHub Comments
If a single column as the record key works for you, you can also try:

hoodie.datasource.write.keygenerator.class=com.uber.hoodie.NonpartitionedKeyGenerator
hoodie.datasource.hive_sync.partition_extractor_class=com.uber.hoodie.hive.NonPartitionedExtractor
hoodie.datasource.write.recordkey.field=(new column that is unique)
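On the 0.5.3 hudi-utilities-bundle used above, those com.uber.hoodie classes live under the org.apache.hudi package instead. A minimal, untested sketch of the same suggestion mapped onto the original spark-submit flags, with <unique_col> as a placeholder for a single unique column:

# Non-partitioned variant of the write/sync flags (org.apache.hudi package names);
# <unique_col> is a placeholder, not a column from the source table.
--hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator \
--hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor \
--hoodie-conf hoodie.datasource.write.recordkey.field=<unique_col>
# (and drop --hoodie-conf hoodie.datasource.write.partitionpath.field=sys_user)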
Closing this issue.