
Improve HiveSyncTool handling of empty commit timeline

See original GitHub issue

Here is my use case: I am using Spark Streaming to write data received from Kafka to a Hudi (hoodie) table and then syncing it to a Hive table. The Hive table is non-partitioned, and I have set the key generator to NonpartitionedKeyGenerator.class. When I synced to Hive, the following error occurred. What is causing this? The error is below (a sketch of the write/sync setup follows the stack trace):

2019-01-02 10:04:02,511 ERROR scheduler.JobScheduler (Logging.scala:logError(91)) - Error running job streaming job 1546394640000 ms.0
java.lang.IllegalArgumentException: Could not find any data file written for commit [20190102100400__commit__COMPLETED], could not get schema for dataset hdfs://nns-off/databus/hudi/tables/databus_realtime_databus_realtime_databus_sub_hd_t_hudi_sub_hd, Metadata :HoodieCommitMetadata{partitionToWriteStats={}, compacted=false, extraMetadataMap={ROLLING_STAT={
  "partitionToRollingStats" : {
    "" : {
      "9e163bc3-c14f-4c46-937a-67134b26f7e2" : {
        "fileId" : "9e163bc3-c14f-4c46-937a-67134b26f7e2",
        "inserts" : 1,
        "upserts" : 0,
        "deletes" : 0,
        "totalInputWriteBytesToDisk" : 0,
        "totalInputWriteBytesOnDisk" : 434448
      },
      "0ba4d519-6e8d-42f6-b27e-f027b45f5b06" : {
        "fileId" : "0ba4d519-6e8d-42f6-b27e-f027b45f5b06",
        "inserts" : 1,
        "upserts" : 0,
        "deletes" : 0,
        "totalInputWriteBytesToDisk" : 0,
        "totalInputWriteBytesOnDisk" : 434436
      },
      "a4d2eaf3-6027-4093-9621-b40cc2ebcb8b" : {
        "fileId" : "a4d2eaf3-6027-4093-9621-b40cc2ebcb8b",
        "inserts" : 1,
        "upserts" : 0,
        "deletes" : 0,
        "totalInputWriteBytesToDisk" : 0,
        "totalInputWriteBytesOnDisk" : 434436
      },
      "0571876c-78a6-4263-8976-c687ad2e0ba9" : {
        "fileId" : "0571876c-78a6-4263-8976-c687ad2e0ba9",
        "inserts" : 1,
        "upserts" : 0,
        "deletes" : 0,
        "totalInputWriteBytesToDisk" : 0,
        "totalInputWriteBytesOnDisk" : 434435
      },
      "321509ba-814e-4d9d-a135-dfe958490ec8" : {
        "fileId" : "321509ba-814e-4d9d-a135-dfe958490ec8",
        "inserts" : 1,
        "upserts" : 0,
        "deletes" : 0,
        "totalInputWriteBytesToDisk" : 0,
        "totalInputWriteBytesOnDisk" : 434435
      },
      "8c36a009-0042-40f5-abb3-91f891cec775" : {
        "fileId" : "8c36a009-0042-40f5-abb3-91f891cec775",
        "inserts" : 1,
        "upserts" : 0,
        "deletes" : 0,
        "totalInputWriteBytesToDisk" : 0,
        "totalInputWriteBytesOnDisk" : 434450
      },
      "15adf256-c64d-4dd9-9d45-4d911b5763ba" : {
        "fileId" : "15adf256-c64d-4dd9-9d45-4d911b5763ba",
        "inserts" : 1,
        "upserts" : 0,
        "deletes" : 0,
        "totalInputWriteBytesToDisk" : 0,
        "totalInputWriteBytesOnDisk" : 434435
      },
      "17ef27e0-dedc-4b8b-9997-da7bc65a61d1" : {
        "fileId" : "17ef27e0-dedc-4b8b-9997-da7bc65a61d1",
        "inserts" : 1,
        "upserts" : 0,
        "deletes" : 0,
        "totalInputWriteBytesToDisk" : 0,
        "totalInputWriteBytesOnDisk" : 434436
      },
      "4139fed7-ce24-4e16-ba2f-9dd30b5a7100" : {
        "fileId" : "4139fed7-ce24-4e16-ba2f-9dd30b5a7100",
        "inserts" : 1,
        "upserts" : 0,
        "deletes" : 0,
        "totalInputWriteBytesToDisk" : 0,
        "totalInputWriteBytesOnDisk" : 434436
      },
      "2284b3dc-4137-44b0-a71f-de5fbd14a7ff" : {
        "fileId" : "2284b3dc-4137-44b0-a71f-de5fbd14a7ff",
        "inserts" : 1,
        "upserts" : 0,
        "deletes" : 0,
        "totalInputWriteBytesToDisk" : 0,
        "totalInputWriteBytesOnDisk" : 434436
      },
      "d10310dd-7b4a-4669-9523-e7e9ac01ff17" : {
        "fileId" : "d10310dd-7b4a-4669-9523-e7e9ac01ff17",
        "inserts" : 1,
        "upserts" : 0,
        "deletes" : 0,
        "totalInputWriteBytesToDisk" : 0,
        "totalInputWriteBytesOnDisk" : 434436
      },
      "68592d6b-82f7-4ea6-8cce-8b21a07d4b1a" : {
        "fileId" : "68592d6b-82f7-4ea6-8cce-8b21a07d4b1a",
        "inserts" : 1,
        "upserts" : 0,
        "deletes" : 0,
        "totalInputWriteBytesToDisk" : 0,
        "totalInputWriteBytesOnDisk" : 434441
      }
    }
  },
  "actionType" : "commit"
}}}
	at com.uber.hoodie.hive.HoodieHiveClient.lambda$getDataSchema$1(HoodieHiveClient.java:317)
	at java.util.Optional.orElseThrow(Optional.java:290)
	at com.uber.hoodie.hive.HoodieHiveClient.getDataSchema(HoodieHiveClient.java:315)
	at com.uber.hoodie.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
	at com.uber.hoodie.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:68)
	at com.lianjia.dtarch.databus.streaming.hudi.service.HudiService.syncToHive(HudiService.java:79)
	at com.lianjia.dtarch.databus.streaming.hudi.service.HudiService.writeWithCompactAndSync(HudiService.java:58)
	at com.lianjia.dtarch.databus.streaming.hudi.KfkHudiConsumer.lambda$saveToHudi$c06d719c$1(KfkHudiConsumer.java:161)
	at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:272)
	at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:272)
	at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
	at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
	at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
	at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
	at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
	at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
	at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
	at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
	at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
	at scala.util.Try$.apply(Try.scala:192)
	at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
	at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257)
	at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
	at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
	at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
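
For context on the failure: the metadata in the exception shows partitionToWriteStats={} for the completed commit, i.e. that commit wrote no data files, so HoodieHiveClient.getDataSchema has nothing to read a schema from during Hive sync, which is the empty-commit case the issue title asks HiveSyncTool to handle. The sketch below is not the reporter's code; it only illustrates the kind of Spark-to-Hudi write with Hive sync being described, using option keys as documented for Apache Hudi releases. The table name, field names, JDBC URL, and the exact option/class names for the older com.uber.hoodie artifacts are assumptions.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class HudiWriteSketch {
    // Called from foreachRDD for each micro-batch, mirroring the write-then-sync
    // flow described in the report. All values marked "assumed" are placeholders.
    static void writeMicroBatch(Dataset<Row> batch, String basePath) {
        batch.write()
            .format("hudi") // the pre-Apache releases register the source as "com.uber.hoodie"
            .option("hoodie.table.name", "t_hudi_sub_hd")                 // assumed
            .option("hoodie.datasource.write.recordkey.field", "id")      // assumed
            .option("hoodie.datasource.write.precombine.field", "ts")     // assumed
            // Non-partitioned table: key generator that emits an empty partition path
            .option("hoodie.datasource.write.keygenerator.class",
                    "org.apache.hudi.keygen.NonpartitionedKeyGenerator")  // package differs in older com.uber.hoodie releases
            // Built-in Hive sync; a non-partitioned table also needs the NonPartitionedExtractor
            .option("hoodie.datasource.hive_sync.enable", "true")
            .option("hoodie.datasource.hive_sync.database", "default")    // assumed
            .option("hoodie.datasource.hive_sync.table", "t_hudi_sub_hd") // assumed
            .option("hoodie.datasource.hive_sync.jdbcurl",
                    "jdbc:hive2://hiveserver:10000")                      // assumed
            .option("hoodie.datasource.hive_sync.partition_extractor_class",
                    "org.apache.hudi.hive.NonPartitionedExtractor")
            .mode(SaveMode.Append)
            .save(basePath); // e.g. hdfs://nns-off/databus/hudi/tables/...
    }
}

A possible workaround (not suggested in the thread) would be to skip the sync call when the latest completed commit's HoodieCommitMetadata has an empty partitionToWriteStats, since there is no schema to publish for such a commit anyway.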

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

1 reaction
n3nash commented, Mar 5, 2019

@NetsanetGeb Hudi is registered as an external table in Hive. Hudi controls how writes are done to a table (writing to an HDFS location), manages schema evolution via Avro, and registers a custom InputFormat to allow snapshot reads of these tables. Thus, the data is controlled and managed by Hudi, while the metadata (such as partitions) is managed by Hive. There shouldn't be any difference in the Hive metastore for Hudi tables, but there are a few general differences between managed and external tables, such as how dropping partitions/tables works. Some details here: https://cwiki.apache.org/confluence/display/Hive/Managed+vs.+External+Tables
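
To make the external-table point concrete, below is a hedged sketch of roughly what the Hive sync registers for a copy-on-write Hudi table: an EXTERNAL table over the HDFS base path, read through Hudi's InputFormat so snapshot queries resolve the latest file slice per file group. The column list, location, and exact class names (which moved packages between the com.uber.hoodie and org.apache.hudi eras) are assumptions; SHOW CREATE TABLE on a real synced table is the authoritative reference.

import org.apache.spark.sql.SparkSession;

public class HudiHiveRegistrationSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("hudi-hive-ddl-sketch")
            .enableHiveSupport()
            .getOrCreate();

        // Roughly what the Hive sync registers: an EXTERNAL table over the Hudi
        // base path, read through Hudi's InputFormat. Columns and location are placeholders.
        spark.sql(
            "CREATE EXTERNAL TABLE IF NOT EXISTS t_hudi_sub_hd (\n" +
            "  _hoodie_commit_time STRING,\n" +                                                    // Hudi metadata columns
            "  _hoodie_record_key  STRING,\n" +
            "  id BIGINT,\n" +                                                                     // data columns (assumed)
            "  payload STRING\n" +
            ") ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'\n" +
            "STORED AS\n" +
            "  INPUTFORMAT  'com.uber.hoodie.hadoop.HoodieInputFormat'\n" +                        // org.apache.hudi.hadoop.HoodieParquetInputFormat in newer releases
            "  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'\n" +
            "LOCATION 'hdfs://nns-off/databus/hudi/tables/t_hudi_sub_hd'");                        // base path (shortened)
    }
}

Because the table is EXTERNAL, dropping it from Hive removes only the metastore entry and leaves the data under the base path untouched, which is the managed-vs-external difference referred to above.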

0 reactions
vinothchandar commented, Mar 20, 2019

@NetsanetGeb It just seems like the job cannot talk to Kafka? By the way, do you mind posting this on the mailing list, since it seems like a separate issue? https://hudi.apache.org/community.html We are using the mailing list as the primary support channel now and can respond much more quickly there.
