Improve HiveSyncTool handling of empty commit timeline
See original GitHub issue.

Here is my use case:
I am using Spark Streaming to write data received from Kafka to a Hudi table, and then syncing it to a non-partitioned Hive table. I set the key generator to NonpartitionedKeyGenerator.class.
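For reference, the non-partitioned setup described above roughly corresponds to write options like the following. This is a hedged sketch only: the option keys follow the hoodie datasource config names, and the key generator's fully-qualified class name differs between Hudi releases (the pre-Apache `com.uber.hoodie` packages vs the later `org.apache.hudi` ones), so verify both against your version.

```java
import java.util.Properties;

public class HudiWriteConfig {
    public static void main(String[] args) {
        // Illustrative sketch of the options a Spark writer would pass to the
        // hoodie datasource for a non-partitioned table. Key names and the key
        // generator class name are version-dependent assumptions, not verified API.
        Properties opts = new Properties();
        opts.setProperty("hoodie.datasource.write.keygenerator.class",
                "com.uber.hoodie.NonpartitionedKeyGenerator");
        // A non-partitioned key generator pairs with an empty partition-path field.
        opts.setProperty("hoodie.datasource.write.partitionpath.field", "");
        opts.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```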
When I synced to Hive, the following error occurred. What is the reason for this? Here is the error:
2019-01-02 10:04:02,511 ERROR scheduler.JobScheduler (Logging.scala:logError(91)) - Error running job streaming job 1546394640000 ms.0
java.lang.IllegalArgumentException: Could not find any data file written for commit [20190102100400__commit__COMPLETED], could not get schema for dataset hdfs://nns-off/databus/hudi/tables/databus_realtime_databus_realtime_databus_sub_hd_t_hudi_sub_hd, Metadata :HoodieCommitMetadata{partitionToWriteStats={}, compacted=false, extraMetadataMap={ROLLING_STAT={
"partitionToRollingStats" : {
"" : {
"9e163bc3-c14f-4c46-937a-67134b26f7e2" : {
"fileId" : "9e163bc3-c14f-4c46-937a-67134b26f7e2",
"inserts" : 1,
"upserts" : 0,
"deletes" : 0,
"totalInputWriteBytesToDisk" : 0,
"totalInputWriteBytesOnDisk" : 434448
},
"0ba4d519-6e8d-42f6-b27e-f027b45f5b06" : {
"fileId" : "0ba4d519-6e8d-42f6-b27e-f027b45f5b06",
"inserts" : 1,
"upserts" : 0,
"deletes" : 0,
"totalInputWriteBytesToDisk" : 0,
"totalInputWriteBytesOnDisk" : 434436
},
"a4d2eaf3-6027-4093-9621-b40cc2ebcb8b" : {
"fileId" : "a4d2eaf3-6027-4093-9621-b40cc2ebcb8b",
"inserts" : 1,
"upserts" : 0,
"deletes" : 0,
"totalInputWriteBytesToDisk" : 0,
"totalInputWriteBytesOnDisk" : 434436
},
"0571876c-78a6-4263-8976-c687ad2e0ba9" : {
"fileId" : "0571876c-78a6-4263-8976-c687ad2e0ba9",
"inserts" : 1,
"upserts" : 0,
"deletes" : 0,
"totalInputWriteBytesToDisk" : 0,
"totalInputWriteBytesOnDisk" : 434435
},
"321509ba-814e-4d9d-a135-dfe958490ec8" : {
"fileId" : "321509ba-814e-4d9d-a135-dfe958490ec8",
"inserts" : 1,
"upserts" : 0,
"deletes" : 0,
"totalInputWriteBytesToDisk" : 0,
"totalInputWriteBytesOnDisk" : 434435
},
"8c36a009-0042-40f5-abb3-91f891cec775" : {
"fileId" : "8c36a009-0042-40f5-abb3-91f891cec775",
"inserts" : 1,
"upserts" : 0,
"deletes" : 0,
"totalInputWriteBytesToDisk" : 0,
"totalInputWriteBytesOnDisk" : 434450
},
"15adf256-c64d-4dd9-9d45-4d911b5763ba" : {
"fileId" : "15adf256-c64d-4dd9-9d45-4d911b5763ba",
"inserts" : 1,
"upserts" : 0,
"deletes" : 0,
"totalInputWriteBytesToDisk" : 0,
"totalInputWriteBytesOnDisk" : 434435
},
"17ef27e0-dedc-4b8b-9997-da7bc65a61d1" : {
"fileId" : "17ef27e0-dedc-4b8b-9997-da7bc65a61d1",
"inserts" : 1,
"upserts" : 0,
"deletes" : 0,
"totalInputWriteBytesToDisk" : 0,
"totalInputWriteBytesOnDisk" : 434436
},
"4139fed7-ce24-4e16-ba2f-9dd30b5a7100" : {
"fileId" : "4139fed7-ce24-4e16-ba2f-9dd30b5a7100",
"inserts" : 1,
"upserts" : 0,
"deletes" : 0,
"totalInputWriteBytesToDisk" : 0,
"totalInputWriteBytesOnDisk" : 434436
},
"2284b3dc-4137-44b0-a71f-de5fbd14a7ff" : {
"fileId" : "2284b3dc-4137-44b0-a71f-de5fbd14a7ff",
"inserts" : 1,
"upserts" : 0,
"deletes" : 0,
"totalInputWriteBytesToDisk" : 0,
"totalInputWriteBytesOnDisk" : 434436
},
"d10310dd-7b4a-4669-9523-e7e9ac01ff17" : {
"fileId" : "d10310dd-7b4a-4669-9523-e7e9ac01ff17",
"inserts" : 1,
"upserts" : 0,
"deletes" : 0,
"totalInputWriteBytesToDisk" : 0,
"totalInputWriteBytesOnDisk" : 434436
},
"68592d6b-82f7-4ea6-8cce-8b21a07d4b1a" : {
"fileId" : "68592d6b-82f7-4ea6-8cce-8b21a07d4b1a",
"inserts" : 1,
"upserts" : 0,
"deletes" : 0,
"totalInputWriteBytesToDisk" : 0,
"totalInputWriteBytesOnDisk" : 434441
}
}
},
"actionType" : "commit"
}}}
at com.uber.hoodie.hive.HoodieHiveClient.lambda$getDataSchema$1(HoodieHiveClient.java:317)
at java.util.Optional.orElseThrow(Optional.java:290)
at com.uber.hoodie.hive.HoodieHiveClient.getDataSchema(HoodieHiveClient.java:315)
at com.uber.hoodie.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:94)
at com.uber.hoodie.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:68)
at com.lianjia.dtarch.databus.streaming.hudi.service.HudiService.syncToHive(HudiService.java:79)
at com.lianjia.dtarch.databus.streaming.hudi.service.HudiService.writeWithCompactAndSync(HudiService.java:58)
at com.lianjia.dtarch.databus.streaming.hudi.KfkHudiConsumer.lambda$saveToHudi$c06d719c$1(KfkHudiConsumer.java:161)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:272)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:272)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
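The metadata dump in the exception shows partitionToWriteStats={}: the commit completed without recording any write stats, so HoodieHiveClient.getDataSchema finds no data file from which to read a schema and throws. A minimal sketch of the kind of guard the issue title asks for, using a plain Map in place of Hudi's HoodieCommitMetadata (the hasDataFiles helper is hypothetical, not actual Hudi API):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class SyncGuard {

    // Hypothetical helper: true only when the commit carries write stats,
    // i.e. at least one partition maps to a non-empty list of file-level stats.
    static boolean hasDataFiles(Map<String, List<String>> partitionToWriteStats) {
        return partitionToWriteStats != null
                && partitionToWriteStats.values().stream().anyMatch(stats -> !stats.isEmpty());
    }

    public static void main(String[] args) {
        // The failing commit above: partitionToWriteStats={}
        Map<String, List<String>> emptyCommit = Collections.emptyMap();
        // A healthy commit: one partition ("" for non-partitioned) with one write stat.
        Map<String, List<String>> healthyCommit =
                Collections.singletonMap("", Collections.singletonList("some-write-stat"));

        System.out.println(hasDataFiles(emptyCommit) ? "sync" : "skip schema read");
        System.out.println(hasDataFiles(healthyCommit) ? "sync" : "skip schema read");
    }
}
```

Per the issue title, the ask is for HiveSyncTool to handle a commit timeline like this gracefully rather than fail the streaming job; a check along these lines would let the sync step be skipped until a commit with actual data files arrives.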
Issue Analytics
- State:
- Created 5 years ago
- Comments:10 (6 by maintainers)
Top GitHub Comments
@NetsanetGeb Hudi is registered as an external table in Hive. Hudi controls how writes are done to a table (writing to an HDFS location), manages schema evolution via Avro, and registers a custom InputFormat to allow snapshot reading of these tables. Thus, the data is controlled and managed by Hudi, while the metadata (such as partitions) is managed by Hive. There shouldn't be any difference in the Hive metastore for Hudi tables, but there are a few general differences between managed and external tables, such as how dropping partitions/tables works. Some details here: https://cwiki.apache.org/confluence/display/Hive/Managed+vs.+External+Tables
@NetsanetGeb It just seems like the job cannot talk to Kafka. By the way, do you mind posting this on the mailing list, since it seems like a separate issue? https://hudi.apache.org/community.html We are using the mailing list as the primary support channel now and can respond much more quickly there.