Not able to iterate messages and java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]
Not able to iterate the received messages from Event Hubs. Below is the code sample.
import org.apache.spark.eventhubs._

val connString25Partition = ConnectionStringBuilder()
  .setNamespaceName("namespace")
  .setEventHubName("name")
  .setSasKeyName("policy")
  .setSasKey("key")
  .build

val ehConf = EventHubsConf(connString25Partition)
  .setConsumerGroup("$Default")
  .setMaxEventsPerTrigger(5)
  .setStartingPosition(EventPosition.fromStartOfStream)

val eventHubsStream = EventHubsUtils.createDirectStream(streamingContext, ehConf)

case class EventContent(EventDetails: String)

eventHubsStream
  .map(x => EventContent(new String(x.getBytes)))   // x is an EventData; getBytes returns the payload
  .foreachRDD { rdd =>
    logGeneral.info(f"rdd count = ${rdd.count()}")
    logGeneral.info(f"rdd partitions = ${rdd.getNumPartitions}")
    rdd.take(5).foreach(record => logGeneral.info(f"rdd value: $record"))
    rdd.foreach { message =>
      logGeneral.info(s"inside foreach ==============")
      printMessage(message.EventDetails, "1")
    }
  }
We are able to print the RDD count and the partition count, but applying any other action or trying to iterate the RDD does not work. We suspect there is no binding to EventData in this version.
Let us know how we can iterate and print/access the received messages (EventData).
Dependencies:

<properties>
  <encoding>UTF-8</encoding>
  <scala.version>2.11.8</scala.version>
  <scala.compat.version>2.11</scala.compat.version>
  <spark.version>2.2.0</spark.version>
</properties>

<dependency>
  <groupId>com.microsoft.azure</groupId>
  <artifactId>azure-eventhubs-spark_2.11</artifactId>
  <version>2.3.6</version>
</dependency>
**Main issue: getting a timeout exception after a few minutes:**
18/11/23 08:37:03 WARN TaskSetManager: Lost task 3.3 in stage 2.0 (TID 32, wn528-cpedev.uvjbok3wwvduho2dvkd4zdcj1e.bx.internal.cloudapp.net, executor 6): java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:190)
    at org.apache.spark.eventhubs.client.CachedEventHubsReceiver.checkCursor(CachedEventHubsReceiver.scala:96)
    at org.apache.spark.eventhubs.client.CachedEventHubsReceiver.org$apache$spark$eventhubs$client$CachedEventHubsReceiver$$receive(CachedEventHubsReceiver.scala:130)
    at org.apache.spark.eventhubs.client.CachedEventHubsReceiver$.receive(CachedEventHubsReceiver.scala:179)
    at org.apache.spark.eventhubs.rdd.EventHubsRDD.compute(EventHubsRDD.scala:122)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Top GitHub Comments
I got this exception as well. The comment from @FurcyPin made me realise that I had two sinks and two checkpoints fed from the same streaming DataFrame. I tackled this by using
.foreachBatch { (batchDF: DataFrame, batchId: Long) => ... }
and caching batchDF before writing it to the two sinks. I guess the cache (using persist()) is essential to solve this issue. Somehow Event Hubs throws this Futures timeout when Spark evaluates the streaming DataFrame twice. foreachBatch should look like this:
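A minimal sketch of that pattern, assuming a Structured Streaming DataFrame named eventHubsDF and two placeholder file sinks (paths and formats are illustrative only):

import org.apache.spark.sql.DataFrame

eventHubsDF.writeStream
  .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
    // Cache the micro-batch so the Event Hubs source is read only once,
    // even though the same data is written to two sinks.
    batchDF.persist()

    batchDF.write.mode("append").parquet("/sinks/parquet-out")  // first sink (placeholder path)
    batchDF.write.mode("append").json("/sinks/json-out")        // second sink (placeholder path)

    batchDF.unpersist()
  }
  .option("checkpointLocation", "/checkpoints/eventhubs")       // single checkpoint (placeholder path)
  .start()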
I got the same error using library version 2.2.6 with Spark 2.2. The messages belonging to the requested sequence-number range are guaranteed to be present in Event Hubs, so why can't the receiver receive any messages for 300 seconds?
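One thing worth double-checking in that situation is the connector's receive/operation timeouts on EventHubsConf; a minimal sketch, assuming the setReceiverTimeout/setOperationTimeout setters exposed by this connector (the values are illustrative, and raising them only buys more time rather than fixing a stuck receiver):

import java.time.Duration
import org.apache.spark.eventhubs._

val ehConf = EventHubsConf(connString25Partition)
  .setConsumerGroup("$Default")
  .setReceiverTimeout(Duration.ofSeconds(60))    // how long a receive call may wait for events
  .setOperationTimeout(Duration.ofSeconds(120))  // how long client operations may take overall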