Issue with Kafka Reader
Hello,
I have an issue with Kafka. I ran a command that read the dataset using FileReader, and it worked fine. Then I tried to do the same using KafkaReader, but it fails. The log output is shown below:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/05/26 07:01:30 INFO SparkContext: Running Spark version 2.1.0
18/05/26 07:01:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/05/26 07:01:31 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 192.168.17.177 instead (on interface eno33557248)
18/05/26 07:01:31 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/05/26 07:01:31 INFO SecurityManager: Changing view acls to: root
18/05/26 07:01:31 INFO SecurityManager: Changing modify acls to: root
18/05/26 07:01:31 INFO SecurityManager: Changing view acls groups to:
18/05/26 07:01:31 INFO SecurityManager: Changing modify acls groups to:
18/05/26 07:01:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
18/05/26 07:01:31 INFO Utils: Successfully started service 'sparkDriver' on port 44017.
18/05/26 07:01:31 INFO SparkEnv: Registering MapOutputTracker
18/05/26 07:01:31 INFO SparkEnv: Registering BlockManagerMaster
18/05/26 07:01:31 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/05/26 07:01:31 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/05/26 07:01:31 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-0a7ed5a0-688d-4684-8962-9f9c398dc979
18/05/26 07:01:31 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
18/05/26 07:01:31 INFO SparkEnv: Registering OutputCommitCoordinator
18/05/26 07:01:32 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/05/26 07:01:32 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.17.177:4040
18/05/26 07:01:32 INFO SparkContext: Added JAR file:/root/Downloads/spark-streaming-kafka-0-10_2.10-2.1.0.jar at spark://192.168.17.177:44017/jars/spark-streaming-kafka-0-10_2.10-2.1.0.jar with timestamp 1527332492071
18/05/26 07:01:32 INFO SparkContext: Added JAR file:/root/streamDM/scripts/../target/scala-2.10/streamdm-spark-streaming-_2.10-0.2.jar at spark://192.168.17.177:44017/jars/streamdm-spark-streaming-_2.10-0.2.jar with timestamp 1527332492072
18/05/26 07:01:32 INFO Executor: Starting executor ID driver on host localhost
18/05/26 07:01:32 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45190.
18/05/26 07:01:32 INFO NettyBlockTransferService: Server created on 192.168.17.177:45190
18/05/26 07:01:32 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/05/26 07:01:32 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.17.177, 45190, None)
18/05/26 07:01:32 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.17.177:45190 with 366.3 MB RAM, BlockManagerId(driver, 192.168.17.177, 45190, None)
18/05/26 07:01:32 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.17.177, 45190, None)
18/05/26 07:01:32 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.17.177, 45190, None)
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$
    at org.apache.spark.streamdm.streams.KafkaReader.getExamples(KafkaReader.scala:62)
    at org.apache.spark.streamdm.tasks.EvaluatePrequential.run(EvaluatePrequential.scala:71)
    at org.apache.spark.streamdm.streamDMJob$.main(streamDMJob.scala:56)
    at org.apache.spark.streamdm.streamDMJob.main(streamDMJob.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtils$
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 13 more
18/05/26 07:01:32 INFO SparkContext: Invoking stop() from shutdown hook
18/05/26 07:01:32 INFO SparkUI: Stopped Spark web UI at http://192.168.17.177:4040
18/05/26 07:01:32 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/05/26 07:01:32 INFO MemoryStore: MemoryStore cleared
18/05/26 07:01:32 INFO BlockManager: BlockManager stopped
18/05/26 07:01:32 INFO BlockManagerMaster: BlockManagerMaster stopped
18/05/26 07:01:32 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/05/26 07:01:32 INFO SparkContext: Successfully stopped SparkContext
18/05/26 07:01:32 INFO ShutdownHookManager: Shutdown hook called
18/05/26 07:01:32 INFO ShutdownHookManager: Deleting directory /tmp/spark-23e4273b-e33d-4dbf-ace4-72597addd9b6
Infrastructure details
- **Java Version:** 1.8.0_171
- **Scala Version:** 2.10
- **Spark Version:** 2.1.0
- **OS Version:** CentOS 7
- **Cluster mode or local mode?** Local
- **Kafka Version:** 0.10.2
I also imported org.apache.spark:spark-streaming-kafka-0-10_2.10:2.1.0, both on the command line using --packages and in my ~/.bashrc using an export.
That didn't work, so I changed my spark.sh in the scripts directory to:
$SPARK_HOME/bin/spark-submit \
  --jars /root/Downloads/spark-streaming-kafka-0-10_2.10-2.1.0.jar \
  --class "org.apache.spark.streamdm.streamDMJob" \
  --master local[2] \
  ../target/scala-2.10/streamdm-spark-streaming-_2.10-0.2.jar \
  $1
but the same error appears every time.
Can anyone help me with this?
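One way to narrow this down is to check which package the jar passed with --jars actually provides. The sketch below is only an assumption read off the package name in the stack trace (org.apache.spark.streaming.kafka, which belongs to the 0-8 integration rather than the 0-10 one), not something verified against streamDM's build:

```bash
# Check whether the jar on --jars really contains the class the trace is missing.
# The 0-10 integration ships KafkaUtils under .../streaming/kafka010/, while the
# stack trace asks for org/apache/spark/streaming/kafka/KafkaUtils$ (the 0-8 API).
jar tf /root/Downloads/spark-streaming-kafka-0-10_2.10-2.1.0.jar | grep KafkaUtils

# Assumed fix (unverified): if KafkaReader was compiled against the 0-8 API,
# submit with the matching artifact; --packages also resolves its transitive
# dependencies (e.g. the Kafka client), which a single --jars entry does not.
$SPARK_HOME/bin/spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-8_2.10:2.1.0 \
  --class "org.apache.spark.streamdm.streamDMJob" \
  --master local[2] \
  ../target/scala-2.10/streamdm-spark-streaming-_2.10-0.2.jar \
  $1
```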
Top GitHub Comments
Hi, sorry for the delay. I solved this issue; it was more a lack of code than bad code! I used logInfo to print out some values and found out these facts: `FileReader` only reads .arff files properly, not CSV files. `fromArff` reads the file and then builds the 'example specification', which is the input to the other important methods such as learning and prediction. So unless your file is .arff, nothing will go right. This is what I did for Kafka: I made a .arff file with only the description part of my dataset and no data. I called `fromArff` in my version of `KafkaReader` to build the example specification, and once I had a correct spec of my data I started Kafka and read the lines of the dataset. There were plenty of details, but I think the whole story is clear now. I hope this is helpful for anyone, and thank you @hmgomes for guiding me. You were really helpful.
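For anyone who wants to see the shape of that workaround, here is a minimal, self-contained sketch of the idea. It does not use streamDM's actual `fromArff`, `ExampleSpecification`, or `KafkaReader` API; the object, helper names, file name, and sample line are all made up for illustration. The point is simply: build the specification from a header-only .arff file, then use it to interpret each line consumed from Kafka.

```scala
// Illustrative sketch only, not streamDM's API: derive a schema from an .arff
// file that contains only the header, then interpret CSV lines against it.
import scala.io.Source

object ArffHeaderSpecSketch {

  // Keep just the attribute names and types declared in the ARFF header.
  final case class AttributeSpec(name: String, kind: String)

  // Parse "@attribute <name> <type>" lines from a header-only .arff file.
  def specFromArffHeader(path: String): Seq[AttributeSpec] =
    Source.fromFile(path).getLines()
      .map(_.trim)
      .filter(_.toLowerCase.startsWith("@attribute"))
      .map { line =>
        val parts = line.split("\\s+", 3)
        AttributeSpec(parts(1), parts(2))
      }
      .toSeq

  // Turn one CSV line (as it might arrive from a Kafka topic) into name -> value pairs.
  def parseLine(line: String, spec: Seq[AttributeSpec]): Map[String, String] =
    spec.map(_.name).zip(line.split(",").map(_.trim)).toMap

  def main(args: Array[String]): Unit = {
    val spec = specFromArffHeader("dataset-header.arff") // header only, no @data rows
    val fromKafka = "5.1,3.5,1.4,0.2,Iris-setosa"        // hypothetical consumed line
    println(parseLine(fromKafka, spec))
  }
}
```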
Thanks @saraAlizadeh for your detailed analysis and workaround for this problem. As soon as I finish what I am working on, I will update those readers and test them individually. Once again, thanks for taking the time to look into this, and I hope the Kafka reader becomes easier to use in the future.
Best wishes,
Heitor