
Issue with Kafka Reader


Hello, I have an issue with Kafka. I ran a command that read the dataset using FileReader and it worked; then I tried to do the same using KafkaReader, but it fails. The log output is shown below:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/05/26 07:01:30 INFO SparkContext: Running Spark version 2.1.0
18/05/26 07:01:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/05/26 07:01:31 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 192.168.17.177 instead (on interface eno33557248)
18/05/26 07:01:31 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/05/26 07:01:31 INFO SecurityManager: Changing view acls to: root
18/05/26 07:01:31 INFO SecurityManager: Changing modify acls to: root
18/05/26 07:01:31 INFO SecurityManager: Changing view acls groups to:
18/05/26 07:01:31 INFO SecurityManager: Changing modify acls groups to:
18/05/26 07:01:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
18/05/26 07:01:31 INFO Utils: Successfully started service 'sparkDriver' on port 44017.
18/05/26 07:01:31 INFO SparkEnv: Registering MapOutputTracker
18/05/26 07:01:31 INFO SparkEnv: Registering BlockManagerMaster
18/05/26 07:01:31 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/05/26 07:01:31 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/05/26 07:01:31 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-0a7ed5a0-688d-4684-8962-9f9c398dc979
18/05/26 07:01:31 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
18/05/26 07:01:31 INFO SparkEnv: Registering OutputCommitCoordinator
18/05/26 07:01:32 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/05/26 07:01:32 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.17.177:4040
18/05/26 07:01:32 INFO SparkContext: Added JAR file:/root/Downloads/spark-streaming-kafka-0-10_2.10-2.1.0.jar at spark://192.168.17.177:44017/jars/spark-streaming-kafka-0-10_2.10-2.1.0.jar with timestamp 1527332492071
18/05/26 07:01:32 INFO SparkContext: Added JAR file:/root/streamDM/scripts/../target/scala-2.10/streamdm-spark-streaming-_2.10-0.2.jar at spark://192.168.17.177:44017/jars/streamdm-spark-streaming-_2.10-0.2.jar with timestamp 1527332492072
18/05/26 07:01:32 INFO Executor: Starting executor ID driver on host localhost
18/05/26 07:01:32 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45190.
18/05/26 07:01:32 INFO NettyBlockTransferService: Server created on 192.168.17.177:45190
18/05/26 07:01:32 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/05/26 07:01:32 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.17.177, 45190, None)
18/05/26 07:01:32 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.17.177:45190 with 366.3 MB RAM, BlockManagerId(driver, 192.168.17.177, 45190, None)
18/05/26 07:01:32 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.17.177, 45190, None)
18/05/26 07:01:32 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.17.177, 45190, None)
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$
	at org.apache.spark.streamdm.streams.KafkaReader.getExamples(KafkaReader.scala:62)
	at org.apache.spark.streamdm.tasks.EvaluatePrequential.run(EvaluatePrequential.scala:71)
	at org.apache.spark.streamdm.streamDMJob$.main(streamDMJob.scala:56)
	at org.apache.spark.streamdm.streamDMJob.main(streamDMJob.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtils$
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 13 more
18/05/26 07:01:32 INFO SparkContext: Invoking stop() from shutdown hook
18/05/26 07:01:32 INFO SparkUI: Stopped Spark web UI at http://192.168.17.177:4040
18/05/26 07:01:32 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/05/26 07:01:32 INFO MemoryStore: MemoryStore cleared
18/05/26 07:01:32 INFO BlockManager: BlockManager stopped
18/05/26 07:01:32 INFO BlockManagerMaster: BlockManagerMaster stopped
18/05/26 07:01:32 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/05/26 07:01:32 INFO SparkContext: Successfully stopped SparkContext
18/05/26 07:01:32 INFO ShutdownHookManager: Shutdown hook called
18/05/26 07:01:32 INFO ShutdownHookManager: Deleting directory /tmp/spark-23e4273b-e33d-4dbf-ace4-72597addd9b6

Infrastructure details

  • Java Version: 1.8.0_171
  • Scala Version: 2.10
  • Spark Version: 2.1.0
  • OS Version: CentOS 7
  • Cluster mode or local mode? local
  • Kafka Version: 0.10.2

Also, I imported org.apache.spark:spark-streaming-kafka-0-10_2.10:2.1.0 in my terminal using --packages and in my ~/.bashrc using an export. That didn't work, so I changed my spark.sh in the scripts directory to:

$SPARK_HOME/bin/spark-submit \
  --jars /root/Downloads/spark-streaming-kafka-0-10_2.10-2.1.0.jar \
  --class "org.apache.spark.streamdm.streamDMJob" \
  --master local[2] \
  ../target/scala-2.10/streamdm-spark-streaming-_2.10-0.2.jar \
  $1

but the same error appears every time. Can anyone help me with this?
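A note on the likely cause (my reading of the stack trace above, not something confirmed in this thread): the missing class org.apache.spark.streaming.kafka.KafkaUtils belongs to the Kafka 0-8 integration; the 0-10 artifact ships its classes under the org.apache.spark.streaming.kafka010 package instead, so adding the spark-streaming-kafka-0-10 JAR cannot satisfy the import in KafkaReader.scala. A sketch of spark.sh using --packages with the 0-8 artifact, which also resolves transitive dependencies such as the Kafka client (the coordinates are an assumption based on the stock Spark 2.1.0 release):

$SPARK_HOME/bin/spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-8_2.10:2.1.0 \
  --class "org.apache.spark.streamdm.streamDMJob" \
  --master local[2] \
  ../target/scala-2.10/streamdm-spark-streaming-_2.10-0.2.jar \
  $1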

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 11

Top GitHub Comments

1 reaction
saraAlizadeh commented, Jul 31, 2018

Hi, sorry for the delay. I solved this issue; it was a lack of code rather than bad code! I used logInfo to print out some values and found the following:

  1. All labels are read as ‘0’.
  2. All predicted labels are ‘0’.
  3. The number of classes is the default value, not the value from the dataset.
  4. The number of features is ‘0’.
  5. None of the readers works properly except FileReader, and even FileReader only reads .arff files correctly, not CSV files.
  6. The reason: .arff files carry a description. A function, fromArff, reads it and builds the ‘example specification’, which is the input to the other important methods such as learning and prediction. So unless your file is .arff, nothing goes right (see the example header just below).
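For context on point 6, the “description” is the ARFF header: the @relation and @attribute declarations that precede the @data marker. A header-only file for a hypothetical two-feature binary dataset (the names here are made up for illustration) would look like:

@relation my_stream
@attribute feature1 numeric
@attribute feature2 numeric
@attribute class {0,1}
@data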

So this is what I did for Kafka: I made a .arff file with only the description part of my dataset and no data. I called fromArff in my version of KafkaReader to build the example specification (exampleSpec). Once I had a correct spec of my data, I started Kafka and read the lines of the dataset.
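Here is a rough Scala sketch of that workaround, under stated assumptions: HeaderSpec and specFromArffHeader below are simplified stand-ins for streamDM’s example specification and its fromArff parsing (the real API differs); only KafkaUtils.createStream is the stock Spark Streaming Kafka 0-8 receiver API, and the topic, group, and ZooKeeper address are placeholders.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils // Kafka 0-8 integration

// Simplified stand-in for streamDM's ExampleSpecification: attribute
// names plus the number of classes recovered from the ARFF header.
case class HeaderSpec(attributes: Vector[String], numClasses: Int)

object KafkaReaderWorkaroundSketch {

  // Parse a header-only .arff file (description, no data rows) into a
  // spec, mimicking what the thread does with streamDM's fromArff.
  def specFromArffHeader(path: String): HeaderSpec = {
    val lines = scala.io.Source.fromFile(path).getLines().toVector
    val attrLines = lines.filter(_.trim.toLowerCase.startsWith("@attribute"))
    val names = attrLines.map(_.trim.split("\\s+")(1))
    // Assume the last attribute is the nominal class, e.g. {0,1} -> 2 classes.
    val numClasses = attrLines.lastOption
      .flatMap(l => "\\{([^}]*)\\}".r.findFirstMatchIn(l))
      .map(_.group(1).split(',').length)
      .getOrElse(2)
    HeaderSpec(names, numClasses)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaReaderSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))

    // 1. Build the spec from the header-only file before touching Kafka.
    val spec = specFromArffHeader("/path/to/header-only.arff")

    // 2. Stream the raw instance lines from Kafka (0-8 receiver API).
    val lines = KafkaUtils
      .createStream(ssc, "localhost:2181", "streamdm-group", Map("mytopic" -> 1))
      .map(_._2) // keep the message value, drop the key

    // 3. With the spec in hand, each line can be turned into an Example
    //    for learning/prediction (the actual parsing is elided here).
    lines.foreachRDD { rdd =>
      rdd.take(5).foreach(l => println(s"got ${spec.attributes.length}-field line: $l"))
    }

    ssc.start()
    ssc.awaitTermination()
  }
}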

There were plenty of details, but I think the whole story is clear now. I hope this is helpful to anyone who hits the same problem, and thank you @hmgomes for guiding me. You were really helpful.

0 reactions
hmgomes commented, Aug 2, 2018

Thanks @saraAlizadeh for your detailed analysis and workaround for this problem. As soon as I finish what I am working on, I will update those readers and test them individually. Once again, thanks for taking the time to look into this; I hope the Kafka reader becomes easier to use in the future.

Best wishes,

Heitor


Top Results From Across the Web

5 Common Pitfalls When Using Apache Kafka - Confluent
5 Common Pitfalls When Using Apache Kafka · 1. Setting request.timeout.ms too low · 2. Misunderstanding producer retries and retriable exceptions.
Chapter 4. Kafka Consumers: Reading Data from Kafka
Kafka consumers are typically part of a consumer group . When multiple consumers are subscribed to a topic and belong to the same...
the kafka reader got an unknown error reading partition #726
First issue we're often getting a reader error ~500k/day. the kafka reader got an unknown error reading partition 9 of SOME_TOPIC at offset ......
7 mistakes when using Apache Kafka | by Michał Matłoka
Sending a message to non-existing Kafka topic, by default results in its creation. Unfortunately the default settings define a single ...
Kafka Reader - Striim
Reads data from Apache Kafka 0.8, 0.9, 0.10, 0.11 or 2.1. See Supported reader-parser combinations for parsing options.
