
hdfsBuilderConnect class not found when loading the datasets into HDFS

See original GitHub issue

Environment:

  • Python version: 3.6
  • Spark version: 2.4.4
  • TensorFlow version: 1.14
  • TensorFlowOnSpark version: master
  • Cluster version: Hadoop 2.8.5

I am running the hadoop/spark installation on AWS EMR at the moment.

Describe the bug:

I am trying to run the mnist example and I am having an issue when performing the data prep using the tensorflow_datasets package. In my code, mnist_data_setup.py loads the data into HDFS instead of the local file system, as shown below:

import tensorflow_datasets as tfds
mnist, info = tfds.load('mnist', with_info=True, data_dir='hdfs://default/user/hadoop/tensorflow_datas')

Perhaps the exception (shown below) does not pertain to TensorFlowOnSpark directly, but I wanted to see if @leewyang can provide some advice/assistance here. I appreciate your time.

Logs:

I am receiving the following error when running the Spark application:

loadFileSystems error:
(unable to get stack trace for java.lang.NoClassDefFoundError exception: ExceptionUtils::getStackTrace error.)
hdfsBuilderConnect(forceNewInstance=0, nn=default, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:

Spark Submit Command Line:

I have tried several variations, including providing LD_LIBRARY_PATH in the executor environment.

${SPARK_HOME}/bin/spark-submit  --deploy-mode cluster \
--queue default --num-executors 4 \
--conf spark.executorEnv.CLASSPATH=$(hadoop classpath --glob) \
--executor-memory 4G --archives mnist/mnist.zip#mnist \
--jars hdfs:///user/${USER}/tensorflow-hadoop-1.10.0.jar,hdfs:///user/${USER}/spark-tensorflow-connector_2.11-1.10.0.jar \
TensorFlowOnSpark/examples/mnist/mnist_data_setup.py \
--output cluster --format tfr

I have run hadoop classpath --glob and verified that the full list of jars is present on both the master and slave nodes.

The weird part is that when running the same Python snippet in the pyspark shell (after setting up CLASSPATH), it seems to run perfectly fine:

import tensorflow_datasets as tfds
mnist, info = tfds.load('mnist', with_info=True, data_dir='hdfs://default/user/hadoop/tensorflow_datas')
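One way to narrow down a "works in pyspark shell, fails under spark-submit" discrepancy is to read the environment from inside a task, so you can see what CLASSPATH the executor processes actually received. This is a debugging sketch, not from the original thread; `sc` is assumed to be a live SparkContext:

```python
import os

def report_classpath(_):
    # Runs inside an executor task: return the CLASSPATH the executor
    # process actually sees (or a marker if it was never set).
    return os.environ.get('CLASSPATH', '<unset>')[:200]

# With a live SparkContext `sc`, collect one report per partition:
# print(sc.parallelize(range(4), 4).map(report_classpath).distinct().collect())
```

If the collected values come back as `<unset>` (or truncated), the `spark.executorEnv.CLASSPATH` setting did not survive the submit, which points at shell expansion or propagation rather than at TensorFlow itself.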

Is there a known limitation on the length of arguments that can be passed via spark-submit?

Additionally, see a related issue here.

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

leewyang commented, Jan 17, 2020 (1 reaction)

You can try setting the CLASSPATH variable at the top of your map_fn with code like this:

import os
import subprocess

# Extend the executor's CLASSPATH with the full Hadoop classpath, so that
# libhdfs can locate the Hadoop jars at runtime.
classpath = os.environ.get('CLASSPATH', '')
hadoop_path = os.path.join(os.environ['HADOOP_PREFIX'], 'bin', 'hadoop')
hadoop_classpath = subprocess.check_output([hadoop_path, 'classpath', '--glob']).decode()
os.environ['CLASSPATH'] = classpath + os.pathsep + hadoop_classpath
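In context, the snippet above would sit at the top of the function that runs on each executor, before TensorFlow touches any hdfs:// path. The sketch below shows one way to wire that up; `main_fun` and its arguments follow the general shape of the TensorFlowOnSpark examples but are an assumption, not quoted from the thread:

```python
import os
import subprocess

def extend_classpath(existing, hadoop_cp):
    # Join the two classpaths, avoiding a stray leading separator when
    # the original CLASSPATH was empty or unset.
    return hadoop_cp if not existing else existing + os.pathsep + hadoop_cp

def main_fun(args, ctx):
    # Must run on the executor before TensorFlow opens any hdfs:// path,
    # so libhdfs can find the Hadoop jars.
    hadoop = os.path.join(os.environ['HADOOP_PREFIX'], 'bin', 'hadoop')
    hadoop_cp = subprocess.check_output(
        [hadoop, 'classpath', '--glob']).decode().strip()
    os.environ['CLASSPATH'] = extend_classpath(
        os.environ.get('CLASSPATH'), hadoop_cp)
    # ... the per-executor data prep / training code follows here ...
```

Because the classpath is rebuilt inside the task itself, this sidesteps any question of whether spark-submit propagated the variable correctly.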
jerrygb commented, Jan 20, 2020 (0 reactions)

Hi @leewyang,

Thank you, you have solved the issue! If I find another way to retain the classpath across the executors based on what we pass to spark-submit, I will post back here.

I will close this issue.
