hdfsBuilderConnect class not found when loading the datasets into HDFS
Environment:
- Python version: 3.6
- Spark version: 2.4.4
- TensorFlow version: 1.14
- TensorFlowOnSpark version: master
- Cluster version: Hadoop 2.8.5
I am running the Hadoop/Spark installation on AWS EMR at the moment.
Describe the bug:
I am trying to run the mnist example and I am having an issue during the data prep step, which uses the tensorflow_datasets package. In my code, mnist_data_setup.py loads the data into HDFS rather than the local file system, as seen below:
```python
import tensorflow_datasets as tfds

mnist, info = tfds.load('mnist', with_info=True, data_dir='hdfs://default/user/hadoop/tensorflow_datas')
```
Perhaps the exception (shown below) does not pertain to TensorFlowOnSpark directly, but I wanted to see if @leewyang can provide some advice/assistance here. I appreciate your time.
Logs:
I am receiving the following when running the Spark application:
```
loadFileSystems error:
(unable to get stack trace for java.lang.NoClassDefFoundError exception: ExceptionUtils::getStackTrace error.)
hdfsBuilderConnect(forceNewInstance=0, nn=default, port=0, kerbTicketCachePath=(NULL), userName=(NULL)) error:
```
Spark Submit Command Line:
I have tried several variations, including providing LD_LIBRARY_PATH to the executor environment (a sketch of that variant is shown after the command below).
```sh
${SPARK_HOME}/bin/spark-submit --deploy-mode cluster \
  --queue default --num-executors 4 \
  --conf spark.executorEnv.CLASSPATH=$(hadoop classpath --glob) \
  --executor-memory 4G --archives mnist/mnist.zip#mnist \
  --jars hdfs:///user/${USER}/tensorflow-hadoop-1.10.0.jar,hdfs:///user/${USER}/spark-tensorflow-connector_2.11-1.10.0.jar \
  TensorFlowOnSpark/examples/mnist/mnist_data_setup.py \
  --output cluster --format tfr
```
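For reference, Spark's `spark.executorEnv.<NAME>` conf is the standard way to set an environment variable on the executors, so the LD_LIBRARY_PATH variant looked roughly like the sketch below (the JVM and native-library paths are illustrative assumptions, typical for a Java 8 install; adjust for your JDK and Hadoop layout):

```sh
# Illustrative variant: spark.executorEnv.<NAME> sets <NAME> in each
# executor's environment. The server directory below is where libjvm.so
# typically lives on Java 8; ${HADOOP_HOME}/lib/native holds libhdfs.so.
${SPARK_HOME}/bin/spark-submit --deploy-mode cluster \
  --queue default --num-executors 4 \
  --conf spark.executorEnv.CLASSPATH=$(hadoop classpath --glob) \
  --conf spark.executorEnv.LD_LIBRARY_PATH=${JAVA_HOME}/jre/lib/amd64/server:${HADOOP_HOME}/lib/native \
  --executor-memory 4G --archives mnist/mnist.zip#mnist \
  --jars hdfs:///user/${USER}/tensorflow-hadoop-1.10.0.jar,hdfs:///user/${USER}/spark-tensorflow-connector_2.11-1.10.0.jar \
  TensorFlowOnSpark/examples/mnist/mnist_data_setup.py \
  --output cluster --format tfr
```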
I have run `hadoop classpath --glob` and verified that the full list of jars is present on both the master and slave nodes.
The weird part is that when I run the same Python snippet in the pyspark shell (after setting up CLASSPATH), it runs perfectly fine:
```python
import tensorflow_datasets as tfds

mnist, info = tfds.load('mnist', with_info=True, data_dir='hdfs://default/user/hadoop/tensorflow_datas')
```
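For completeness, the shell-side setup before launching pyspark was along these lines (a sketch; it assumes the hadoop binary is on the PATH):

```sh
# Expand the Hadoop classpath globs so libhdfs can locate the client jars,
# then launch the shell with CLASSPATH already set.
export CLASSPATH=$(hadoop classpath --glob)
${SPARK_HOME}/bin/pyspark
```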
Is there a known limitation on the length of values that can be passed via spark-submit?
Additionally, see a related issue here.
Top GitHub Comments
You can try setting the CLASSPATH variable at the top of your map_fn with code like this:
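A minimal sketch of this approach, assuming the hadoop binary is on each executor's PATH:

```python
def map_fn(args, ctx):
    import os
    import subprocess

    # Prepend the expanded Hadoop classpath so that libhdfs can find the
    # HDFS client jars at runtime inside this executor. The --glob flag
    # expands the wildcard entries that libhdfs cannot resolve itself.
    classpath = os.environ.get("CLASSPATH", "")
    hadoop_classpath = subprocess.check_output(
        ["hadoop", "classpath", "--glob"]).decode("utf-8").strip()
    os.environ["CLASSPATH"] = classpath + os.pathsep + hadoop_classpath

    # ... the rest of the TensorFlow code follows ...
```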
Hi @leewyang,
Thank you, you have solved the issue! If I find another way to retain the classpath across the executors based on what we pass to spark-submit, I will post back here.
I will close this issue.