TensorFlowOnSpark 1.4.4: Exception: Timeout while feeding partition

Hi, I am running the MNIST example with:

spark-submit \
--master yarn \
--py-files ~/software/TensorFlowOnSpark-1.4.4/examples/mnist/spark/mnist_dist.py \
--conf spark.cores.max=2 \
--conf spark.task.cpus=1 \
--conf spark.executorEnv.JAVA_HOME="$JAVA_HOME" \
~/software/TensorFlowOnSpark-1.4.4/examples/mnist/spark/mnist_spark.py \
--cluster_size 2 \
--images examples/mnist/csv/train/images \
--labels examples/mnist/csv/train/labels \
--format csv \
--mode train \
--model file:///home/pi/examples/mnist/mnist_model

If I save the model to the local filesystem, it works well.

When I try to save it to HDFS, the job hangs and then fails with ‘Exception: Timeout while feeding partition’. How can I solve this? 🙏

spark-submit \
--master yarn \
--py-files ~/software/TensorFlowOnSpark-1.4.4/examples/mnist/spark/mnist_dist.py \
--conf spark.cores.max=2 \
--conf spark.task.cpus=1 \
--conf spark.executorEnv.JAVA_HOME="$JAVA_HOME" \
~/software/TensorFlowOnSpark-1.4.4/examples/mnist/spark/mnist_spark.py \
--cluster_size 2 \
--images examples/mnist/csv/train/images \
--labels examples/mnist/csv/train/labels \
--format csv \
--mode train \
--model mnist_model
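The two commands differ only in --model. file:///home/pi/... names the local filesystem explicitly. mnist_model has no scheme, so it is resolved against the cluster's default filesystem, which is HDFS here. As far as I can tell, TensorFlowOnSpark does this resolution in its TFNode.hdfs_path helper. The shell function below is only a simplified approximation of that rule and is not the library's code. It assumes an HDFS default filesystem and a /user/<name>/ home directory, and the name resolve_model_path is hypothetical.

```shell
#!/usr/bin/env bash
# resolve_model_path PATH DEFAULT_FS USER   (hypothetical helper, not TensorFlowOnSpark code)
# Simplified approximation of how a --model value becomes an absolute path.
# Assumes DEFAULT_FS is an HDFS URI such as hdfs://namenode:8020.
resolve_model_path() {
  local path="$1" default_fs="$2" user="$3"
  case "$path" in
    *://*) printf '%s\n' "$path" ;;                                  # explicit scheme (file://, hdfs://, ...): unchanged
    /*)    printf '%s%s\n' "$default_fs" "$path" ;;                  # absolute path on the default filesystem
    *)     printf '%s/user/%s/%s\n' "$default_fs" "$user" "$path" ;; # relative path: under the user's HDFS home
  esac
}

# On a cluster, DEFAULT_FS could be read with: hdfs getconf -confKey fs.defaultFS
```

Under this approximation, the first command's --model value comes back unchanged. mnist_model becomes <defaultFS>/user/<name>/mnist_model, so only the second run needs TensorFlow's HDFS support. That would fit the libjvm.so error reported further down.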

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
MaQianheng commented, Apr 8, 2020

I changed line 99 of mnist_dist.py to logdir=None, and the problem was solved… 😂 But I don’t know why.

0 reactions
chansonzhang commented, Apr 15, 2021

When this error occurred, I checked the container log and also found the following error:

2021-04-15 16:22:33.646363: E tensorflow/core/platform/hadoop/hadoop_file_system.cc:115] HadoopFileSystem load error: libjvm.so: cannot open shared object file: No such file or directory

So I just added $LIB_JVM to --conf spark.executorEnv.LD_LIBRARY_PATH, and everything worked.

--conf spark.executorEnv.LD_LIBRARY_PATH=$LIB_JVM:$LIB_HDFS \
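The fix in this comment can be scripted. The sketch below is not part of TensorFlowOnSpark, and both function names are hypothetical.

missing_hdfs_libs extracts library names from "HadoopFileSystem load error" lines in a container log. It assumes other missing libraries are reported in the same format as the libjvm.so line above.

find_lib_dir finds the directory that holds a given .so, so the result can be used for LIB_JVM or LIB_HDFS. Looking for libhdfs.so under $HADOOP_HOME/lib/native is an assumption about a typical Hadoop layout. The directories are found on the machine that runs spark-submit, so this only helps if the executors use the same paths.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the LD_LIBRARY_PATH fix (not part of TensorFlowOnSpark).

# missing_hdfs_libs: read container log text on stdin and print, once each, the
# shared objects named in "HadoopFileSystem load error: <lib>.so" lines.
missing_hdfs_libs() {
  grep -o 'HadoopFileSystem load error: [^:]*\.so' \
    | sed 's/^HadoopFileSystem load error: //' \
    | sort -u
}

# find_lib_dir ROOT LIBNAME: print the directory containing the first LIBNAME
# found under ROOT (following symlinks); return 1 if none is found.
find_lib_dir() {
  local root="$1" lib="$2" hit
  hit="$(find -L "$root" -name "$lib" -print 2>/dev/null | head -n 1)"
  if [ -z "$hit" ]; then
    echo "find_lib_dir: $lib not found under $root" >&2
    return 1
  fi
  dirname "$hit"
}

# Example usage (APP_ID is a placeholder; JAVA_HOME and HADOOP_HOME must be set):
#   yarn logs -applicationId "$APP_ID" | missing_hdfs_libs
#   export LIB_JVM="$(find_lib_dir "$JAVA_HOME" libjvm.so)"
#   export LIB_HDFS="$(find_lib_dir "$HADOOP_HOME/lib/native" libhdfs.so)"
#   spark-submit ... \
#     --conf spark.executorEnv.LD_LIBRARY_PATH="$LIB_JVM:$LIB_HDFS" \
#     ...
```

Searching under JAVA_HOME with find, instead of hard-coding a path such as jre/lib/amd64/server, avoids assuming a particular CPU architecture or JDK layout. TensorFlow's own Hadoop notes also ask for CLASSPATH to be set from hadoop classpath --glob. If the job still fails after LD_LIBRARY_PATH is fixed, that setting is worth checking.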