TFSparkNode throws AttributeError on shutdown
Environment:
- Python version 3.6
- Spark version 2.3.1
- TensorFlow version 1.7.0
- TensorFlowOnSpark version 1.4.2
- Cluster version Standalone
Describe the bug: TFSparkNode throws AttributeError on shutdown
Logs:
2019-01-24 12:02:02,297 INFO (MainThread-13745) Feeding None into input queue
[2019-01-24 12:02:02.344] [ERROR] [Executor task launch worker for task 2] [org.apache.spark.executor.Executor] >>> [spark-] msg=Exception in task 0.0 in stage 1.0 (TID 2)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/home/mlp/.local/lib/python3.6/site-packages/tensorflowonspark/TFSparkNode.py", line 539, in _shutdown
AttributeError: 'AutoProxy[get_queue]' object has no attribute 'put'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/vipshop/platform/spark/python/lib/pyspark.zip/pyspark/worker.py", line 234, in main
process()
File "/home/vipshop/platform/spark/python/lib/pyspark.zip/pyspark/worker.py", line 229, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/home/vipshop/platform/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2457, in pipeline_func
File "/home/vipshop/platform/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2457, in pipeline_func
File "/home/vipshop/platform/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2457, in pipeline_func
File "/home/vipshop/platform/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 370, in func
File "/home/vipshop/platform/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 819, in func
File "/home/mlp/.local/lib/python3.6/site-packages/tensorflowonspark/TFSparkNode.py", line 542, in _shutdown
Exception: Queue 'input' not found on this node, check for exceptions on other nodes.
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:298)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:438)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:421)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:252)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at org.apache.spark.InterruptibleIterator.toArray(InterruptibleIterator.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:939)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:939)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
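The `'AutoProxy[get_queue]' object has no attribute 'put'` message comes from Python's `multiprocessing.managers` machinery: an AutoProxy only carries the methods that were exposed when the shared object was registered. The sketch below is one minimal way to produce this exact error outside Spark; it is an illustration of the mechanism, not the TFoS code path, and `QueueManager`/`get_queue` are hypothetical names chosen to match the log.

```python
import queue
from multiprocessing.managers import BaseManager

class QueueManager(BaseManager):
    pass

# Register a shared queue but expose only its 'get' method; the generated
# AutoProxy then has no 'put' attribute at all.
QueueManager.register('get_queue', callable=queue.Queue, exposed=['get'])

mgr = QueueManager()
mgr.start()
proxy = mgr.get_queue()

err = None
try:
    proxy.put('x')   # fails: 'put' was never exposed on the proxy
except AttributeError as e:
    err = str(e)
finally:
    mgr.shutdown()

print(err)   # 'AutoProxy[get_queue]' object has no attribute 'put'
```

In the log above the proxy presumably did expose `put` in normal operation, so the AttributeError during `_shutdown` suggests the proxy's view of the queue was no longer valid on that node, consistent with the follow-on "Queue 'input' not found on this node" error.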
Spark Submit Command Line:
spark-submit --py-files mnist_estimator.py mnist_estimator.py
Issue Analytics
- Created: 5 years ago
- Comments: 8 (4 by maintainers)
Top GitHub Comments
@manuzhang Basically, TFoS requires that you schedule only one task per executor, since we rely on the "mental model" of each Spark executor running only one TF node of the cluster. This makes it much simpler to understand, configure, and debug. (Imagine 20 TF node "tasks" running on one executor, with all of those TF processes writing to the executor's single stderr log file.) That said, you can still achieve "one TF node per executor" with more cores by setting `spark.task.cpus` equal to `spark.executor.cores`. For simplicity, we just recommend 1 for both, but you can just as easily set them both to 8 or 16 or 40, as long as only one task runs on each executor in the cluster.

@lasclocker Again, the Spark settings are mostly used for scheduling tasks onto executors, and I don't believe that they're enforced strictly. That said, MirroredStrategy is mostly a simplification of the GPU tower architecture, so it's more useful for GPUs than CPU cores, and CollectiveAllReduceStrategy is more about distributed compute without a PS, so again, it's less about CPU cores than network I/O. Either way, if you really need to set cores > 1, just set `spark.task.cpus` equal to `spark.executor.cores` in your job, and it should work fine.

@leewyang When TFoS uses a Distribution Strategy, such as MirroredStrategy or CollectiveAllReduceStrategy, it requires more than one core per executor, is that right?
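Following the maintainer's advice, the two settings can be passed directly on the submit command line. This is a sketch of the original submit command with the suggested configuration added; the value 8 is an arbitrary example, and any value works as long as the two settings are equal, so each executor runs exactly one task:

```shell
spark-submit \
  --conf spark.executor.cores=8 \
  --conf spark.task.cpus=8 \
  --py-files mnist_estimator.py \
  mnist_estimator.py
```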