How to set up elephas on spark workers with --archives
When running the basic example from the docs:
```python
from elephas.utils.rdd_utils import to_simple_rdd
from elephas.spark_model import SparkModel
from elephas import optimizers as elephas_optimizers

rdd = to_simple_rdd(sc, x_train, y_train)
sgd = elephas_optimizers.SGD()
spark_model = SparkModel(sc, model, optimizer=sgd, frequency='epoch',
                         mode='asynchronous', num_workers=2)
spark_model.train(rdd, nb_epoch=epochs, batch_size=batch_size,
                  verbose=1, validation_split=0.1)
```
I get the following error: `ImportError: No module named elephas.spark_model`. I am using PySpark 2.1 and Keras 2. Any suggestions?
```
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 5.0 failed 4 times, most recent failure: Lost task 1.3 in stage 5.0 (TID 58, xxxx, executor 8): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/xx/xx/hadoop/yarn/local/usercache/xx/appcache/application_1512662857247_19188/container_151xxx2857247_19188_01_000009/pyspark.zip/pyspark/worker.py", line 163, in main
    func, profiler, deserializer, serializer = read_command(pickleSer, infile)
  File "/xx/xx/hadoop/yarn/local/usercache/xx/appcache/application_1512662857247_19188/container_151xxx2857247_19188_01_000009/pyspark.zip/pyspark/worker.py", line 54, in read_command
    command = serializer._read_with_length(file)
  File "/yarn/local/usercache/xx/appcache/application_1512xx57247_19x8/container_1512xxx857247_19188_01_000009/pyspark.zip/pyspark/serializers.py", line 169, in _read_with_length
    return self.loads(obj)
  File "/yarn/local/usercache/xx/appcache/application_1512xx57247_19x8/container_1512xxx857247_19188_01_000009/pyspark.zip/pyspark/serializers.py", line 454, in loads
    return pickle.loads(obj)
ImportError: No module named elephas.spark_model

    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
    at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
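The `ImportError` in the traceback means the `elephas` package is installed on the driver but missing from the Python environment of the YARN executors, so the workers fail when unpickling tasks that reference it. As one common workaround, a sketch of shipping the package itself with `--py-files` (the paths and script name here are illustrative assumptions, not taken from the report above):

```shell
# Locate the site-packages directory that contains elephas on the driver.
SITE_PACKAGES=$(python -c "import elephas, os; print(os.path.dirname(os.path.dirname(elephas.__file__)))")

# Zip the package so Spark can distribute it to every executor.
cd "$SITE_PACKAGES" && zip -rq /tmp/elephas.zip elephas/

# Entries passed via --py-files are added to sys.path on each executor
# before tasks are deserialized. "your_training_script.py" is a placeholder.
spark-submit \
  --master yarn \
  --py-files /tmp/elephas.zip \
  your_training_script.py
```

Note that this only ships pure-Python code; elephas also depends on Keras/TensorFlow, which include native extensions and generally need to be installed on the workers (or shipped as a full environment, see the virtualenv approach below the search results).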
Issue Analytics
- Created 6 years ago
- Comments: 6 (3 by maintainers)
Top Results From Across the Web
- python - Elephas not loaded in PySpark: No module named ...
  "I found a solution on how to properly load a virtual environment to the master and all the slave workers: virtualenv venv --relocatable..."
- Distributed Deep Learning with Elephas
  "This post presents the Python code to run a Keras model in a distributed environment powered by Apache Spark."
- Distributed Deep Learning Pipelines with PySpark and Keras
  "The first thing we do with Elephas is create an estimator similar to some of the PySpark pipeline items above. We can set..."
- Spark ML model pipelines on Distributed Deep Neural Nets
  "If you don't have it already, install Spark locally by following the instructions provided ... --driver-memory 4G elephas/examples/Spark_ML_Pipeline.ipynb."
- Deep Learning With Apache Spark: Part 1 - KDnuggets
  "This part: What is Spark, basics on Spark+DL and a little more. ... Deep Learning Pipelines is an open source library created by..."
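Matching the issue title, the virtualenv approach from the first search result can be combined with `spark-submit --archives`, so every YARN container unpacks a complete Python environment instead of relying on whatever is installed on the node. A hedged sketch; the archive name, alias, and script name are illustrative assumptions:

```shell
# Build a virtualenv on the driver and install the dependencies the
# workers will need.
virtualenv venv
venv/bin/pip install elephas keras

# Pack the environment into an archive Spark can distribute.
cd venv && zip -rq ../venv.zip . && cd ..

# The '#environment' suffix is the alias under which YARN unpacks the
# archive in each container; PYSPARK_PYTHON points the application master
# and executors at the interpreter inside it.
spark-submit \
  --master yarn \
  --archives venv.zip#environment \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./environment/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=./environment/bin/python \
  your_training_script.py
```

A caveat on the design: plain virtualenvs are not always relocatable (hence the `virtualenv venv --relocatable` flag in the quoted answer); tools built for packing environments, such as conda-pack or venv-pack, tend to be more robust for this pattern.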
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
yeah, production systems are always a little messy. in the end I can only guess what’s going on. keep me posted in case I can help somehow.
cool, thanks for your feedback. I’ve changed the name of the issue so people can find this. at some point I want to write up how to use elephas from scratch on AWS or GCE etc., this might be very helpful.