Failed to execute job because mist-worker.jar is missing
Context
- Stack includes: 1 Spark Master (without Mist) + Worker (without Mist) + Mist Master
- Spark version: 2.4.0
- Mist version: 1.1.0
Context configuration:
downtime = "3600s"
max-conn-failures = 5
max-parallel-jobs = 1
precreated = false
run-options = ""
spark-conf {
  "spark.master" = "spark://spark-master:7077"
  "spark.submit.deployMode" = "cluster"
  "spark.dynamicAllocation.enabled" = "true"
  "spark.shuffle.service.enabled" = "true"
}
streaming-duration = "1s"
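With `spark.submit.deployMode=cluster`, the Spark master launches the driver on a worker node, so `file:/opt/mist/mist-worker.jar` must exist on that worker's filesystem, not only inside the Mist container. One possible workaround (a sketch, not a confirmed fix for this issue) is to switch to client mode so the driver stays on the container that actually has the jar:

```
spark-conf {
  "spark.master" = "spark://spark-master:7077"
  # client mode keeps the driver (and therefore the jar copy) on the
  # submitting Mist container, where /opt/mist/mist-worker.jar exists
  "spark.submit.deployMode" = "client"
  "spark.dynamicAllocation.enabled" = "true"
  "spark.shuffle.service.enabled" = "true"
}
```

Alternatively, keep cluster mode but make the jar reachable from every node, e.g. by mounting the Mist jar directory into the spark-worker container at the same path, or by serving it over HTTP/HDFS.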
Log
mist_1 | 2018-11-09 10:59:42,857 INFO akka.event.slf4j.Slf4jLogger Slf4jLogger started
spark-master_1 | 2018-11-09 10:59:42,937 INFO org.apache.spark.deploy.master.Master Registering worker 29c1c06e51e3:9099 with 8 cores, 13.7 GB RAM
spark-worker_1 | 2018-11-09 10:59:42,966 INFO org.apache.spark.deploy.worker.Worker Successfully registered with master spark://spark-master:7077
mist_1 | 2018-11-09 10:59:43,184 INFO akka.remote.Remoting Starting remoting
mist_1 | 2018-11-09 10:59:43,412 INFO akka.remote.Remoting Remoting started; listening on addresses :[akka.tcp://mist@172.20.0.5:2551]
mist_1 | 2018-11-09 10:59:43,521 INFO org.flywaydb.core.internal.util.VersionPrinter Flyway 4.1.1 by Boxfuse
mist_1 | 2018-11-09 10:59:43,826 INFO org.flywaydb.core.internal.dbsupport.DbSupportFactory Database: jdbc:h2:file:/opt/mist/data/recovery.db (H2 1.4)
mist_1 | 2018-11-09 10:59:44,014 INFO org.flywaydb.core.internal.command.DbValidate Successfully validated 2 migrations (execution time 00:00.018s)
mist_1 | 2018-11-09 10:59:44,027 INFO org.flywaydb.core.internal.command.DbMigrate Current version of schema "PUBLIC": 2
mist_1 | 2018-11-09 10:59:44,027 INFO org.flywaydb.core.internal.command.DbMigrate Schema "PUBLIC" is up to date. No migration necessary.
mist_1 | 2018-11-09 10:59:44,540 INFO io.hydrosphere.mist.master.MasterServer$ LogsSystem started
mist_1 | 2018-11-09 10:59:46,042 WARN org.apache.hadoop.util.NativeCodeLoader Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
mist_1 | 2018-11-09 10:59:46,995 INFO akka.event.slf4j.Slf4jLogger Slf4jLogger started
mist_1 | 2018-11-09 10:59:47,264 INFO akka.remote.Remoting Starting remoting
mist_1 | 2018-11-09 10:59:47,601 INFO akka.remote.Remoting Remoting started; listening on addresses :[akka.tcp://mist-info-provider@127.0.0.1:40605]
mist_1 | 2018-11-09 10:59:48,197 INFO io.hydrosphere.mist.master.MasterServer$ FunctionInfoProvider started
mist_1 | 2018-11-09 10:59:48,646 INFO io.hydrosphere.mist.master.MasterServer$ Main service started
mist_1 | 2018-11-09 10:59:49,686 INFO io.hydrosphere.mist.master.MasterServer$ Http interface started
mist_1 | 2018-11-09 10:59:49,692 INFO io.hydrosphere.mist.master.Master$ Mist master started
mist_1 | 2018-11-09 11:00:04,797 INFO io.hydrosphere.mist.master.execution.ContextFrontend Starting executor k8s-master_96a1ce36-460a-4f3b-b8ba-735ddb2a33fe for k8s-master
mist_1 | 2018-11-09 11:00:04,833 INFO io.hydrosphere.mist.master.execution.ContextFrontend Context k8s-master - connected state(active connections: 0, max: 1)
mist_1 | 2018-11-09 11:00:04,845 INFO io.hydrosphere.mist.master.execution.workers.starter.LocalSparkSubmit Try submit local worker k8s-master_96a1ce36-460a-4f3b-b8ba-735ddb2a33fe_1, cmd: /opt/spark/bin/spark-submit --conf spark.eventLog.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.submit.deployMode=cluster --conf spark.master=spark://spark-master:7077 --conf spark.eventLog.dir=/data/spark/events --conf spark.dynamicAllocation.enabled=true --conf spark.eventLog.compress=true --class io.hydrosphere.mist.worker.Worker /opt/mist/mist-worker.jar --master 172.20.0.5:2551 --name k8s-master_96a1ce36-460a-4f3b-b8ba-735ddb2a33fe_1
spark-master_1 | 2018-11-09 11:00:07,315 INFO org.apache.spark.deploy.master.Master Driver submitted org.apache.spark.deploy.worker.DriverWrapper
spark-master_1 | 2018-11-09 11:00:07,318 INFO org.apache.spark.deploy.master.Master Launching driver driver-20181109110007-0000 on worker worker-20181109105941-29c1c06e51e3-9099
spark-worker_1 | 2018-11-09 11:00:07,355 INFO org.apache.spark.deploy.worker.Worker Asked to launch driver driver-20181109110007-0000
spark-worker_1 | 2018-11-09 11:00:07,367 INFO org.apache.spark.deploy.worker.DriverRunner Copying user jar file:/opt/mist/mist-worker.jar to /opt/spark/work/driver-20181109110007-0000/mist-worker.jar
spark-worker_1 | 2018-11-09 11:00:07,390 INFO org.apache.spark.util.Utils Copying /opt/mist/mist-worker.jar to /opt/spark/work/driver-20181109110007-0000/mist-worker.jar
spark-worker_1 | 2018-11-09 11:00:07,400 INFO org.apache.spark.deploy.worker.DriverRunner Killing driver process!
spark-worker_1 | 2018-11-09 11:00:07,404 WARN org.apache.spark.deploy.worker.Worker Driver driver-20181109110007-0000 failed with unrecoverable exception: java.nio.file.NoSuchFileException: /opt/mist/mist-worker.jar
spark-master_1 | 2018-11-09 11:00:07,460 INFO org.apache.spark.deploy.master.Master Removing driver: driver-20181109110007-0000
spark-master_1 | 2018-11-09 11:00:12,769 INFO org.apache.spark.deploy.master.Master 172.20.0.5:40290 got disassociated, removing it.
spark-master_1 | 2018-11-09 11:00:12,770 INFO org.apache.spark.deploy.master.Master 172.20.0.5:42207 got disassociated, removing it.
mist_1 | 2018-11-09 11:00:12,897 ERROR io.hydrosphere.mist.master.execution.workers.ExclusiveConnector Could not start worker connection
mist_1 | java.lang.RuntimeException: Process terminated with error java.lang.RuntimeException: Process exited with status code 255 and out: 2018-11-09 11:00:06,479 WARN org.apache.hadoop.util.NativeCodeLoader Unable to load native-hadoop library for your platform... using builtin-java classes where applicable;2018-11-09 11:00:12,424 ERROR org.apache.spark.deploy.ClientEndpoint Exception from cluster was: java.nio.file.NoSuchFileException: /opt/mist/mist-worker.jar;java.nio.file.NoSuchFileException: /opt/mist/mist-worker.jar; at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86); at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102); at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107); at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:526); at sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253); at java.nio.file.Files.copy(Files.java:1274); at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$copyRecursive(Utils.scala:664); at org.apache.spark.util.Utils$.copyFile(Utils.scala:635); at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:719); at org.apache.spark.util.Utils$.fetchFile(Utils.scala:509); at org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:155); at org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173); at org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92)
mist_1 | at io.hydrosphere.mist.master.execution.workers.WorkerRunner$DefaultRunner$$anonfun$continueSetup$1$1.applyOrElse(WorkerRunner.scala:39)
mist_1 | at io.hydrosphere.mist.master.execution.workers.WorkerRunner$DefaultRunner$$anonfun$continueSetup$1$1.applyOrElse(WorkerRunner.scala:39)
mist_1 | at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:138)
mist_1 | at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:136)
mist_1 | at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
mist_1 | at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
mist_1 | at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
mist_1 | at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
mist_1 | at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
mist_1 | at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Job log
INFO 2018-11-09T11:58:22.53 [bec629b4-7cc4-482e-8ccb-9a7856f701d2] Waiting worker connection
INFO 2018-11-09T11:58:22.534 [bec629b4-7cc4-482e-8ccb-9a7856f701d2] InitializedEvent(externalId=None)
INFO 2018-11-09T11:58:22.534 [bec629b4-7cc4-482e-8ccb-9a7856f701d2] QueuedEvent
ERROR 2018-11-09T11:59:02.636 [bec629b4-7cc4-482e-8ccb-9a7856f701d2] FailedEvent with Error:
java.lang.RuntimeException: Context is broken
at io.hydrosphere.mist.master.execution.JobActor$$anonfun$io$hydrosphere$mist$master$execution$JobActor$$initial$1.applyOrElse(JobActor.scala:59)
at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
at io.hydrosphere.mist.master.execution.JobActor.akka$actor$Timers$$super$aroundReceive(JobActor.scala:24)
at akka.actor.Timers$class.aroundReceive(Timers.scala:44)
at io.hydrosphere.mist.master.execution.JobActor.aroundReceive(JobActor.scala:24)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:527)
at akka.actor.ActorCell.invoke(ActorCell.scala:496)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.RuntimeException: Process terminated with error java.lang.RuntimeException: Process exited with status code 255 and out: 2018-11-09 11:58:56,046 WARN org.apache.hadoop.util.NativeCodeLoader Unable to load native-hadoop library for your platform... using builtin-java classes where applicable;2018-11-09 11:59:01,870 ERROR org.apache.spark.deploy.ClientEndpoint Exception from cluster was: java.nio.file.NoSuchFileException: /opt/mist/mist-worker.jar;java.nio.file.NoSuchFileException: /opt/mist/mist-worker.jar; at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86); at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102); at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107); at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:526); at sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253); at java.nio.file.Files.copy(Files.java:1274); at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$copyRecursive(Utils.scala:664); at org.apache.spark.util.Utils$.copyFile(Utils.scala:635); at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:719); at org.apache.spark.util.Utils$.fetchFile(Utils.scala:509); at org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:155); at org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173);at org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92)
at io.hydrosphere.mist.master.execution.workers.WorkerRunner$DefaultRunner$$anonfun$continueSetup$1$1.applyOrElse(WorkerRunner.scala:39)
at io.hydrosphere.mist.master.execution.workers.WorkerRunner$DefaultRunner$$anonfun$continueSetup$1$1.applyOrElse(WorkerRunner.scala:39)
at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:138)
at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:136)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Local worker log
2018-11-09 11:58:24,154 WARN org.apache.hadoop.util.NativeCodeLoader Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2018-11-09 11:58:30,109 ERROR org.apache.spark.deploy.ClientEndpoint Exception from cluster was: java.nio.file.NoSuchFileException: /opt/mist/mist-worker.jar
java.nio.file.NoSuchFileException: /opt/mist/mist-worker.jar
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:526)
at sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
at java.nio.file.Files.copy(Files.java:1274)
at org.apache.spark.util.Utils$.org$apache$spark$util$Utils$$copyRecursive(Utils.scala:664)
at org.apache.spark.util.Utils$.copyFile(Utils.scala:635)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:719)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:509)
at org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:155)
at org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173)
at org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92)
Suspicious
It looks like the path to mist-worker.jar is hard-coded from $MIST_HOME, so the Spark worker tries to resolve /opt/mist/mist-worker.jar on its own filesystem, where the file does not exist:
org.apache.spark.deploy.worker.DriverRunner Copying user jar file:/opt/mist/mist-worker.jar to /opt/spark/work/driver-20181109110007-0000/mist-worker.jar
org.apache.spark.deploy.worker.Worker Driver driver-20181109110007-0000 failed with unrecoverable exception: java.nio.file.NoSuchFileException: /opt/mist/mist-worker.jar
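The hard-coded-path suspicion matches how Spark handles `file:` URIs in cluster mode: the jar is copied from the local filesystem of whichever node runs the driver, not from the machine that invoked spark-submit. A minimal Python sketch of that fetch step (an illustration of the behavior, not Spark's actual code):

```python
import shutil
import tempfile
from pathlib import Path


def fetch_user_jar(jar_uri: str, work_dir: str) -> str:
    """Sketch of what DriverRunner.downloadUserJar does with a file: URI:
    the path is resolved on the node running the driver."""
    raw = jar_uri[len("file:"):] if jar_uri.startswith("file:") else jar_uri
    src = Path(raw)
    dst = Path(work_dir) / src.name
    shutil.copy(src, dst)  # raises FileNotFoundError if the jar is absent here
    return str(dst)


# On a worker container that lacks /opt/mist/mist-worker.jar the copy fails,
# mirroring the NoSuchFileException in the log above.
work = tempfile.mkdtemp()
try:
    fetch_user_jar("file:/opt/mist/mist-worker.jar", work)
except FileNotFoundError as e:
    print("missing on this node:", e.filename)
```

Because the copy happens on the driver host, the jar either has to exist at the same path inside the spark-worker container, or the URI has to point at a location every node can reach.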
Issue Analytics
- State:
- Created 5 years ago
- Comments: 11 (6 by maintainers)
@zero-88 Done!
Awesome, works like a charm. Please make a new release version when you have time. Thanks!