Timeout potentially due to docker inspect_container prior to creation

I have a JupyterHub environment using SwarmSpawner. It works fine in my local testing, but when deployed to my rather old and slow production environment, it fails with the logs below. However, when I look at the services, the service is up and running. Tracing it back, the failure seems to start at https://github.com/jupyterhub/dockerspawner/blob/9d4a35995d2c2dd992e070cc7ad260123308b606/dockerspawner/swarmspawner.py#L252, which goes through get_task() to https://github.com/jupyterhub/dockerspawner/blob/9d4a35995d2c2dd992e070cc7ad260123308b606/dockerspawner/dockerspawner.py#L781, which in turn calls inspect_service() from https://docker-py.readthedocs.io/en/stable/api.html#module-docker.api.service. If I manually run inspect_service() and tasks(), I get back a running service.

However, when I inspect the tasks during startup, I get this:

>>> for task in client.tasks(filters={"service": "jupyter-gdb"}):
...    print(task['Status']['State'])
... 
rejected
rejected
>>> for task in client.tasks(filters={"service": "jupyter-gdb"}):
...    print(task['Status']['State'])
... 
rejected
rejected
rejected
>>> for task in client.tasks(filters={"service": "jupyter-gdb"}):
...    print(task['Status']['State'])
... 
rejected
rejected
rejected
rejected
>>> for task in client.tasks(filters={"service": "jupyter-gdb"}):
...    print(task['Status']['State'])
... 
rejected
rejected
rejected
rejected
rejected
>>> for task in client.tasks(filters={"service": "jupyter-gdb"}):
...    print(task['Status']['State'])
... 
rejected
rejected
rejected
ready
rejected
>>> for task in client.tasks(filters={"service": "jupyter-gdb"}):
...    print(task['Status']['State'])
... 
rejected
rejected
rejected
running
rejected
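
The manual polling above can be sketched as a small helper. This is a minimal sketch and not part of dockerspawner: `wait_for_running` and its parameters are hypothetical names, and `list_tasks` is assumed to be any callable with the same shape as docker-py's `APIClient.tasks(filters=...)`. It treats "rejected" as something to wait through, which is an assumption based only on the output above.

```python
import time


def wait_for_running(list_tasks, service, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Poll the tasks of `service` until one reports State == 'running'.

    `list_tasks` is a callable accepting `filters=` and returning a list of
    task dicts, e.g. the bound method `docker.APIClient().tasks`.
    Returns the list of states seen on the successful poll.
    """
    deadline = time.monotonic() + timeout
    while True:
        tasks = list_tasks(filters={"service": service})
        states = [task["Status"]["State"].lower() for task in tasks]
        if "running" in states:
            return states
        if time.monotonic() >= deadline:
            # Nothing reached "running" in time; report what was last seen.
            raise TimeoutError(
                "service %s not running after %.1fs, last states: %s"
                % (service, timeout, states)
            )
        sleep(interval)
```

With a real daemon this would be called as `wait_for_running(docker.APIClient().tasks, "jupyter-gdb")`. Passing the lookup in as a callable keeps the helper testable without a swarm.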

For some reason, the task appears to be rejected for a while before it runs, and SwarmSpawner picks this up as a failure. I wonder if it has to do with https://github.com/jupyterhub/dockerspawner/blob/9d4a35995d2c2dd992e070cc7ad260123308b606/dockerspawner/swarmspawner.py#L257 checking 'State' instead of 'Message', as Message is 'preparing' when State is 'rejected':

  'Status': {'ContainerStatus': {},
             'Err': 'No such image: '
                    '<image name and hash>',
             'Message': 'preparing',
             'PortStatus': {},
             'State': 'rejected',
             'Timestamp': '2019-08-27T01:56:24.483764876Z'}

vs

  'Status': {'ContainerStatus': {'ContainerID': '<container id>',
                                 'PID': 3954},
             'Message': 'started',
             'PortStatus': {},
             'State': 'running',
             'Timestamp': '2019-08-27T01:56:34.956884935Z'},
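
To compare State, Message and Err for the newest task rather than for every historical one, something like the following could be used. It is a sketch under assumptions: the helper names are hypothetical, the input is a list of task dicts shaped like the `Status` excerpts above, and timestamps are truncated to microseconds (so two tasks updated within the same microsecond would tie).

```python
from datetime import datetime, timezone


def parse_docker_timestamp(ts):
    """Parse a UTC timestamp such as '2019-08-27T01:56:24.483764876Z'."""
    base, _, frac = ts.rstrip("Z").partition(".")
    # datetime only holds microseconds, so drop the extra nanosecond digits.
    micros = int((frac + "000000")[:6])
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def latest_status(tasks):
    """Return the Status dict of the most recently updated task, or None."""
    if not tasks:
        return None
    newest = max(tasks, key=lambda t: parse_docker_timestamp(t["Status"]["Timestamp"]))
    return newest["Status"]


def describe(status):
    """Summarize State, Message and (if present) Err on one line."""
    line = "state=%s message=%s" % (status["State"], status["Message"])
    err = status.get("Err")
    return line + (" err=%s" % err if err else "")
```

Timestamps are parsed rather than compared as strings because the fractional part may not always have the same number of digits.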

Logs:

[D 2019-08-26 22:54:17.683 JupyterHub pages:165] Triggering spawn with default options for gdb
[D 2019-08-26 22:54:17.683 JupyterHub base:780] Initiating spawn for gdb
[D 2019-08-26 22:54:17.683 JupyterHub base:787] 0/100 concurrent spawns
[D 2019-08-26 22:54:17.683 JupyterHub base:792] 0/100 active servers
[D 2019-08-26 22:54:17.709 JupyterHub user:542] Calling Spawner.start for gdb
[W 2019-08-26 22:54:17.711 JupyterHub base:900] User gdb is slow to start (timeout=0)
[I 2019-08-26 22:54:17.712 JupyterHub log:174] 302 GET /hub/spawn -> /hub/spawn-pending/gdb (gdb@10.255.0.3) 34.88ms
[D 2019-08-26 22:54:17.726 JupyterHub dockerspawner:777] Getting container 'jupyter-gdb'
[I 2019-08-26 22:54:17.729 JupyterHub dockerspawner:784] Service 'jupyter-gdb' is gone
[I 2019-08-26 22:54:17.756 JupyterHub dockerspawner:990] Created service jupyter-gdb (id: q70k420) from image <image_name>
[I 2019-08-26 22:54:17.756 JupyterHub dockerspawner:1013] Starting service jupyter-gdb (id: q70k420)
[D 2019-08-26 22:54:17.756 JupyterHub swarmspawner:144] Getting task of service 'jupyter-gdb'
[D 2019-08-26 22:54:17.756 JupyterHub dockerspawner:777] Getting container 'jupyter-gdb'
[D 2019-08-26 22:54:17.764 JupyterHub swarmspawner:256] Service q70k420 state: pending
[I 2019-08-26 22:54:17.814 JupyterHub pages:303] gdb is pending spawn
[I 2019-08-26 22:54:17.818 JupyterHub log:174] 200 GET /hub/spawn-pending/gdb (gdb@<IP>) 10.49ms
[D 2019-08-26 22:54:18.765 JupyterHub swarmspawner:144] Getting task of service 'jupyter-gdb'
[D 2019-08-26 22:54:18.765 JupyterHub dockerspawner:777] Getting container 'jupyter-gdb'
[E 2019-08-26 22:54:18.771 JupyterHub user:626] Unhandled error starting gdb's server: Service jupyter-gdb not found
[D 2019-08-26 22:54:18.771 JupyterHub user:724] Stopping gdb
[D 2019-08-26 22:54:18.771 JupyterHub swarmspawner:144] Getting task of service 'jupyter-gdb'
[D 2019-08-26 22:54:18.772 JupyterHub dockerspawner:777] Getting container 'jupyter-gdb'
[W 2019-08-26 22:54:18.778 JupyterHub swarmspawner:128] Service jupyter-gdb not found
[D 2019-08-26 22:54:18.785 JupyterHub user:752] Deleting oauth client jupyterhub-user-gdb
[D 2019-08-26 22:54:18.793 JupyterHub user:755] Finished stopping gdb
[E 2019-08-26 22:54:18.797 JupyterHub gen:593] Exception in Future <Task finished coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /opt/conda/lib/python3.6/site-packages/jupyterhub/handlers/base.py:800> exception=RuntimeError('Service jupyter-gdb not found',)> after timeout
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 589, in error_callback
        future.result()
      File "/opt/conda/lib/python3.6/site-packages/jupyterhub/handlers/base.py", line 807, in finish_user_spawn
        await spawn_future
      File "/opt/conda/lib/python3.6/site-packages/jupyterhub/user.py", line 642, in spawn
        raise e
      File "/opt/conda/lib/python3.6/site-packages/jupyterhub/user.py", line 546, in spawn
        url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
      File "/opt/conda/lib/python3.6/site-packages/dockerspawner/dockerspawner.py", line 1017, in start
        yield self.start_object()
      File "/opt/conda/lib/python3.6/site-packages/dockerspawner/swarmspawner.py", line 252, in start_object
        raise RuntimeError("Service %s not found" % self.service_name)
    RuntimeError: Service jupyter-gdb not found

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

gdbassett commented, Sep 19, 2019 (3 reactions)

@Wildcarde Here’s what I did. Pardon all the extra stuff to make it easy to load. I’m sure there’s a better way, but this is what got the job done:

swarmspawnergdb.zip

gdbassett commented, Aug 27, 2019 (2 reactions)

Here’s a patch I generated. I subclassed swarmspawner with these edits to get it to work in my environment.

--- swarmspawner.py	2019-08-27 16:21:16.000000000 -0500
+++ swarmspawner.new.py	2019-08-27 16:22:21.000000000 -0500
@@ -151,7 +151,12 @@
                 filters={"service": self.service_name, "desired-state": "running"},
             )
             if len(tasks) == 0:
-                return None
+                tasks = yield self.docker(
+                    "tasks",
+                    filters={"service": self.service_name},
+                )
+                if len(tasks) == 0:
+                    return None
 
             elif len(tasks) > 1:
                 raise RuntimeError(
@@ -254,7 +259,7 @@
             status = service["Status"]
             state = status["State"].lower()
             self.log.debug("Service %s state: %s", self.service_id[:7], state)
-            if state in {"new", "assigned", "accepted", "starting", "pending", "preparing"}:
+            if state in {"new", "assigned", "accepted", "starting", "pending", "preparing", "rejected"}:
                 # not ready yet, wait before checking again
                 yield gen.sleep(dt)
                 # exponential backoff
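
Read on their own, the two hunks amount to a fallback lookup plus one extra "keep waiting" state. The sketch below restates that logic outside of dockerspawner so it can be tried in isolation. The function names are hypothetical, `list_tasks` is assumed to behave like docker-py's `APIClient.tasks(filters=...)`, and the state set is copied from the patched line rather than from any official list.

```python
# States the patched spawner waits through; "rejected" is the one the patch adds.
STILL_STARTING = {
    "new", "assigned", "accepted", "starting", "pending", "preparing", "rejected",
}


def find_service_tasks(list_tasks, service_name):
    """Prefer tasks whose desired state is running, else fall back to all tasks."""
    tasks = list_tasks(filters={"service": service_name, "desired-state": "running"})
    if not tasks:
        # First hunk of the patch: retry without the desired-state filter.
        tasks = list_tasks(filters={"service": service_name})
    return tasks


def is_still_starting(state):
    """Second hunk of the patch: should the spawner sleep and check again?"""
    return state.lower() in STILL_STARTING
```

Whether waiting through "rejected" is appropriate in general depends on why the task was rejected. In this report the cause was an image that had not finished being pulled.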