Service not found error preventing spawn

I installed JupyterHub about a year ago, and it had been working well. The hub runs in a Docker container, as does the nginx proxy that provides HTTPS access. The other nodes are managed in a Docker swarm. Users are authenticated using CAS. The users’ home directories are stored on an NFS-mounted volume.

Randomly, users can’t log in, and the log shows a 403 error:

[W 2019-10-18 11:30:53.464 JupyterHub log:174] 403 POST /hub/api/users/some-user/activity

I should add that the hub was booted hours earlier, starting from an empty jupyterhub.sqlite file, and has not been restarted since; the sqlite file was left untouched. The error seems to appear more frequently when a user stops and restarts their server. The configuration is the same for all users, but the error is frequent for some users and rare for others. Moreover, the error seems to disappear when I run the hub without any worker nodes in the swarm (all spawned servers running on the hub’s host). My own account (admin) does not seem to be affected by the error.

Note: I submitted this issue to the jupyterhub/jupyterhub GitHub project. minrk answered:

The 403 errors here are caused by the API request made by single-user servers to register activity. Where there’s definitely a bug (in SwarmSpawner, it looks like), is in the fact we see logs of the Spawner claiming that the service was not started, but it clearly was because it is making API requests to the Hub.

The sequence of events:

1. SwarmSpawner launches the service.
2. (bug somewhere) SwarmSpawner believes this has failed, but it has not.
3. Hub begins cleanup of the server, including revoking credentials for the API token allocated to the server.
4. The service finishes starting and starts making API requests, but its token has been revoked in step 3, resulting in 403.

So the 403 is a symptom, but the real error is the SwarmSpawner is starting servers, but it thinks it is failing somehow.
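
The race described above can be reproduced in miniature. The following is a toy simulation only: `ToyHub`, `slow_service`, and `simulate` are hypothetical names invented for illustration and are not part of JupyterHub or DockerSpawner. It assumes nothing more than "a token is revoked while the service is still starting".

```python
import asyncio


class ToyHub:
    """Stand-in for the Hub's token store (hypothetical, not JupyterHub's API)."""

    def __init__(self):
        self.valid_tokens = set()

    def issue_token(self, user):
        token = f"token-for-{user}"
        self.valid_tokens.add(token)
        return token

    def revoke_token(self, token):
        self.valid_tokens.discard(token)

    def post_activity(self, token):
        # Return an HTTP-like status code for an activity report.
        return 200 if token in self.valid_tokens else 403


async def slow_service(hub, token, startup_delay, statuses):
    # The service really is starting; it just takes a while.
    await asyncio.sleep(startup_delay)
    statuses.append(hub.post_activity(token))


async def simulate(spawner_gives_up_after, startup_delay):
    hub = ToyHub()
    statuses = []
    token = hub.issue_token("some-user")
    # Step 1: the spawner launches the service.
    service = asyncio.ensure_future(
        slow_service(hub, token, startup_delay, statuses)
    )
    # Step 2: the spawner checks too early and wrongly concludes failure.
    await asyncio.sleep(spawner_gives_up_after)
    if not service.done():
        # Step 3: cleanup revokes the token.
        hub.revoke_token(token)
    # Step 4: the service finishes starting and reports activity.
    await service
    return statuses


if __name__ == "__main__":
    print(asyncio.run(simulate(spawner_gives_up_after=0.01, startup_delay=0.05)))
    print(asyncio.run(simulate(spawner_gives_up_after=0.05, startup_delay=0.01)))
```

When the spawner gives up before the service is ready, the activity report is rejected; when the service is ready first, it is accepted. This matches the reading that the 403 is a downstream symptom rather than the root cause.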

| [I 2019-10-22 12:40:53.176 JupyterHub base:812] User userc-testc took 53.770 seconds to start

The error seems to be related to the following sequence of events:

 [D 2019-10-18 11:29:59.979 JupyterHub user:542] Calling Spawner.start for kaniav-kamary
 [D 2019-10-18 11:29:59.988 JupyterHub dockerspawner:813] Getting container 'jupyter-kaniav-kamary' for dockerspawner::start before remove id:
 [I 2019-10-18 11:29:59.994 JupyterHub dockerspawner:820] Service 'jupyter-kaniav-kamary' is gone
 [I 2019-10-18 11:30:00.016 JupyterHub dockerspawner:1030] Created service jupyter-kaniav-kamary (id: ge4p07y) from image 160.228.22.168:5000/hdlbq/cs-notebook-r
 [I 2019-10-18 11:30:00.017 JupyterHub dockerspawner:1053] Starting service jupyter-kaniav-kamary (id: ge4p07y)
 [D 2019-10-18 11:30:00.017 JupyterHub swarmspawner:144] Getting task of service 'jupyter-kaniav-kamary'
| [D 2019-10-18 11:30:00.017 JupyterHub dockerspawner:813] Getting container 'jupyter-kaniav-kamary' for swarmspawner::get_task id:ge4p07y
| [D 2019-10-18 11:30:00.026 JupyterHub swarmspawner:256] Service ge4p07y state: pending
| [I 2019-10-18 11:30:00.920 JupyterHub log:174] 302 GET /hub/spawn -> /hub/spawn-pending/kaniav-kamary (kaniav-kamary@10.255.0.2) 1019.67ms
| [I 2019-10-18 11:30:00.960 JupyterHub pages:303] kaniav-kamary is pending spawn
| [I 2019-10-18 11:30:00.962 JupyterHub log:174] 200 GET /hub/spawn-pending/kaniav-kamary (kaniav-kamary@10.255.0.2) 18.15ms
| [D 2019-10-18 11:30:01.027 JupyterHub swarmspawner:144] Getting task of service 'jupyter-kaniav-kamary'
| [D 2019-10-18 11:30:01.028 JupyterHub dockerspawner:813] Getting container 'jupyter-kaniav-kamary' for swarmspawner::get_task id:ge4p07y
| [E 2019-10-18 11:30:01.036 JupyterHub user:626] Unhandled error starting kaniav-kamary's server: Service jupyter-kaniav-kamary not found
| [D 2019-10-18 11:30:01.036 JupyterHub user:724] Stopping kaniav-kamary
| [D 2019-10-18 11:30:01.036 JupyterHub swarmspawner:144] Getting task of service 'jupyter-kaniav-kamary'
| [D 2019-10-18 11:30:01.037 JupyterHub dockerspawner:813] Getting container 'jupyter-kaniav-kamary' for swarmspawner::get_task id:ge4p07y
| [W 2019-10-18 11:30:01.044 JupyterHub swarmspawner:128] Service jupyter-kaniav-kamary not found
| [D 2019-10-18 11:30:01.057 JupyterHub user:752] Deleting oauth client jupyterhub-user-kaniav-kamary
| [D 2019-10-18 11:30:01.069 JupyterHub user:755] Finished stopping kaniav-kamary
| ERROR:asyncio:Task exception was never retrieved
| future: <Task finished coro=<BaseHandler.spawn_single_user() done, defined at /opt/conda/lib/python3.6/site-packages/jupyterhub/handlers/base.py:697> exception=RuntimeError('Service jupyter-kaniav-kamary not found',)>
| Traceback (most recent call last):
|   File "/opt/conda/lib/python3.6/site-packages/jupyterhub/handlers/base.py", line 889, in spawn_single_user
|     timedelta(seconds=self.slow_spawn_timeout), finish_spawn_future
|   File "/opt/conda/lib/python3.6/site-packages/jupyterhub/handlers/base.py", line 807, in finish_user_spawn
|     await spawn_future
|   File "/opt/conda/lib/python3.6/site-packages/jupyterhub/user.py", line 642, in spawn
|     raise e
|   File "/opt/conda/lib/python3.6/site-packages/jupyterhub/user.py", line 546, in spawn
|     url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
|   File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 736, in run
|     yielded = self.gen.throw(*exc_info)  # type: ignore
|   File "/opt/conda/lib/python3.6/site-packages/dockerspawner/dockerspawner.py", line 1057, in start
|     yield self.start_object()
|   File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 729, in run
|   File "/opt/conda/lib/python3.6/site-packages/tornado/gen.py", line 742, in run
|     yielded = self.gen.send(value)
|   File "/opt/conda/lib/python3.6/site-packages/dockerspawner/swarmspawner.py", line 252, in start_object
|     raise RuntimeError("Service %s not found" % self.service_name)
| RuntimeError: Service jupyter-kaniav-kamary not found
| [I 2019-10-18 11:30:01.107 JupyterHub log:174] 200 GET /hub/api/users/kaniav-kamary/server/progress (kaniav-kamary@10.255.0.2) 16.13ms
| [W 2019-10-18 11:30:53.464 JupyterHub log:174] 403 POST /hub/api/users/kaniav-kamary/activity (@10.0.0.48) 3.26ms
| [W 2019-10-18 11:30:54.423 JupyterHub log:174] 403 POST /hub/api/users/kaniav-kamary/activity (@10.0.0.48) 3.25ms
| [W 2019-10-18 11:30:55.849 JupyterHub log:174] 403 POST /hub/api/users/kaniav-kamary/activity (@10.0.0.48) 3.77ms
| [W 2019-10-18 11:30:59.051 JupyterHub log:174] 403 POST /hub/api/users/kaniav-kamary/activity (@10.0.0.48) 3.36ms

Environment:

  • Debian 9.11
  • Docker version 19.03.3, build a872fc2f86
  • jupyterhub docker version: latest
  • nginx docker version: latest
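
To check how often this pattern occurs and for which users, the hub log can be scanned for a failed spawn followed by 403 activity posts from the same user. This is a rough sketch: the two regular expressions are written against the log lines quoted above and may need adjusting for other JupyterHub versions, and `summarize` is a hypothetical helper name.

```python
import re

# Patterns derived from the log excerpt in this issue (assumed stable wording).
SPAWN_FAIL = re.compile(
    r"Unhandled error starting (?P<user>\S+)'s server: Service \S+ not found"
)
ACTIVITY_403 = re.compile(r"403 POST /hub/api/users/(?P<user>[^/]+)/activity")


def summarize(lines):
    """Count 403 activity posts seen after a 'Service not found' spawn failure.

    Returns a dict mapping user name to the number of 403 responses logged
    after that user's spawn was reported as failed.
    """
    counts = {}
    for line in lines:
        match = SPAWN_FAIL.search(line)
        if match:
            counts.setdefault(match.group("user"), 0)
            continue
        match = ACTIVITY_403.search(line)
        if match and match.group("user") in counts:
            counts[match.group("user")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python scan_hub_log.py < jupyterhub.log
    for user, n in sorted(summarize(sys.stdin).items()):
        print(f"{user}: {n} rejected activity posts after a failed spawn")
```

Users with a non-zero count most likely have a server that kept running after the hub had already cleaned up its credentials.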

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7 (1 by maintainers)

Top GitHub Comments

1 reaction
hdlbq commented, Nov 8, 2019

Hi. Yes, you’re right. It seems that the error occurs:

  • more frequently if you have two swarm managers
  • very rarely if you don’t have any additional nodes in the swarm (just the hub)

I think that the error is linked to the responsiveness of the swarm. I will investigate further, but it is rather hard since I have only one “big” cluster and a two-VM test cluster with only two users: me and a test account. Moreover, reverse-engineering the code in the context of asynchronous calls (which lead to an arbitrary order of log lines) is not easy.
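
If the cause really is swarm responsiveness (a newly created service not yet being visible when it is looked up), one generic mitigation is to retry the lookup with a short backoff before declaring failure. The sketch below is not DockerSpawner code and not the patch from any issue: `ServiceNotFound`, `wait_for_service`, and the `lookup` callable are hypothetical names, and the retry parameters are arbitrary.

```python
import asyncio


class ServiceNotFound(Exception):
    """Hypothetical error raised when the swarm does not know the service yet."""


async def wait_for_service(lookup, attempts=5, initial_delay=0.1, backoff=2.0):
    """Call `lookup()` until it succeeds or `attempts` is exhausted.

    `lookup` is any coroutine function that returns the service description
    or raises ServiceNotFound. The delay grows by `backoff` after each miss.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return await lookup()
        except ServiceNotFound:
            if attempt == attempts:
                raise  # Give up only after the last attempt.
            await asyncio.sleep(delay)
            delay *= backoff
```

The point of the design is that a single "not found" right after creation is treated as "not visible yet" rather than as a failed spawn, so the hub does not begin cleanup while the service is still coming up.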

1 reaction
hdlbq commented, Oct 28, 2019

Hello, this issue seems to be linked to issue #330: “Timeout potentially due to docker inspect_container prior to creation”. I think I solved it by applying the patch suggested by gdbassett on 27 Aug.
