Consider defaulting k8s_api_threadpool_workers to c.JupyterHub.concurrent_spawn_limit
`c.KubeSpawner.k8s_api_threadpool_workers` defaults to 5*ncpu [1], which is also what a ThreadPoolExecutor in Python defaults to [2]. The description of that option says:
> Increase this if you are dealing with a very large number of users.
In our setup, the core node where the hub pod runs is a 4-CPU node, because the hub doesn't go beyond 1 CPU. This means that by default `k8s_api_threadpool_workers` only has 20 workers.
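As a sanity check on that arithmetic, here is the sizing rule as a sketch. Note the 5*ncpu figure was the stdlib default before Python 3.8; since 3.8, ThreadPoolExecutor defaults to min(32, os.cpu_count() + 4) instead:

```python
# Pre-Python-3.8 ThreadPoolExecutor sizing (5 * ncpu), which the
# KubeSpawner default mirrors; since 3.8 the stdlib default changed
# to min(32, os.cpu_count() + 4).
ncpu = 4  # the core node size from this issue
legacy_default = 5 * ncpu           # 20 workers, matching the figure above
modern_default = min(32, ncpu + 4)  # what the 3.8+ stdlib default would pick
```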
The `c.JupyterHub.concurrent_spawn_limit` option defaults to 100 [3], but in zero-to-jupyterhub-k8s it is set to 64 [4].
It seems that if you have a lot of users logging in and spawning notebook pods at the same time, like at the beginning of a large user event, you would want `k8s_api_threadpool_workers` aligned with `concurrent_spawn_limit`; otherwise those spawn requests could be waiting on the thread pool.
We could default `k8s_api_threadpool_workers` to `concurrent_spawn_limit`, or at least mention the relationship between the two options in the config option help docs.
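A minimal sketch of what that alignment could look like in jupyterhub_config.py, assuming KubeSpawner is the configured spawner (`c` is the config object JupyterHub provides to config files):

```python
# jupyterhub_config.py (sketch)
c.JupyterHub.concurrent_spawn_limit = 64  # the z2jh default [4]
# Keep the spawner's API thread pool at least as large as the spawn
# limit, so concurrent spawns are not queued behind each other:
c.KubeSpawner.k8s_api_threadpool_workers = c.JupyterHub.concurrent_spawn_limit
```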
[1] https://github.com/jupyterhub/kubespawner/blob/5521d573c272/kubespawner/spawner.py#L199
[2] https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor
[3] https://jupyterhub.readthedocs.io/en/stable/api/app.html#jupyterhub.app.JupyterHub.concurrent_spawn_limit
[4] https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/e4b9ce7eab5c17325e93975de1d6b4a200d47cd8/jupyterhub/values.yaml#L16
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 4
- Comments: 9 (6 by maintainers)
Top GitHub Comments
To decide whether the thread pool size needs to be bigger, I think we need a measurement of how many requests that use the thread pool end up having to wait. My intuition is similar to Erik's: each spawn should only use a slot in the thread pool for a second or so while it is sending the POST request. Maybe that isn't true, though, which is where some measurements of how long these requests take, and how often a request ends up getting queued before being executed, would help.
@consideRatio thanks for digging into this in detail.
I did change `c.KubeSpawner.k8s_api_threadpool_workers` to match `c.JupyterHub.concurrent_spawn_limit` in our z2jh `extraConfig` value like this: `c.KubeSpawner.k8s_api_threadpool_workers = c.JupyterHub.concurrent_spawn_limit`
I'm assuming that worked, since the hub started up fine, but I'm not sure the value was actually assigned correctly, since I don't know how to dump the hub's settings at runtime [1].
Assuming it was correctly configured, I ran a load testing script to create 400 users (`POST /users`), start the user notebook servers (pods) in batches of 10 (using a ThreadPoolExecutor, since the `POST /users/{name}/server` API can take a bit, about 7-10 seconds in our environment), and then wait for them to be `ready: True`.
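The batching in that script might look roughly like this. This is a sketch, not the actual script: `start_server` here is a stub standing in for the real `POST /users/{name}/server` call, and the hub URL and API token it would need are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def in_batches(items, size):
    """Yield successive slices of `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def start_server(name):
    # A real load test would POST {hub_url}/hub/api/users/{name}/server
    # with an "Authorization: token {api_token}" header; stubbed here.
    return name


users = [f"user{i}" for i in range(40)]
started = []
with ThreadPoolExecutor(max_workers=10) as pool:
    for batch in in_batches(users, 10):
        # pool.map preserves input order within each batch
        started.extend(pool.map(start_server, batch))
```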
Comparing times between having `c.KubeSpawner.k8s_api_threadpool_workers` at the default (20 for us on a 4-CPU core node) and set to `c.JupyterHub.concurrent_spawn_limit` (64 per z2jh), it was slightly faster, but only by about 3%, which is probably within the margin of error; I'm guessing that if I ran both scenarios more times and averaged them out, the gain wouldn't be very noticeable. This likely reinforces the idea that the thread pool size is not an issue.

As for how this could be measured, I'm not really sure how to measure the time a Future spends waiting in the pool before being executed. It might be possible to track the overall time spent on a Future by using add_done_callback and passing in a partial function that holds a start time and calculates the elapsed time when the callback fires. That wouldn't really tell us how long the Future was sitting in the pool, though it could be a reasonable warning flag if you set some threshold and log a warning when a request takes more than x seconds to complete. I don't see an easy way to track wait time in the thread pool from the standard library, and sub-classing ThreadPoolExecutor to time things doesn't seem like much fun either (I guess it depends on your idea of fun). Other ideas?
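One way to approximate the pool wait time without sub-classing ThreadPoolExecutor is to wrap the submitted function so it records when a worker actually picks it up. A sketch of the idea (not KubeSpawner code; `submit_timed` is a hypothetical helper):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def submit_timed(executor, fn, *args, **kwargs):
    """Submit fn, making the Future also return how long the task
    sat in the queue before a worker started running it."""
    submitted = time.monotonic()

    def wrapper():
        queued_for = time.monotonic() - submitted  # pool wait time
        return queued_for, fn(*args, **kwargs)

    return executor.submit(wrapper)


# With a single worker, later submissions wait longer in the queue.
with ThreadPoolExecutor(max_workers=1) as pool:
    futures = [submit_timed(pool, time.sleep, 0.05) for _ in range(3)]
    waits = [f.result()[0] for f in futures]
```

The wrapper's `queued_for` is the wait-time signal discussed above: log a warning whenever it crosses a threshold, and you have a cheap indicator that the pool is undersized.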
[1] https://discourse.jupyter.org/t/is-there-a-way-to-dump-hub-app-settings-config/5305