
Shouldn't KubeSpawner's cmd/args map to the K8s container's command/args?


I saw a report that c.KubeSpawner.default_url had no effect and decided to investigate.

I ended up concluding that configuring the command the container starts with is very complicated. The Spawner, K8s, and the Dockerfile all use different names for similar things, and these all combine into the k8s container's args field. To my surprise, the command field apparently isn't configurable.

cmd, args, notebook_dir, default_url, and get_args(...) are defined in the Spawner base class.
get_args(...) combines notebook_dir, default_url, and args into a single list of arguments that is appended to cmd. JupyterHub 2.0.0 changes this, though; see the get_args definition.
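
To make "a single appendix to cmd" concrete, here is a simplified Python model of the pre-2.0 get_args(). It is a sketch, not JupyterHub's implementation: ip/port handling, format_string templating, and quoting are left out, the function name get_args_sketch is hypothetical, and the flag spellings are assumptions modelled on JupyterHub 1.x.

```python
from typing import List, Optional


def get_args_sketch(
    notebook_dir: Optional[str] = None,
    default_url: Optional[str] = None,
    args: Optional[List[str]] = None,
) -> List[str]:
    """Simplified model of Spawner.get_args() before JupyterHub 2.0.

    notebook_dir and default_url become CLI flags (assumed spellings);
    user-supplied args are appended verbatim at the end.
    """
    out: List[str] = []
    if notebook_dir:
        out.append(f"--notebook-dir={notebook_dir}")
    if default_url:
        out.append(f"--SingleUserNotebookApp.default_url={default_url}")
    out.extend(args or [])
    return out


# The result only makes sense appended to a command such as jupyterhub-singleuser.
print(["jupyterhub-singleuser"] + get_args_sketch("/home/jovyan", "/lab", ["--debug"]))
```

Note that the list contains flags only, never the program to run, which is why everything below hinges on whether cmd is set.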

KubeSpawner sets the container's k8s args field to the variable real_cmd, decided below, which changed in commit 5ee6dc6f. Note that in k8s manifests, command and args are the equivalents of the Dockerfile's ENTRYPOINT and CMD.

    async def get_pod_manifest(self):
        # ...
        if self.cmd:
            real_cmd = self.cmd + self.get_args()
        else:
            # change commit comment:
            # fix use of get_args() (calling get_args() alone would omit the command)
-           real_cmd = self.get_args()
+           real_cmd = None
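
The effect of that one-line change can be modelled with two standalone Python functions. The names real_cmd_before/real_cmd_after are hypothetical, and extra_args stands in for whatever get_args() returns; this is a sketch of the logic in the diff, not KubeSpawner code.

```python
from typing import List, Optional


def real_cmd_before(cmd: List[str], extra_args: List[str]) -> List[str]:
    """Logic before 5ee6dc6f: with no cmd, the flags alone became the container args."""
    if cmd:
        return cmd + extra_args
    # The flags replace the image CMD, so the command itself is lost.
    return extra_args


def real_cmd_after(cmd: List[str], extra_args: List[str]) -> Optional[List[str]]:
    """Logic after 5ee6dc6f: with no cmd, the container args are left unset."""
    if cmd:
        return cmd + extra_args
    # The image CMD runs, but extra_args are silently dropped.
    return None


extra = ["--SingleUserNotebookApp.default_url=/lab"]
print(real_cmd_before([], extra))
print(real_cmd_after([], extra))
print(real_cmd_after(["jupyterhub-singleuser"], extra))
```

A None result means the args field is omitted from the container spec, so Kubernetes falls back to the image's CMD; neither version both keeps the image command and passes the flags.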

Z2JH's default for KubeSpawner's cmd is [] in z2jh >= 2.0.0 but jupyterhub-singleuser in z2jh < 2.0.0. This means z2jh 2.0.0+ exposes this bug out of the box: it silently ignores args, notebook_dir, and default_url unless cmd is also set explicitly. (Related PR: https://github.com/jupyterhub/zero-to-jupyterhub-k8s/pull/2449)

Investigation conclusion

I'm confused about why we set the k8s container's args from the traitlets cmd, args, notebook_dir, and default_url, instead of setting command to cmd and args to default_url, notebook_dir, and args.
I think silently ignoring the traitlets args, notebook_dir, and default_url when cmd is unset is problematic and should at least emit a warning.
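
A minimal Python sketch of the suggested warning follows. The function real_cmd_with_warning and its configured parameter are hypothetical. It checks the user-facing traitlets rather than the full get_args() output, on the assumption that get_args() may also contain flags the spawner adds itself, which should not trigger the warning.

```python
import logging
from typing import Dict, List, Optional

logger = logging.getLogger("kubespawner-sketch")


def real_cmd_with_warning(
    cmd: List[str],
    extra_args: List[str],
    configured: Dict[str, object],
) -> Optional[List[str]]:
    """Hypothetical real_cmd logic that warns instead of dropping options silently.

    extra_args stands in for get_args(); configured maps trait names such as
    "args", "notebook_dir", "default_url" to the values the user set.
    """
    if cmd:
        return cmd + extra_args
    ignored = sorted(name for name, value in configured.items() if value)
    if ignored:
        logger.warning(
            "Spawner.cmd is unset, so %s will be ignored; set cmd explicitly to use them.",
            ", ".join(ignored),
        )
    return None
```

For example, real_cmd_with_warning([], ["--SingleUserNotebookApp.default_url=/lab"], {"default_url": "/lab"}) still returns None, but now logs which setting was dropped.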

Issue Analytics

Created 3 years ago; 13 comments (10 by maintainers)

Top GitHub Comments


minrk commented, May 3, 2021

It is extremely confusing that Kubernetes renames Docker’s Entrypoint and Command as Command and Args, respectively!

While Kubernetes' names more accurately describe what happens (a command is passed arguments), I think Docker's names more accurately describe what they are for (an entrypoint prepares an environment in which to launch a command).

In that sense, cmd and args are used to build a single command list, which we pass as the Kubernetes container's args, just like we would with docker run $image command ..., and I think we should never interact with the entrypoint at all, other than requiring that it allow arbitrary commands.
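
The renaming is easier to follow as code. This Python sketch models the documented Kubernetes rules for combining a container's command/args with the image's ENTRYPOINT/CMD: command replaces the ENTRYPOINT (and drops the image CMD), while args replaces only the CMD. The function name and the example image values are illustrative assumptions; empty or missing fields are treated as unset in this model.

```python
from typing import List, Optional


def effective_argv(
    image_entrypoint: List[str],
    image_cmd: List[str],
    command: Optional[List[str]] = None,
    args: Optional[List[str]] = None,
) -> List[str]:
    """Model of the process Kubernetes starts for a container."""
    if command:
        # Setting command ignores both the image ENTRYPOINT and the image CMD.
        return command + (args or [])
    if args:
        # Only args set: the image ENTRYPOINT runs with these args.
        return image_entrypoint + args
    # Neither set: the image defaults are used unchanged.
    return image_entrypoint + image_cmd


# Illustrative image with a wrapper ENTRYPOINT, as in docker run $image command ...
print(effective_argv(["tini", "-g", "--"], ["start-notebook.sh"], args=["jupyterhub-singleuser", "--debug"]))
```

Because KubeSpawner only ever fills args, the image ENTRYPOINT always stays in place as the wrapper, which matches the "never interact with the entrypoint" position above.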

This is part of why I think images like docker-stacks should implement their environment customization in the entrypoint instead of CMD. Overriding CMD shouldn't be hugely consequential, but it is for docker-stacks.

I think what we must support is opt-in/opt-out of get_args(), and this can be in three modes:

1. always append if cmd is set (current KubeSpawner behavior)
2. always append to cmd, whether or not it is set (the Dockerfile CMD will never be considered)
3. never append to cmd, whether or not it is set
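
The three modes can be sketched as a single option in Python. AppendMode and build_real_cmd are hypothetical names, not an existing KubeSpawner setting; extra_args stands in for get_args(), and None means the container args are left unset so the image CMD runs.

```python
from enum import Enum
from typing import List, Optional


class AppendMode(Enum):
    """Hypothetical option values for the three modes."""
    IF_CMD_SET = "if-cmd-set"  # mode 1: current KubeSpawner behavior
    ALWAYS = "always"          # mode 2: the image CMD is never considered
    NEVER = "never"            # mode 3: get_args() output is never appended


def build_real_cmd(cmd: List[str], extra_args: List[str], mode: AppendMode) -> Optional[List[str]]:
    """Return the container args for the given mode."""
    if mode is AppendMode.NEVER:
        return cmd or None
    if mode is AppendMode.ALWAYS:
        # With cmd empty, only the flags are passed and they replace the image CMD.
        return cmd + extra_args
    return cmd + extra_args if cmd else None
```

With cmd empty, mode 2 yields just the flags, which illustrates why a default command is needed for that mode to be useful.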

I’d add that option 2 would also require restoring the default cmd = 'jupyterhub-singleuser' because it doesn’t make sense to append args without starting with the command itself. This would also make KubeSpawner consistent with all the non-container-based Spawners.

I wonder if perhaps the Spawner base class should add some configuration about this though rather than KubeSpawner, and then we make use and adjust around that?

I think this is really a KubeSpawner-specific issue, and should probably be a kubespawner option. It is really only the “get the default command from the image with kubespawner” case that is affected, because kubespawner cannot actually retrieve this info from the image. If the command is specified, everything works.

The Spawner API is that cmd ultimately launches jupyterhub-singleuser, and args will be passed to it. It is part of the API that this command accepts the CLI args of jupyterhub-singleuser, even if it may be a wrapper or adapter. Launching a command other than that is not currently supported by JupyterHub. What makes KubeSpawner (and DockerSpawner) unique and require special handling is that they have another source for the default command: the CMD field of the image itself. If Spawner.cmd is unset, DockerSpawner deals with this by inspecting the image to extract CMD before appending arguments. I don't know how feasible that would be in KubeSpawner (we'd have to deal with all kinds of registry and pull secret stuff to get it, I think). So I'm not sure what we can do to support that here. Maybe this means using the image's CMD by default is not feasible in KubeSpawner.
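
The DockerSpawner-style resolution can be sketched in Python as below. This mirrors the idea, not DockerSpawner's code: resolve_cmd is a hypothetical name, and image_inspect is a plain dict shaped like `docker image inspect` output (default command under Config.Cmd). Actually obtaining that metadata, with registry access and pull secrets, is the part that is hard in KubeSpawner, so here it is simply passed in.

```python
from typing import Any, Dict, List


def resolve_cmd(spawner_cmd: List[str], image_inspect: Dict[str, Any], extra_args: List[str]) -> List[str]:
    """Use Spawner.cmd if set, otherwise the image's CMD, then append get_args() output."""
    if spawner_cmd:
        base = list(spawner_cmd)
    else:
        base = list(image_inspect.get("Config", {}).get("Cmd") or [])
    if not base:
        raise ValueError("no command: Spawner.cmd is unset and the image defines no CMD")
    return base + extra_args


image = {"Config": {"Cmd": ["start-notebook.sh"]}}
print(resolve_cmd([], image, ["--SingleUserNotebookApp.default_url=/lab"]))
```

Once the image CMD is known up front, the flags can always be appended, so nothing is dropped.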

This is related to why I opened https://github.com/jupyterhub/jupyterhub/pull/3381: it would be nice if we could stop specifying CLI args (by default) in Spawners and do everything through the environment, but it's trickier to remove than I realized. I would like to get to a point where JupyterHub internally does not turn any options into CLI args, other than explicit CLI args from user config, and only communicates options through the environment. Then this problem would go away, I think…
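
A Python sketch of the environment-only approach: options become environment variables, and the container spec never sets args, so the image's command runs untouched. The helper names and the variable names are assumptions for illustration; check JupyterHub's documentation for the variables it actually reads. The env list uses the Kubernetes name/value entry shape.

```python
from typing import Any, Dict, Optional


def options_to_env(notebook_dir: Optional[str] = None, default_url: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical: pass server options as environment variables instead of CLI flags."""
    env: Dict[str, str] = {}
    if notebook_dir:
        env["JUPYTERHUB_ROOT_DIR"] = notebook_dir  # assumed variable name
    if default_url:
        env["JUPYTERHUB_DEFAULT_URL"] = default_url  # assumed variable name
    return env


def container_spec(image: str, env: Dict[str, str]) -> Dict[str, Any]:
    """Build a container dict with env entries and no args, so the image CMD is used."""
    return {
        "image": image,
        "env": [{"name": k, "value": v} for k, v in sorted(env.items())],
    }


print(container_spec("example/notebook:latest", options_to_env(default_url="/lab")))
```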


minrk commented, Jun 7, 2022

Do we want KubeSpawner to override cmd?

The default is less important to me; what’s important to me is that the “use the image’s command” case is clear and simple. If we can accomplish that, however we accomplish that, I’m okay with switching back to jupyterhub-singleuser to match other Spawners.

Whatever we do with the default, I think it's probably still better for the z2jh approach to specify either the full singleuser.cmd or whatever special value we choose (None, empty list, ‘$imageCommand’), but not to expose Spawner.args (via documentation / schema), because it only works when singleuser.cmd is also specified.
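
One way to read that proposal, sketched in Python: normalize singleuser.cmd so that any of the candidate special values means "use the image's command", and reject args in that case instead of ignoring them. The function name, the USE_IMAGE_COMMAND constant, and the args parameter (standing in for c.Spawner.args set through extra config) are hypothetical; z2jh has not settled on any of these.

```python
from typing import List, Optional, Union

USE_IMAGE_COMMAND = "$imageCommand"  # one of the candidate special values


def normalize_singleuser_cmd(
    cmd: Union[None, str, List[str]],
    args: Optional[List[str]] = None,
) -> Optional[List[str]]:
    """Return a full command list, or None meaning "use the image's command"."""
    if cmd is None or cmd == [] or cmd == USE_IMAGE_COMMAND:
        if args:
            raise ValueError("args only take effect when singleuser.cmd is a full command")
        return None
    if isinstance(cmd, str):
        return [cmd]
    return list(cmd)
```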

Why do we set c.Spawner.cmd and c.Spawner.default_url instead of c.KubeSpawner.cmd and c.KubeSpawner.default_url in z2jh’s jupyterhub_config.py?

It doesn’t make a difference in our case, but it’s typical to set config either on the class that defines the trait, or on the subclass you know you are using. It makes no difference when there is only one class you are configuring, but if another class came up that was a sibling of KubeSpawner (inherits from Spawner but not KubeSpawner), there is a distinction:

Setting it on Spawner would affect both KubeSpawner and CustomSpawner.
Setting it on KubeSpawner would only affect KubeSpawner, not CustomSpawner.

That's mostly hypothetical in the current situation, where there's really only one class to configure (or possible custom subclasses of KubeSpawner), so if it's clearer to set them on KubeSpawner only, that's okay too. I would typically apply config to the class that defines the trait, though.

Traitlets config loading looks through the class inheritance, so it’ll find it on Spawner, KubeSpawner, or anything in between that has the trait (if there were something in between).
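
The inheritance-based lookup can be illustrated with a toy Python model. It is not traitlets' implementation or API: the classes are stand-ins named after the real ones, and config is a plain dict keyed by class name. Sections are applied from base class to subclass, so config on a more specific class wins.

```python
from typing import Any, Dict


class Spawner:
    default_url = ""  # class default for the trait


class KubeSpawner(Spawner):
    pass


class CustomSpawner(Spawner):
    pass


def resolve_trait(cls: type, name: str, config: Dict[str, Dict[str, Any]]) -> Any:
    """Toy lookup: start from the class default, then apply config sections
    from the most basic class in the MRO to the most derived one."""
    value = getattr(cls, name)
    for klass in reversed(cls.__mro__):
        section = config.get(klass.__name__, {})
        if name in section:
            value = section[name]
    return value


on_base = {"Spawner": {"default_url": "/lab"}}
on_sub = {"KubeSpawner": {"default_url": "/lab"}}
print(resolve_trait(KubeSpawner, "default_url", on_base), resolve_trait(CustomSpawner, "default_url", on_base))  # /lab /lab
print(resolve_trait(KubeSpawner, "default_url", on_sub), repr(resolve_trait(CustomSpawner, "default_url", on_sub)))  # /lab ''
```

This reproduces the two cases above: config on Spawner reaches both siblings, while config on KubeSpawner leaves CustomSpawner at its default.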

Is there an action point to take before the z2jh 2.0.0 release?

I guess we need to decide if we want to revert the “use image by default” change. If not, then document the two choices (singleuser.cmd or ‘default’).

If we go back to jupyterhub-singleuser, then we need to revert the change, and make sure to implement and document the ‘use the image’ option, since that not working is what started the whole process.