[sdk] Can't use create_component_from_func with pip packages when running as non-root
Environment
- KFP version: 1.7 (KF 1.4)
- KFP SDK version: 1.6.6
- All dependencies version:
kfp 1.6.6
kfp-pipeline-spec 0.1.13
kfp-server-api 1.7.1
Steps to reproduce
Background:
- Due to security concerns it’s a bad idea to run containers as root.
- For composability and maintenance, it’s a good idea to define small & modular KFP components.
With this in mind, I wish to report that create_component_from_func does not work as expected when the container runs as a non-root user and the packages_to_install parameter is used to add runtime dependencies.
To reproduce, see attached pipeline definition at the bottom.
When this pipeline is run, the following output is seen in Kubeflow:
WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/.local'
Check the permissions.
WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/.local'
Check the permissions.
Error: exit status 1
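A plausible reading of the paths in this log, assuming the container's HOME resolves to / for UID 1234 (the issue does not confirm this): pip derives its user install directory and cache from the home directory, so `~/.local` and `~/.cache/pip` become `/.local` and `/.cache/pip`, which a non-root user cannot create. The helper name `user_dirs_for_home` below is hypothetical and only illustrates the path expansion.

```python
import os
import posixpath
from unittest import mock


def user_dirs_for_home(home: str) -> dict:
    """Show where '~'-based pip directories land for a given HOME value.

    Hypothetical helper for illustration; it only expands paths and
    does not call pip.
    """
    # Temporarily override HOME so the rest of the process is unaffected.
    with mock.patch.dict(os.environ, {"HOME": home}):
        return {
            "user_base": posixpath.expanduser("~/.local"),
            "pip_cache": posixpath.expanduser("~/.cache/pip"),
        }


if __name__ == "__main__":
    print(user_dirs_for_home("/"))      # paths directly under the filesystem root
    print(user_dirs_for_home("/tmp"))   # paths under a world-writable directory
```

If this assumption holds, pointing the install at any writable directory would avoid the error.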
Expected result
The correct behaviour here would be for packages to be installed in a location that’s writable by non-root users. As a direct consequence, that location would also have to be added to PYTHONPATH.
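As a rough sketch of what "that location" would be, the standard library can report where a `--user` install would place packages for a given user base. The function name `user_site_packages` and the base directory `/tmp/pip` are assumptions for illustration; the resulting path depends on the interpreter version, so no fixed output is shown.

```python
import sysconfig


def user_site_packages(user_base: str) -> str:
    """Return the site-packages directory of a POSIX user install rooted at user_base.

    Hypothetical helper: it asks sysconfig to expand the 'posix_user'
    scheme with a custom userbase instead of hard-coding the Python version.
    """
    return sysconfig.get_path(
        "purelib",
        scheme="posix_user",
        vars={"userbase": user_base},
    )


if __name__ == "__main__":
    # This is the directory that would need to be on sys.path / PYTHONPATH.
    print(user_site_packages("/tmp/pip"))
```

Python also consults PYTHONUSERBASE at start-up when computing its user site directory, so exporting the same variable when the program runs may be an alternative to extending PYTHONPATH by hand, provided user site-packages are enabled in that interpreter.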
With the attached pipeline definition, kfp.components._python_op._get_packages_to_install_command today produces the following YAML:
    - sh
    - -c
    - (PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location
      'tqdm' || PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location
      'tqdm' --user) && "$0" "$@"
    - sh
    - -ec
    - |
      program_path=$(mktemp)
      printf "%s" "$0" > "$program_path"
      python3 -u "$program_path" "$@"
    - |
      def hello_world():
          import tqdm
          print("Hello world!")

      import argparse
      _parser = argparse.ArgumentParser(prog='Hello world', description='')
      _parsed_args = vars(_parser.parse_args())
      _outputs = hello_world(**_parsed_args)
I propose that kfp.components._python_op._get_packages_to_install_command is changed to instead output:
    - sh
    - -c
    - (PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location
      'tqdm' || PIP_DISABLE_PIP_VERSION_CHECK=1 PYTHONUSERBASE=/tmp/pip python3 -m pip install --quiet --no-warn-script-location --cache-dir /tmp/pip-cache
      'tqdm' --user) && "$0" "$@"
    - sh
    - -ec
    - |
      PIP_CUSTOM_FOLDER=$(realpath /tmp/pip/lib/*/site-packages)
      program_path=$(mktemp)
      printf "%s" "$0" > "$program_path"
      PYTHONPATH=$PYTHONPATH:$PIP_CUSTOM_FOLDER python3 -u "$program_path" "$@"
    - |
      def hello_world():
          import tqdm
          print("Hello world")

      import argparse
      _parser = argparse.ArgumentParser(prog='Hello world', description='')
      _parsed_args = vars(_parser.parse_args())
      _outputs = hello_world(**_parsed_args)
This change would accomplish two things:
- Allow non-root users to install pip packages on the fly
- Allow non-root users to install packages from cache
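To make the proposal concrete, here is a minimal sketch of how such a command string could be assembled. It is not the actual implementation of `_get_packages_to_install_command`; the function name `build_install_command`, its parameters, and the default directories `/tmp/pip` and `/tmp/pip-cache` are assumptions taken from the proposed YAML above.

```python
import shlex
from typing import Sequence

# Common pip invocation, copied from the YAML that KFP emits today.
_PIP_INSTALL = "python3 -m pip install --quiet --no-warn-script-location"


def build_install_command(
    packages: Sequence[str],
    user_base: str = "/tmp/pip",         # assumed writable location
    cache_dir: str = "/tmp/pip-cache",   # assumed writable pip cache
) -> str:
    """Build a shell command that tries a system install, then a --user install.

    Hypothetical helper: the fallback redirects the user base and the pip
    cache to directories that a non-root user is expected to be able to write.
    """
    # shlex.quote only adds quotes when needed, so 'tqdm' stays unquoted here.
    quoted = " ".join(shlex.quote(p) for p in packages)
    system_install = f"PIP_DISABLE_PIP_VERSION_CHECK=1 {_PIP_INSTALL} {quoted}"
    user_install = (
        f"PIP_DISABLE_PIP_VERSION_CHECK=1 PYTHONUSERBASE={shlex.quote(user_base)} "
        f"{_PIP_INSTALL} --cache-dir {shlex.quote(cache_dir)} {quoted} --user"
    )
    # '"$0" "$@"' hands control to the next command in the container args.
    return f'({system_install} || {user_install}) && "$0" "$@"'


if __name__ == "__main__":
    print(build_install_command(["tqdm"]))
```

The launcher script would still need to put the matching site-packages directory on the import path, as the second `sh -ec` entry in the proposed YAML does.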
A side note: I don’t have the historical context of why KFP first tries to install packages as root and, on failure, as the current user with --user. IMO doing it with --user from the beginning would make more sense, but I might be missing something 😃
If you agree with the structure of my proposal, I can work on the change - seems like a pretty small fix.
Thanks!
Materials and Reference
Pipeline definition
    import argparse
    import kfp
    import kubernetes


    def hello_world():
        import tqdm
        print("Hello world!")


    def hello_world_op():
        return kfp.components.create_component_from_func(func=hello_world, packages_to_install=['tqdm'])()


    def pipeline():
        component = hello_world_op()
        user_sc = kubernetes.client.models.V1SecurityContext(run_as_user=1234)
        component.set_security_context(user_sc)


    def get_args():
        parser = argparse.ArgumentParser()
        parser.add_argument('--yaml', type=str, required=True)
        return parser.parse_args()


    def main():
        args = get_args()
        kfp.compiler.Compiler().compile(
            pipeline_func=pipeline,
            package_path=args.yaml)


    if __name__ == "__main__":
        main()

    print(kfp.__version__)
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.
Top GitHub Comments
I’ve got a fix for this now, but I’ll need to go through an approval process at work due to the CLA before I can contribute.
The contribution process has been completed. I’ll try to pick this up when I find the time (note: it might take a while). In the meantime, feel free to ping me if there’s anything I can clarify.