
[sdk] Can't use create_component_from_func with pip packages when running as non-root


Environment

  • KFP version: 1.7 (KF 1.4)
  • KFP SDK version: 1.6.6
  • All dependency versions:
        kfp                  1.6.6
        kfp-pipeline-spec    0.1.13
        kfp-server-api       1.7.1

Steps to reproduce

Background:

  • Due to security concerns it’s a bad idea to run containers as root.
  • For composability and maintenance, it’s a good idea to define small & modular KFP components.

With this in mind, I'd like to report that create_component_from_func does not work as expected when the container runs as a non-root user and the packages_to_install parameter is used to add runtime dependencies.

To reproduce, use the pipeline definition attached at the bottom.

When this pipeline is run, the following output is seen in Kubeflow:

WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/.local'
Check the permissions.
WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/.local'
Check the permissions.
Error: exit status 1
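Both paths in the log (/.cache/pip and /.local) look like pip's per-user defaults resolved against a home directory of "/", which is presumably what the container ends up with for UID 1234 when the image has no matching user. The Python sketch below, assuming a Linux image, asks a fresh interpreter where pip install --user would put packages by calling the standard site.getuserbase(), which honours PYTHONUSERBASE and otherwise falls back to ~/.local. The helper name user_base is illustrative only.

```python
import os
import subprocess
import sys

# Ask a fresh interpreter where `pip install --user` would put packages.
# site.getuserbase() honours PYTHONUSERBASE and otherwise falls back to
# ~/.local on Linux (macOS framework builds use ~/Library/Python instead).
PROBE = "import site; print(site.getuserbase())"


def user_base(extra_env):
    """Return the user base a child Python would use with these env overrides."""
    env = dict(os.environ)
    env.pop("PYTHONUSERBASE", None)  # start from a clean slate
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # HOME=/ mimics a container started with a UID that has no home directory.
    print(user_base({"HOME": "/"}))  # /.local
    # Pointing PYTHONUSERBASE at a writable directory moves the install target.
    print(user_base({"HOME": "/", "PYTHONUSERBASE": "/tmp/pip"}))  # /tmp/pip
```

With HOME=/ the user base becomes /.local, which matches the Permission denied error above.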

Expected result

The correct behaviour would be for packages to be installed in a location that is writable by non-root users. As a direct consequence, that location would also have to be added to PYTHONPATH.
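A minimal sketch of the two pieces that expected behaviour needs, assuming a Linux image: working out which site-packages directory a --user install lands in for a given PYTHONUSERBASE, and appending it to PYTHONPATH. It uses the standard sysconfig "posix_user" install scheme, which should match the layout pip install --user uses on a typical Linux image; /tmp/pip is only an example location, and both helper names are hypothetical.

```python
import os
import sysconfig

# Example writable location (hypothetical; any non-root-writable dir works).
USER_BASE = "/tmp/pip"


def user_site_packages(user_base):
    """Site-packages dir used by `pip install --user` when PYTHONUSERBASE=user_base.

    Uses the POSIX user scheme explicitly, i.e. the layout on Linux images.
    """
    return sysconfig.get_path("purelib", "posix_user", vars={"userbase": user_base})


def with_extra_pythonpath(env, directory):
    """Return a copy of env with directory appended to PYTHONPATH.

    Empty entries are dropped, so an unset PYTHONPATH does not produce a
    leading separator.
    """
    new_env = dict(env)
    parts = [p for p in new_env.get("PYTHONPATH", "").split(os.pathsep) if p]
    parts.append(directory)
    new_env["PYTHONPATH"] = os.pathsep.join(parts)
    return new_env


if __name__ == "__main__":
    site_dir = user_site_packages(USER_BASE)
    print(site_dir)  # e.g. /tmp/pip/lib/python3.11/site-packages
    print(with_extra_pythonpath({}, site_dir)["PYTHONPATH"])
```

Computing the path from sysconfig avoids hard-coding the python3.X directory name.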

With the attached pipeline definition, kfp.components._python_op._get_packages_to_install_command today produces the following yaml:

      - sh
      - -c
      - (PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location
        'tqdm' || PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location
        'tqdm' --user) && "$0" "$@"
      - sh
      - -ec
      - |
        program_path=$(mktemp)
        printf "%s" "$0" > "$program_path"
        python3 -u "$program_path" "$@"
      - |
        def hello_world():
            import tqdm
            print("Hello world!")

        import argparse
        _parser = argparse.ArgumentParser(prog='Hello world', description='')
        _parsed_args = vars(_parser.parse_args())

        _outputs = hello_world(**_parsed_args)

I propose that kfp.components._python_op._get_packages_to_install_command is changed to instead output:

      - sh
      - -c
      - (PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location
        'tqdm' || PIP_DISABLE_PIP_VERSION_CHECK=1 PYTHONUSERBASE=/tmp/pip python3 -m pip install --quiet --no-warn-script-location --cache-dir /tmp/pip-cache
        'tqdm' --user) && "$0" "$@"
      - sh
      - -ec
      - |
        PIP_CUSTOM_FOLDER=$(realpath /tmp/pip/lib/*/site-packages)
        program_path=$(mktemp)
        printf "%s" "$0" > "$program_path"
        PYTHONPATH=$PYTHONPATH:$PIP_CUSTOM_FOLDER python3 -u "$program_path" "$@"
      - |
        def hello_world():
            import tqdm
            print("Hello world!")

        import argparse
        _parser = argparse.ArgumentParser(prog='Hello world', description='')
        _parsed_args = vars(_parser.parse_args())

        _outputs = hello_world(**_parsed_args)
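Below is a Python sketch of how a helper could emit the two proposed commands; packages_to_install_command and program_runner are hypothetical names, not KFP's actual internals, and the python parameter is added only so the runner can be pointed at a specific interpreter. It deliberately differs from the proposal in one place: instead of realpath /tmp/pip/lib/*/site-packages it loops over the glob and only exports directories that exist. As far as I can tell, when the first (non --user) install succeeds that directory is never created, the glob stays unexpanded, realpath likely exits non-zero, and sh -e would then abort the step.

```python
import shlex
import subprocess
import sys

PIP = "PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location"


def packages_to_install_command(packages, user_base="/tmp/pip", cache_dir="/tmp/pip-cache"):
    """Hypothetical builder for the proposed install step (sh -c ...)."""
    pkgs = " ".join(shlex.quote(p) for p in packages)
    # Fallback install: user base and pip cache both point at writable dirs.
    user_pip = (
        "PIP_DISABLE_PIP_VERSION_CHECK=1 PYTHONUSERBASE=" + shlex.quote(user_base)
        + " python3 -m pip install --quiet --no-warn-script-location"
        + " --cache-dir " + shlex.quote(cache_dir)
    )
    return ["sh", "-c", f'({PIP} {pkgs} || {user_pip} {pkgs} --user) && "$0" "$@"']


def program_runner(user_base="/tmp/pip", python="python3"):
    """Hypothetical builder for the run step (sh -ec ...).

    Only site-packages dirs that actually exist are added to PYTHONPATH, so
    the step also works when no --user install happened.
    """
    glob = shlex.quote(user_base) + "/lib/*/site-packages"  # glob left unquoted
    script = (
        "for d in " + glob + "; do\n"
        '  if [ -d "$d" ]; then export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$d"; fi\n'
        "done\n"
        "program_path=$(mktemp)\n"
        'printf "%s" "$0" > "$program_path"\n'
        + shlex.quote(python) + ' -u "$program_path" "$@"\n'
    )
    return ["sh", "-ec", script]


if __name__ == "__main__":
    print(packages_to_install_command(["tqdm"])[2])
    # The program text is passed as $0, exactly like in the generated YAML.
    subprocess.run(program_runner(python=sys.executable) + ['print("Hello world!")'], check=True)
```

Guarding with [ -d ] keeps the step independent of which of the two pip invocations succeeded.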

This change would accomplish two things:

  • Allow non-root users to install pip packages on the fly
  • Allow non-root users to use pip's cache instead of having it disabled

A side note: I don't know the historical reasons why KFP first tries a regular (system-wide) install and only falls back to --user when that fails. IMO using --user from the start would make more sense, but I might be missing something 😃

If you agree with the structure of my proposal, I can work on the change - seems like a pretty small fix.

Thanks!

Materials and Reference

Pipeline definition

import argparse
import kfp
import kubernetes

def hello_world():
    import tqdm
    print("Hello world!")

def hello_world_op():
    return kfp.components.create_component_from_func(func=hello_world, packages_to_install=['tqdm'])()

def pipeline():
    component = hello_world_op()
    user_sc = kubernetes.client.models.V1SecurityContext(run_as_user=1234)
    component.set_security_context(user_sc)

def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--yaml', type=str, required=True)
    return parser.parse_args()

def main():
    args = get_args()
    kfp.compiler.Compiler().compile(
        pipeline_func=pipeline,
        package_path=args.yaml)

if __name__ == "__main__":
    main()
    print(kfp.__version__)


Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 7
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

skogsbrus commented, Apr 1, 2022 (4 reactions)

I’ve got a fix for this now, but I’ll need to go through an approval process at work due to the CLA before I can contribute.

skogsbrus commented, Sep 23, 2022 (1 reaction)

The contribution process has been completed. I'll try to pick this up when I find the time (note: it might take a while). In the meantime, feel free to ping me if there's anything I can clarify.
