[backend] OutputPath is giving "permission denied" - why?
When I run a step that works fine when compiled with the KFP v1 (legacy) compiler, it fails under v2-compatible mode with the following error:
time="2022-04-26T21:53:30.710Z" level=info msg="capturing logs" argo=true
I0426 21:53:30.745547 18 launcher.go:144] PipelineRoot defaults to "minio://mlpipeline/v2/artifacts".
I0426 21:53:30.745908 18 cache.go:120] Connecting to cache endpoint 10.100.244.104:8887
I0426 21:53:30.854201 18 launcher.go:193] enable caching
F0426 21:53:30.979055 18 main.go:50] Failed to execute component: failed to create directory "/tmp/outputs/output_context_path" for output parameter "output_context_path": mkdir /tmp/outputs/output_context_path: permission denied
time="2022-04-26T21:53:30.980Z" level=info msg="/tmp/outputs/output_context_path/data -> /var/run/argo/outputs/artifacts/tmp/outputs/output_context_path/data.tgz" argo=true
time="2022-04-26T21:53:30.981Z" level=info msg="Taring /tmp/outputs/output_context_path/data"
Error: failed to tarball the output /tmp/outputs/output_context_path/data to /var/run/argo/outputs/artifacts/tmp/outputs/output_context_path/data.tgz: stat /tmp/outputs/output_context_path/data: permission denied
failed to tarball the output /tmp/outputs/output_context_path/data to /var/run/argo/outputs/artifacts/tmp/outputs/output_context_path/data.tgz: stat /tmp/outputs/output_context_path/data: permission denied
The code that produces this is here:
import kfp
from kfp.v2.dsl import component, Artifact, Input, InputPath, Output, OutputPath, Dataset, Model
from typing import NamedTuple


def same_step_000_afc67b36914c4108b47e8b4bb316869d_fn(
    input_context_path: InputPath(str),
    output_context_path: OutputPath(str),
    run_info: str = "gAR9lC4=",
    metadata_url: str = "",
):
    from base64 import urlsafe_b64encode, urlsafe_b64decode
    from pathlib import Path
    import datetime
    import requests
    import tempfile
    import dill
    import os

    input_context = None
    with Path(input_context_path).open("rb") as reader:
        input_context = reader.read()

    # Helper function for posting metadata to mlflow.
    def post_metadata(json):
        if metadata_url == "":
            return

        try:
            req = requests.post(metadata_url, json=json)
            req.raise_for_status()
        except requests.exceptions.HTTPError as err:
            print(f"Error posting metadata: {err}")

    # Move to writable directory as user might want to do file IO.
    # TODO: won't persist across steps, might need support in SDK?
    os.chdir(tempfile.mkdtemp())

    # Load information about the current experiment run:
    run_info = dill.loads(urlsafe_b64decode(run_info))

    # Post session context to mlflow.
    if len(input_context) > 0:
        input_context_str = urlsafe_b64encode(input_context)
        post_metadata({
            "experiment_id": run_info["experiment_id"],
            "run_id": run_info["run_id"],
            "step_id": "same_step_000",
            "metadata_type": "input",
            "metadata_value": input_context_str,
            "metadata_time": datetime.datetime.now().isoformat(),
        })

    # User code for step, which we run in its own execution frame.
    user_code = f"""
import dill

# Load session context into global namespace:
if { len(input_context) } > 0:
    dill.load_session("{ input_context_path }")

{dill.loads(urlsafe_b64decode("gASVGAAAAAAAAACMFHByaW50KCJIZWxsbyB3b3JsZCIplC4="))}

# Remove anything from the global namespace that cannot be serialised.
# TODO: this will include things like pandas dataframes, needs sdk support?
_bad_keys = []
_all_keys = list(globals().keys())
for k in _all_keys:
    try:
        dill.dumps(globals()[k])
    except TypeError:
        _bad_keys.append(k)

for k in _bad_keys:
    del globals()[k]

# Save new session context to disk for the next component:
dill.dump_session("{output_context_path}")
"""

    # Runs the user code in a new execution frame. Context from the previous
    # component in the run is loaded into the session dynamically, and we run
    # with a single globals() namespace to simulate top-level execution.
    exec(user_code, globals(), globals())

    # Post new session context to mlflow:
    with Path(output_context_path).open("rb") as reader:
        context = urlsafe_b64encode(reader.read())
        post_metadata({
            "experiment_id": run_info["experiment_id"],
            "run_id": run_info["run_id"],
            "step_id": "same_step_000",
            "metadata_type": "output",
            "metadata_value": context,
            "metadata_time": datetime.datetime.now().isoformat(),
        })
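For what it's worth, the dill/exec machinery above doesn't appear to be relevant: the fatal error comes from the v2 launcher (main.go) before any user code runs. A minimal component that only writes to an OutputPath should hit the same code path. Untested sketch; the @component decorator and base image are my own choices, not part of the original pipeline:

from kfp.v2.dsl import component, OutputPath

@component(base_image="python:3.9")
def write_output(output_context_path: OutputPath(str)):
    # In v2-compatible mode the launcher is expected to create
    # /tmp/outputs/output_context_path before this code ever runs;
    # the mkdir "permission denied" above happens at that point.
    with open(output_context_path, "w") as f:
        f.write("hello world")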
Environment
- How did you deploy Kubeflow Pipelines (KFP)? From manifests
- KFP version: 1.8.1
- KFP SDK version: 1.8.12
Expected result
Is there supposed to be a different way of writing this file? Do I need to mount a storage location in order to do this? (You didn't have to in KFP v1.)
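In the meantime, one experiment that might narrow this down (a guess on my part, not a confirmed workaround) is forcing the step's container to run as root so the launcher can create the output directory. With the v1-style DSL, something like the sketch below should apply, since ContainerOp.container is a Kubernetes V1Container; write_output is the same throwaway component as in the sketch above:

import kfp.dsl as dsl
from kfp.components import create_component_from_func, OutputPath
from kubernetes import client as k8s_client

# Throwaway component used only to exercise OutputPath handling.
def write_output(output_context_path: OutputPath(str)):
    with open(output_context_path, "w") as f:
        f.write("hello world")

write_output_op = create_component_from_func(write_output, base_image="python:3.9")

@dsl.pipeline(name="outputpath-permission-check")
def pipeline():
    task = write_output_op()
    # run_as_user=0 runs the step container (and the injected launcher) as root.
    task.container.security_context = k8s_client.V1SecurityContext(run_as_user=0)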
Top GitHub Comments
FYI, this is a pure regression from KFP v1.
Here are two gists, one compiled in LEGACY mode and one compiled in V2-compatible mode:
- Using the compiler in v1 mode: https://gist.github.com/aronchick/0dfc57d2a794c1bd4fb9bff9962cfbd6
- Using the compiler in v2 mode: https://gist.github.com/aronchick/473060503ae189b360fbded04d802c80
The first one executes fine. The second one gives the permission-denied error above.
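For anyone trying to reproduce the difference between the two gists, the toggle is the compiler's execution mode. A rough, self-contained sketch (the component and file names are mine, not taken from the gists):

import kfp
from kfp import dsl
from kfp.components import create_component_from_func, OutputPath

# Minimal component that only writes to an OutputPath.
def write_output(output_context_path: OutputPath(str)):
    with open(output_context_path, "w") as f:
        f.write("hello world")

write_output_op = create_component_from_func(write_output, base_image="python:3.9")

@dsl.pipeline(name="outputpath-repro")
def pipeline():
    write_output_op()

# Classic v1 behaviour (the first gist): runs fine.
kfp.compiler.Compiler(
    mode=dsl.PipelineExecutionMode.V1_LEGACY
).compile(pipeline, "repro_v1_legacy.yaml")

# v2-compatible mode (the second gist): the injected launcher reports the
# "permission denied" error shown above.
kfp.compiler.Compiler(
    mode=dsl.PipelineExecutionMode.V2_COMPATIBLE
).compile(pipeline, "repro_v2_compat.yaml")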
This issue is affecting us as well. Is there any solution or workaround? Thank you.