
Python func component renames function arguments


What steps did you take:

Converting a Python function to a component:

import kfp

def hydrate_schema(
    synced_local_path: kfp.components.InputPath(str), data_schema: str
) -> str:
    import re

    if not synced_local_path.endswith("/"):
        synced_local_path += "/"

    return re.sub(r"s3:.+\/", synced_local_path, data_schema)


hydrate_schema_op = kfp.components.create_component_from_func(hydrate_schema, output_component_file="replace_schema.yaml")

In the generated component YAML:

name: Hydrate schema
inputs:
- {name: synced_local, type: String}
- {name: data_schema, type: String}
outputs:
- {name: Output, type: String}
implementation:
  container:
    image: python:3.7
    command:
    - sh
    - -ec
    - |
      program_path=$(mktemp)
      echo -n "$0" > "$program_path"
      python3 -u "$program_path" "$@"
    - |
      def hydrate_schema(
          synced_local_path, data_schema
      ):
          import re

          if not synced_local_path.endswith("/"):
              synced_local_path += "/"

          return re.sub(r"s3:.+\/", synced_local_path, data_schema)

      def _serialize_str(str_value: str) -> str:
          if not isinstance(str_value, str):
              raise TypeError('Value "{}" has type "{}" instead of str.'.format(str(str_value), str(type(str_value))))
          return str_value

      import argparse
      _parser = argparse.ArgumentParser(prog='Hydrate schema', description='')
      _parser.add_argument("--synced-local", dest="synced_local_path", type=str, required=True, default=argparse.SUPPRESS)
      _parser.add_argument("--data-schema", dest="data_schema", type=str, required=True, default=argparse.SUPPRESS)
      _parser.add_argument("----output-paths", dest="_output_paths", type=str, nargs=1)
      _parsed_args = vars(_parser.parse_args())
      _output_files = _parsed_args.pop("_output_paths", [])

      _outputs = hydrate_schema(**_parsed_args)

      _outputs = [_outputs]

      _output_serializers = [
          _serialize_str,

      ]

      import os
      for idx, output_file in enumerate(_output_files):
          try:
              os.makedirs(os.path.dirname(output_file))
          except OSError:
              pass
          with open(output_file, 'w') as f:
              f.write(_output_serializers[idx](_outputs[idx]))
    args:
    - --synced-local
    - {inputPath: synced_local}
    - --data-schema
    - {inputValue: data_schema}
    - '----output-paths'
    - {outputPath: Output}

The argument synced_local_path is renamed to synced_local. Therefore, when the component is used as

  local_data_schema = hydrate_schema_op(
      data_schema=data_schema, synced_local_path=data_sync.output
  )

the compiler complains:

TypeError: Hydrate schema() got an unexpected keyword argument 'synced_local_path'

What happened:

The Python function argument synced_local_path is renamed to synced_local.

What did you expect to happen:

The argument names are preserved after conversion.

Environment:

How did you deploy Kubeflow Pipelines (KFP)?

Local cluster deployment using Kind

KFP version: 1.2.0

KFP SDK version: 1.2.0


/kind bug

Issue Analytics

  • State:closed
  • Created 3 years ago
  • Comments:5 (2 by maintainers)

Top GitHub Comments

2 reactions
Ark-kun commented, Apr 10, 2021

There seems to be an error in your function definition. I’m not fully sure what behavior you intended, but given your component function code you probably should not be using the InputPath annotation. Just use def hydrate_schema(synced_local_path: str, data_schema: str). Or, better, call it synced_uri, since it seems to be a URI, not a local path.
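A minimal sketch of the suggested fix, assuming the upstream value really is a plain string URI: the parameter is an ordinary str (no InputPath annotation) and is renamed to the suggested synced_uri. The body is otherwise unchanged from the issue. The kfp call appears only in a comment so the function itself runs without kfp installed.

```python
def hydrate_schema(synced_uri: str, data_schema: str) -> str:
    # synced_uri is a plain str parameter (no InputPath), so its value is
    # passed through as-is and its name is not altered.
    import re  # imports stay inside the body, as in the original component function

    if not synced_uri.endswith("/"):
        synced_uri += "/"

    # Replace the "s3:.../" prefix in the schema string with the synced location.
    return re.sub(r"s3:.+\/", synced_uri, data_schema)


# With kfp installed, the component would be built the same way as before:
#   hydrate_schema_op = kfp.components.create_component_from_func(hydrate_schema)
# and called as hydrate_schema_op(synced_uri=..., data_schema=...).
```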

Can you try this solution and tell us whether it has fixed your problem?

P.S. If you had tried to call your component, you’d have noticed that synced_local_path does not really receive what you expected. It would have been a real local path (/tmp/inputs/synced_local/data) pointing to a file containing whatever data the component received from upstream (probably a URI). This is what InputPath does.
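The effect can be mimicked in plain Python. This is a rough stand-in, not the code KFP runs: the upstream value is written to a file and only the file's path reaches the function. The helper name simulate_input_path is hypothetical, and a temporary directory replaces the container's /tmp/inputs.

```python
import os
import tempfile


def simulate_input_path(upstream_value: str, input_name: str, inputs_root: str) -> str:
    # Hypothetical stand-in: write the upstream value to
    # <inputs_root>/<input_name>/data ...
    input_dir = os.path.join(inputs_root, input_name)
    os.makedirs(input_dir, exist_ok=True)
    data_path = os.path.join(input_dir, "data")
    with open(data_path, "w") as f:
        f.write(upstream_value)
    # ... and hand only the path, not the value, to the function.
    return data_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        synced_local_path = simulate_input_path("s3://bucket/prefix/", "synced_local", root)
        print(synced_local_path)  # a local path ending in synced_local/data
        with open(synced_local_path) as f:
            print(f.read())  # s3://bucket/prefix/
```

So the original hydrate_schema would have substituted a local file path into the schema, not the URI.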

Long explanation:

The behavior is as follows. When using create_component_from_func/func_to_container_op, if a function parameter uses the InputPath or OutputPath annotation and the parameter name ends with _path or _file, that suffix is stripped when generating the input/output name.
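The naming rule can be sketched as a small function. This is a hypothetical re-implementation of the rule as described in this comment, not the KFP source. The name component_input_name is invented for illustration, and the function is meant to be applied only to parameters annotated with InputPath or OutputPath.

```python
def component_input_name(parameter_name: str) -> str:
    # Hypothetical sketch of the rule: for an InputPath/OutputPath parameter,
    # drop a trailing "_path" or "_file" to get the component input/output name.
    for suffix in ("_path", "_file"):
        if parameter_name.endswith(suffix):
            return parameter_name[: -len(suffix)]
    return parameter_name


if __name__ == "__main__":
    print(component_input_name("synced_local_path"))  # synced_local
    print(component_input_name("number_path"))        # number
```

Under this rule the component in the issue exposes an input named synced_local, which matches the generated YAML. The call that failed would therefore be written as hydrate_schema_op(data_schema=data_schema, synced_local=data_sync.output).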

Let me try to explain why this design was chosen.

When you use create_component_from_func, there are two separate architecture layers: the component layer and the Python function layer. At the pipeline level, the author passes artifacts between components. The pipeline author does not manually pass URIs or local paths; instead, they just connect outputs to inputs. However, your function has a slightly different interface and gets some of its data from local files, a concept that does not exist for pipeline authors. create_component_from_func generates glue command-line program code to bridge those layers. Annotations like InputPath and OutputPath influence how that bridge is constructed.

InputPath means “write the passed artifact contents to a local file and give me the path to that file instead of the content itself”. When the component receives a “Dataset” (a big text file in CSV format), your function receives a “Dataset path” (a small string containing a local path). These are very different kinds of objects, so it’s natural that their names are different.

This difference becomes especially apparent with numbers. Notice how the function expects a string path, but the component input has type Integer.

Function:

from kfp.components import InputPath

def consume_num(number_path: InputPath(int)):
    with open(number_path) as f:
        number = int(f.read())

Component:

inputs:
- {name: Number, type: Integer}
....
implementation:
  container:
    args:
    - --number-path
    - {inputPath: Number}

Pipeline:

def my_pipeline():
    consume_num_op(number=42)

Observe the flow:

  1. The pipeline author passes value 42 to the input Number using number=42
  2. The component specification says that the value for the Number input needs to be written to a local file (/tmp/inputs/Number/data)
  3. The program receives the path as an argument: --number-path /tmp/inputs/Number/data
  4. The path is parsed from the command line and passed to the function: number_path="/tmp/inputs/Number/data"
  5. The function code uses the path to read the number from the file
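The five steps above can be traced in plain Python. This is a simplified stand-in for the generated glue program, not the code KFP emits. The name run_component is hypothetical, and a temporary directory stands in for the container's local input directory. The argparse setup mirrors the style of the generated program shown in the issue.

```python
import argparse
import os
import tempfile


def consume_num(number_path: str) -> int:
    # Step 5: the function reads the number from the file it was given.
    with open(number_path) as f:
        return int(f.read())


def run_component(number: int, inputs_root: str) -> int:
    # Step 2: write the input value to a local file (<root>/Number/data).
    input_dir = os.path.join(inputs_root, "Number")
    os.makedirs(input_dir, exist_ok=True)
    data_path = os.path.join(input_dir, "data")
    with open(data_path, "w") as f:
        f.write(str(number))

    # Step 3: the program receives the path on its command line.
    argv = ["--number-path", data_path]

    # Step 4: the glue code parses the path and passes it to the function.
    parser = argparse.ArgumentParser()
    parser.add_argument("--number-path", dest="number_path", type=str, required=True)
    parsed = vars(parser.parse_args(argv))
    return consume_num(**parsed)


if __name__ == "__main__":
    # Step 1: the pipeline author simply passes 42.
    with tempfile.TemporaryDirectory() as root:
        print(run_component(42, root))  # prints 42
```

The pipeline author only ever deals with the integer. The path exists solely between the glue code and the function.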

If create_component_from_func did not strip _path when naming the inputs, the call would look wrong and confusing to the pipeline author:

def my_pipeline():
    consume_num_op(number_path=42)

number_path=42 looks wrong since 42 is not a valid path - it’s an integer.

0 reactions
stale[bot] commented, Apr 18, 2022

This issue has been automatically closed because it has not had recent activity. Please comment “/reopen” to reopen it.
