
Unexpected behaviour given multiple optional inputs in component YAML


What steps did you take:

When an input param is marked optional: true in the component YAML, and the Python program uses a CLI library such as argparse or click to parse options, the omitted optional params get passed as bare flags instead of being left out of the command line as expected.

Given a component.yaml file such as:

name: myprint
inputs:
- {name: A, optional: true, type: String}
- {name: B, optional: true, type: String}
implementation:
  container:
    image: gcr.io/.../mycomp
    command: [python3, /src/myprint.py]
    args: [
    --param1, {inputValue: A},
    --param2, {inputValue: B},
    ]

And myprint.py that was built into the image above:

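import argparse
import json

def myprint(param_1: str, param_2: str) -> None:
    # Dump the received values so the parsed command line is visible in the logs
    print(json.dumps(locals(), indent=4))

parser = argparse.ArgumentParser()
# Each option expects exactly one value; the default applies only when the option is absent
parser.add_argument('--param1', type=str, default="default1")
parser.add_argument('--param2', type=str, default="default2")
args = parser.parse_args()

myprint(args.param1, args.param2)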

What happened:

1. When we omit both arguments in the pipeline (using the component spec from above):

import kfp
from kfp import dsl

myprint_op = kfp.components.load_component_from_file("component.yaml")

@dsl.pipeline()
def myprint_pipeline():
    # Neither input A nor B is supplied to myprint_op
    first_add_task = myprint_op()

client = kfp.Client()  # assumes a reachable KFP endpoint
run = client.create_run_from_pipeline_func(myprint_pipeline, arguments={})

The run is compiled and submitted successfully. But we see from the output that both params were passed in as flags instead of arguments, and worse, --param2 got passed in as the value of --param1.

# console output
{
    "param_1": "--param2",
    "param_2": null
} 

Marking both inputs as optional in the YAML spec seems to have caused both to be passed in as bare flags rather than omitted, i.e. the equivalent of python3 /src/myprint.py --param1 --param2 instead of python3 /src/myprint.py (which would leave param1 and param2 unset, allowing argparse to fall back to their default values).

2. Programs with one optional argument. Suppose our .py program has only one optional argument, say --param1, described as {name: param1, optional: true, type: String}.

If we omit the param in the pipeline as before, the run compiles successfully. But at component runtime, we get the error:

Error: --param1 option requires an argument

3. Adding defaults to optional inputs. It seems that it’s not possible to set a default value of null for any input via the YAML file.

There seems to be some incongruence between how non-null and null default values are treated:

1a. {name: A, optional: false, default: 40} - Pipeline compiles successfully when A is omitted and assigns 40 to A, as expected.
1b. {name: A, optional: false, default: null} - Pipeline fails to compile, complaining that A is missing; to be consistent with the previous case, the expected behaviour would be to assign null as the value of A.

2a. {name: A, optional: true, default: 40} - Pipeline compiles successfully when A is omitted and assigns 40 to A; this seems to be exactly the same behaviour as in case 1a.
2b. {name: A, optional: true, default: null} - Compiles successfully, but passes in --A as a flag; in this case {name: A, optional: true, default: null} and {name: A, optional: true} seem to behave in exactly the same way.

In general, it is unclear when default values are applied depending on whether optional is set to true or false, especially when we attempt to set default: null (or default: ~).

Environment:

How did you deploy Kubeflow Pipelines (KFP)?
KFP version: https://github.com/kubeflow/pipelines/commit/743746b96e6efc502c33e1e529f2fe89ce09481c

KFP SDK version:

kfp                           1.4.0
kfp-pipeline-spec             0.1.6
kfp-server-api                1.4.1

Anything else you would like to add:

All in all, there seem to be two main ways of resolving this:

  1. Change in the KFP library: when optional: true and no default is set, do NOT send the flag to the component program, so that argparse/click within the component program can handle the missing arguments properly.
  2. With the argparse/click library: find a way to have these CLI libraries parse an input as both a flag and an option, so that when KFP passes a bare flag into the component program, the CLI library interprets it as an option with a None value. I have dug around for this possibility, but it doesn’t seem to be something these libraries support.
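For argparse specifically, option 2 may be approachable with nargs='?': a bare --param1 then takes the const value instead of consuming the next token, and an absent --param1 takes the default. A minimal sketch, assuming the flag names and defaults from myprint.py above; it only covers argparse, and click is not shown here.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # nargs='?' makes the value optional:
    #   bare "--param1"   -> const
    #   no "--param1"     -> default
    # Both are set to the same fallback value here.
    parser.add_argument('--param1', type=str, nargs='?', const="default1", default="default1")
    parser.add_argument('--param2', type=str, nargs='?', const="default2", default="default2")
    return parser

parser = build_parser()
# Command line produced when both optional inputs are omitted
print(parser.parse_args(['--param1', '--param2']))
# Command line with a real value for --param1 only
print(parser.parse_args(['--param1', 'hello', '--param2']))
```

The trade-off is that a bare flag becomes indistinguishable from an omitted option, and a genuine value starting with "--" still cannot be passed.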

Finally, a temporary workaround we’ve been using, as a third option: I’ve created a wrapper around the .py component and used the wrapper as the entrypoint in the YAML spec instead of the component itself. The wrapper removes any bare flags it receives before invoking the actual component. For example, assuming we omitted param1 and param2, KFP now calls myprint_wrapper.py with python3 /src/myprint_wrapper.py --param1 --param2. The wrapper then removes the flags --param1 --param2 and invokes the actual component without them: python3 /src/myprint.py
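The wrapper itself is not included in the issue. The following is a minimal sketch of the idea, assuming space-separated "--name value" arguments; the names strip_bare_flags, main and REAL_COMMAND are hypothetical.

```python
import subprocess
import sys
from typing import List, Optional

# Hypothetical: the real component command inside the image, taken from the example above
REAL_COMMAND = ["python3", "/src/myprint.py"]

def strip_bare_flags(argv: List[str]) -> List[str]:
    """Drop every '--flag' token that is not followed by a value.

    A token counts as a value if it does not itself start with '--'.
    Caveat: a genuine value that starts with '--' would be misread as a flag.
    """
    cleaned: List[str] = []
    for i, token in enumerate(argv):
        has_value = i + 1 < len(argv) and not argv[i + 1].startswith("--")
        if token.startswith("--") and not has_value:
            continue  # bare flag left behind by an omitted optional input
        cleaned.append(token)
    return cleaned

def main(argv: Optional[List[str]] = None) -> int:
    """Run the real component with bare flags removed and return its exit code."""
    args = sys.argv[1:] if argv is None else argv
    return subprocess.call(REAL_COMMAND + strip_bare_flags(args))

# Entrypoint of the wrapper script: sys.exit(main())
```

For example, strip_bare_flags(['--param1', 'a', '--param2']) returns ['--param1', 'a'], so only the flag that actually received a value reaches myprint.py.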

/kind bug /area sdk

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

Bobgy commented, Apr 18, 2021 (1 reaction)

Hey @amyxst, did you check https://github.com/kubeflow/pipelines/blob/master/samples/test/placeholder_if.py?

Currently, KFP doesn’t really support a runtime if placeholder, so I suggest wrapping your command call in bash or Python and adding the conditional there.

Ark-kun commented, Apr 10, 2021 (1 reaction)

Thank you for the detailed issue report.

The behavior is by design. There is no special/different behavior for multiple optional arguments. What you see is mostly the way your program parses the command-line.

The question is a bit unclear regarding what behavior you are expecting and why.

optional: true means that the pipeline author may skip passing any argument to the optional input. In the current implementation, the input placeholder can be replaced with nothing if no argument was passed, but TBH, an attempt to resolve a placeholder for a missing input is more like an error. We should probably add a warning for that case.

Command-line programs can be made to handle such CLI arguments. If your program cannot handle them, you can either change the program or change the specification.

“set a default value of null”

The container component specification describes command-line programs. Command-lines have no concept of “null”. Command line arguments are strings. This is an OS limitation. Additionally, when parsing the YAML we treat default: null same as unspecified.

You could set the default value to an empty string BTW, if that works for you.
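On the program side, an empty-string default can be mapped back to None. A minimal sketch with argparse, assuming the component spec declares the input with an empty-string default (for example {name: A, optional: true, type: String, default: ''}) so the program receives --param1 ""; the helper empty_to_none is hypothetical.

```python
import argparse
from typing import Optional

def empty_to_none(value: str) -> Optional[str]:
    """Treat an empty command-line argument as 'no value'."""
    return value if value != "" else None

parser = argparse.ArgumentParser()
# argparse accepts "" as an ordinary value, and the type callable converts it
parser.add_argument('--param1', type=empty_to_none, default=None)

print(parser.parse_args(['--param1', '']))       # empty-string default was substituted
print(parser.parse_args(['--param1', 'hello']))  # a real value passes through unchanged
```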

“optional params get passed as flags instead of as omitted arguments as expected”
“assigns 40 to A”
“passes in --A as a flag”
“passed both arguments in as flags instead of arguments”

Let’s try to understand what KFP does. KFP does not “assign” anything to the variables of your Python program. Nor does it quite “pass” things to programs. It does not look into your program code. What KFP does is execute command-line programs after replacing the placeholders. (You can easily check the command-line arguments of your program in the “Pod” tab. You can also debug programs by using the command: ["bash", "-c", 'echo "$0" "$@"']). Command-lines do not have concepts like “assigning”, “passing” or “flags”. A command line consists of an executable name and an array of arguments (which are null-terminated C strings). When you omit arguments for optional inputs, you get a pretty expected command-line: python3 /src/myprint.py --param1 --param2. Then your program interprets that the way you’ve observed.
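As a Python alternative to the bash echo trick, a tiny script can print the argument array as JSON, which keeps argument boundaries and empty strings visible where echo output would only show extra spaces. A minimal sketch; the helper name describe_argv is hypothetical.

```python
import json
import sys
from typing import List

def describe_argv(argv: List[str]) -> str:
    """Render the argument array as JSON so empty strings and boundaries are visible."""
    return json.dumps(argv)

# Used as a container command, this prints exactly what the program would receive
print(describe_argv(sys.argv[1:]))
```

For example, describe_argv(['--param1', '', '--param2']) returns '["--param1", "", "--param2"]'. The same idea fits in a one-line command such as [python3, -c, "import json, sys; print(json.dumps(sys.argv[1:]))"], since python3 -c places the remaining arguments in sys.argv[1:].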

An important thing to understand is that the interface is the command-line: an array of strings. Your program must be able to interpret its command-line. If you want your program to receive special values like None, -Infinity, <built-in function> or dict, you need to think about how you’re going to represent them on the command-line.
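To see how the myprint.py parser from the issue interprets the command line KFP produces, the argument arrays can be passed to parse_args directly. A minimal sketch, reusing the two add_argument calls from myprint.py; note that argparse itself rejects a bare --param1 (it prints a usage error and exits with status 2), so the exact symptom depends on the CLI library in use. The helper try_parse is hypothetical.

```python
import argparse
from typing import List, Optional

parser = argparse.ArgumentParser()
parser.add_argument('--param1', type=str, default="default1")
parser.add_argument('--param2', type=str, default="default2")

def try_parse(argv: List[str]) -> Optional[argparse.Namespace]:
    """Parse an explicit argument array; return None if argparse rejects it."""
    try:
        return parser.parse_args(argv)
    except SystemExit:  # argparse reports the error on stderr and exits
        return None

print(try_parse([]))                                 # no arguments at all: defaults apply
print(try_parse(['--param1', '--param2']))           # bare flags: rejected by argparse
print(try_parse(['--param1', 'x', '--param2', 'y']))
```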

Solution

If you want to add/remove parts of the command-line based on whether an argument was passed for an optional input, just use the if placeholder:

name: myprint
inputs:
- {name: A, optional: true, type: String}
- {name: B, optional: true, type: String}
implementation:
  container:
    image: gcr.io/.../mycomp
    command: [python3, /src/myprint.py]
    args:
    - if:
        cond: {isPresent: A}
        then: [--param1, {inputValue: A}]
    - if:
        cond: {isPresent: B}
        then: [--param2, {inputValue: B}]
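To make the difference between the two specs concrete, here is a toy model of placeholder resolution written for illustration only; it is not KFP's implementation. It assumes, per the comment above, that {inputValue: X} for a missing input resolves to nothing, and that an if placeholder with cond {isPresent: X} keeps its then-list only when X was passed. The function resolve and the dict encoding of the YAML are hypothetical.

```python
from typing import Any, Dict, List

def resolve(args: List[Any], inputs: Dict[str, str]) -> List[str]:
    """Toy expansion of a container args list into the final argument array."""
    out: List[str] = []
    for item in args:
        if isinstance(item, str):
            out.append(item)  # literal argument, always kept
        elif "inputValue" in item:
            name = item["inputValue"]
            if name in inputs:
                out.append(inputs[name])  # missing input -> nothing appended
        elif "if" in item:
            cond = item["if"]["cond"]
            if cond["isPresent"] in inputs:
                out.extend(resolve(item["if"]["then"], inputs))
    return out

# Original spec: the flags are literals, so they survive when inputs are omitted
plain_args = ["--param1", {"inputValue": "A"}, "--param2", {"inputValue": "B"}]
# Spec above: each flag/value pair sits inside an if placeholder
if_args = [
    {"if": {"cond": {"isPresent": "A"}, "then": ["--param1", {"inputValue": "A"}]}},
    {"if": {"cond": {"isPresent": "B"}, "then": ["--param2", {"inputValue": "B"}]}},
]

print(resolve(plain_args, {}))        # ['--param1', '--param2']
print(resolve(if_args, {}))           # []
print(resolve(if_args, {"A": "hi"}))  # ['--param1', 'hi']
```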

Please check what components generated by kfp.components.create_component_from_func look like: https://github.com/kubeflow/pipelines/blob/a80421191db917322ff312626409526b0a76aa68/components/json/Build_list/component.yaml#L78
