Unexpected behaviour given multiple optional inputs in component YAML
What steps did you take:
When an input parameter is marked `optional: true` in the component YAML, and the Python program uses a CLI library such as argparse or click to accept options, the optional parameters get passed as bare flags instead of being omitted as expected.
Given a `component.yaml` file such as:
```yaml
name: myprint
inputs:
- {name: A, optional: true, type: String}
- {name: B, optional: true, type: String}
implementation:
  container:
    image: gcr.io/.../mycomp
    command: [python3, /src/myprint.py]
    args: [
      --param1, {inputValue: A},
      --param2, {inputValue: B},
    ]
```
And `myprint.py`, which was built into the image above:
```python
import argparse
import json


def myprint(param_1: str, param_2: str) -> None:
    # Print the received arguments so we can see what the component got.
    print(json.dumps(locals(), indent=4))


parser = argparse.ArgumentParser()
parser.add_argument('--param1', type=str, default="default1")
parser.add_argument('--param2', type=str, default="default2")
args = parser.parse_args()
myprint(args.param1, args.param2)
```
What happened:
1. When we omit both arguments in the pipeline (using the component spec above):
```python
import kfp
from kfp import dsl

myprint_op = kfp.components.load_component_from_file("component.yaml")

@dsl.pipeline()
def myprint_pipeline():
    first_add_task = myprint_op()  # neither input `A` nor `B` is supplied

client = kfp.Client()  # assumes a configured KFP endpoint
run = client.create_run_from_pipeline_func(myprint_pipeline, arguments={})
```
The run is compiled and submitted successfully.
But the output shows that both params were passed in as bare flags instead of being omitted, and worse, `--param2` got passed in as the value of `--param1`:
```
# console output
{
    "param_1": "--param2",
    "param_2": null
}
```
By marking both inputs as optional in the YAML spec, it seems both options were instead passed in as bare flags, i.e. the equivalent of `python3 /src/myprint.py --param1 --param2` instead of `python3 /src/myprint.py` (which would leave both options unset, allowing argparse to assign the default values).
2. Programs with one optional argument
In cases where our `.py` program has only one optional argument, say `--param1`, described as `{name: param1, optional: true, type: String}`, and we omit the param in the pipeline as before, the run compiles successfully. But at component runtime we get the error:
Error: --param1 option requires an argument
3. Adding defaults to optional inputs
It seems that it’s not possible to set a default value of null for any input via the YAML file. There seems to be some incongruence between how non-null and null default values are treated:
1a. `{name: A, optional: false, default: 40}`
- Pipeline compiles successfully when A is omitted and assigns 40 to A, as expected.
1b. `{name: A, optional: false, default: null}`
- Pipeline fails to compile, complaining that A is missing; to be consistent with the previous case, the expected behaviour would be to assign null as the value of A.
2a. `{name: A, optional: true, default: 40}`
- Pipeline compiles successfully when A is omitted and assigns 40 to A, which seems to be exactly the same behaviour as case 1a.
2b. `{name: A, optional: true, default: null}`
- Compiles successfully, but passes in --A as a flag; in this case `{name: A, optional: true, default: null}` and `{name: A, optional: true}` seem to behave in exactly the same way.
In general, there doesn’t seem to be a consistent rule for when default values are assigned depending on whether `optional` is true or false. This is especially the case when we attempt to set `default: null` (or `default: ~`).
Environment:
How did you deploy Kubeflow Pipelines (KFP)?
KFP version: https://github.com/kubeflow/pipelines/commit/743746b96e6efc502c33e1e529f2fe89ce09481c
KFP SDK version:
kfp 1.4.0
kfp-pipeline-spec 0.1.6
kfp-server-api 1.4.1
Anything else you would like to add:
All in all, there seem to be two main ways of resolving this:
- Change in the KFP library: when `optional: true` is set and no default is given, do NOT send the flag to the component program, so that argparse/click within the component program can handle missing arguments properly.
- Change in how argparse/click are used: find a way to have these CLI libraries parse an input as both a flag and an option, so that when KFP passes a bare flag into the component program, the CLI library interprets it as an option with a None value. I’ve dug around on this possibility, but it doesn’t seem to be something these libraries support.
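For argparse specifically, something close to this flag-or-option behaviour is available through `nargs='?'`: if the option string appears without a following value, argparse stores `const`; if the option is absent altogether, it stores `default`. A minimal sketch, assuming a bare `--param1` should behave exactly like an omitted one (`build_parser` is a hypothetical helper name, not part of the original program):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose options tolerate being passed as bare flags."""
    parser = argparse.ArgumentParser()
    # nargs='?' makes the value optional. A bare `--param1` (what the command
    # line looks like when an omitted optional input resolves to nothing)
    # yields `const`; leaving the option out entirely yields `default`.
    # Setting both to the same string makes the two cases look identical.
    parser.add_argument('--param1', type=str, nargs='?',
                        const='default1', default='default1')
    parser.add_argument('--param2', type=str, nargs='?',
                        const='default2', default='default2')
    return parser


if __name__ == '__main__':
    parser = build_parser()
    # Simulates the command line produced when both inputs are omitted.
    print(parser.parse_args(['--param1', '--param2']))
```

With this parser, `--param1 --param2`, `--param1` alone and an empty command line all fall back to the defaults, while `--param1 x` still yields `'x'`. An explicit empty value (`--param1 ''`) is kept as `''`, because `const` is only used when the value is missing entirely.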
Finally, a temporary workaround we’ve been using:
I’ve created a wrapper around the `.py` component and used the wrapper as the entrypoint in the YAML spec instead of the component itself. The wrapper removes any bare flags it receives before invoking the actual component. For example, assuming we omit param1 and param2, KFP now calls `python3 /src/myprint_wrapper.py --param1 --param2`. The wrapper then removes the flags `--param1 --param2` before invoking the actual component without those erroneous flags: `python3 /src/myprint.py`.
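A wrapper along those lines could look roughly like the following. This is a minimal sketch, not the reporter’s actual wrapper: `strip_bare_flags` is a hypothetical helper, and it assumes every option takes exactly one value and that no genuine value starts with `--`.

```python
import subprocess
import sys
from typing import List


def strip_bare_flags(argv: List[str]) -> List[str]:
    """Drop any `--option` token that is not followed by a value.

    Assumes every option takes exactly one value and that no genuine value
    starts with '--' (such a value would be mistaken for a flag).
    """
    result: List[str] = []
    i = 0
    while i < len(argv):
        token = argv[i]
        is_flag = token.startswith('--')
        has_value = i + 1 < len(argv) and not argv[i + 1].startswith('--')
        if is_flag and has_value:
            # Keep the option together with its value.
            result.extend([token, argv[i + 1]])
            i += 2
        elif is_flag:
            # Bare flag left behind by an omitted optional input: skip it.
            i += 1
        else:
            # Anything else (e.g. a positional argument) is passed through.
            result.append(token)
            i += 1
    return result


if __name__ == '__main__':
    # Re-invoke the real component with the cleaned-up command line and
    # propagate its exit code.
    cleaned = strip_bare_flags(sys.argv[1:])
    completed = subprocess.run([sys.executable, '/src/myprint.py', *cleaned])
    sys.exit(completed.returncode)
```

Launching the real program as a subprocess keeps the wrapper independent of how `myprint.py` is written; the trade-off is the heuristic that anything starting with `--` is a flag.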
/kind bug /area sdk
Top GitHub Comments
Hey @amyxst, did you check https://github.com/kubeflow/pipelines/blob/master/samples/test/placeholder_if.py?
Currently, KFP doesn’t really support a runtime `if` placeholder, so I suggest wrapping your command call in bash or Python and adding the conditional there.
Thank you for the detailed issue report.
The behavior is by design. There is no special/different behavior for multiple optional arguments. What you see is mostly the way your program parses the command line.
The question is a bit unclear regarding what behavior you are expecting and why.
`optional: true` means that the pipeline author may skip passing any argument to the optional input. In the current implementation, the input placeholder can be replaced with nothing if no argument was passed, but TBH, an attempt to resolve a placeholder for a missing input is more like an error. We should probably add a warning for that case.
Command-line programs can be made to handle such CLI arguments. If your program cannot handle them, you can either change the program or change the specification.
The container component specification describes command-line programs. Command lines have no concept of “null”. Command-line arguments are strings. This is an OS limitation. Additionally, when parsing the YAML we treat `default: null` the same as unspecified.
You could set the default value to an empty string, BTW, if that works for you.
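With the empty-string approach, the program itself has to translate `''` back into its own default. A minimal sketch, assuming the component YAML declares `default: ''` for both inputs so that the program receives `--param1 ''` when nothing is passed; `resolve` and `parse` are hypothetical helper names:

```python
import argparse
from typing import List, Optional, Tuple


def resolve(value: Optional[str], fallback: str) -> str:
    """Treat an empty (or missing) command-line value as 'use the fallback'."""
    # Command-line arguments are plain strings, so '' is the closest thing
    # to "no value" that can survive the trip through the command line.
    return value if value else fallback


def parse(argv: List[str]) -> Tuple[str, str]:
    parser = argparse.ArgumentParser()
    parser.add_argument('--param1', type=str, default='')
    parser.add_argument('--param2', type=str, default='')
    args = parser.parse_args(argv)
    return resolve(args.param1, 'default1'), resolve(args.param2, 'default2')


if __name__ == '__main__':
    # Assumed command line when both inputs fall back to `default: ''`.
    print(parse(['--param1', '', '--param2', '']))
```

The downside is that a caller can no longer deliberately pass an empty string as a real value; if that matters, a different sentinel is needed.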
Let’s try to understand what KFP does. KFP does not “assign” anything to the variables of your Python program. Nor does it quite “pass” things to programs. It does not look into your program code. What KFP does is execute command-line programs after replacing the placeholders. (You can easily check the command-line arguments of your program in the “Pod” tab. You can also debug programs by using `command: ["bash", "-c", 'echo "$0" "$@"']`.) Command lines do not have concepts like “assigning”, “passing” or “flags”. A command line consists of an executable name and an array of arguments (which are null-terminated C strings). When you omit arguments for optional inputs, you get a pretty expected command line: `python3 /src/myprint.py --param1 --param2`. Then your program interprets that the way you’ve observed.
An important thing to understand is that the interface is the command line: an array of strings. Your program must be able to interpret its command line. If you want your program to receive special values like `None`, `-Infinity`, `<built-in function>` or `dict`, you need to think about how you’re going to represent them on the command line.
Solution
If you want to add or remove parts of the command line based on whether an argument was passed for an optional input, just use the `if` placeholder. Please check what components generated by `kfp.components.create_component_from_func` look like: https://github.com/kubeflow/pipelines/blob/a80421191db917322ff312626409526b0a76aa68/components/json/Build_list/component.yaml#L78
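To see why the `if` placeholder fixes the original example, here is a toy model of placeholder resolution. It is a simplified, hypothetical re-implementation for illustration only, not KFP’s code: `resolve_args` handles just `inputValue`, `isPresent` and `if`, with the spec written as Python dicts that mirror the YAML. In the component YAML, the guarded form would be written as `- if: {cond: {isPresent: A}, then: [--param1, {inputValue: A}]}`.

```python
from typing import Any, Dict, List


def resolve_args(args: List[Any], inputs: Dict[str, str]) -> List[str]:
    """Toy resolver (not KFP's implementation) for a few placeholders.

    An `inputValue` for a missing optional input resolves to nothing, while
    an `if` placeholder can drop the option name along with the value.
    """
    resolved: List[str] = []
    for arg in args:
        if isinstance(arg, str):
            # Literal strings such as '--param1' are always kept.
            resolved.append(arg)
        elif 'inputValue' in arg:
            name = arg['inputValue']
            if name in inputs:
                resolved.append(inputs[name])
            # A missing optional input contributes no argument at all.
        elif 'if' in arg:
            branch = arg['if']
            present = branch['cond']['isPresent'] in inputs
            chosen = branch.get('then', []) if present else branch.get('else', [])
            resolved.extend(resolve_args(chosen, inputs))
        else:
            raise ValueError(f'unsupported placeholder: {arg!r}')
    return resolved


# The args from the original component.yaml.
plain_args = ['--param1', {'inputValue': 'A'}, '--param2', {'inputValue': 'B'}]

# The same args, each option guarded by `if`/`isPresent`.
guarded_args = [
    {'if': {'cond': {'isPresent': 'A'}, 'then': ['--param1', {'inputValue': 'A'}]}},
    {'if': {'cond': {'isPresent': 'B'}, 'then': ['--param2', {'inputValue': 'B'}]}},
]

if __name__ == '__main__':
    print(resolve_args(plain_args, {}))            # ['--param1', '--param2']
    print(resolve_args(guarded_args, {}))          # []
    print(resolve_args(guarded_args, {'A': 'x'}))  # ['--param1', 'x']
```

The unguarded spec reproduces the `--param1 --param2` command line from the report, while the guarded spec yields an empty argument list, which is the `python3 /src/myprint.py` invocation the reporter expected.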