
Update KFP samples to use types that are compatible with v2.


KFP SDK v2 distinguishes between Artifacts and Parameters, and which one an input/output becomes is decided by its type annotation. See the doc for more details: https://www.kubeflow.org/docs/components/pipelines/sdk/v2/component-development/#designing-a-pipeline-component

Some of our existing components or samples may use types that are intended to be parameters but would instead compile to artifacts under v2. An example is the type GCPProjectID: it is meant to be a string parameter, not an artifact. We should update our components and samples to change such types to String.
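In a v1 component spec this fix is a one-word change to the input’s type field. A minimal before/after sketch, assuming a hypothetical component input named project_id:

```yaml
# Before: under v2 semantics, the unknown name GCPProjectID
# compiles to an artifact input.
inputs:
- {name: project_id, type: GCPProjectID}

# After: String keeps the input a string parameter in both v1 and v2.
inputs:
- {name: project_id, type: String}
```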

/area samples /area components

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

5 reactions
chensun commented on Jun 16, 2021

I’m not denying there’s potential value in custom parameter types, but often they are an overuse of typing. GCPProjectID and GCSPath can be replaced with String with little loss of functionality as of today: the open_schema_validator for GCSPath, if I’m not mistaken, helps only with values present at compile time, not values produced at runtime. And the GCP project picker is an interesting idea I hadn’t heard before; do you have a bug# or doc for this proposal?

The issue with keeping these types is that they break right now on Vertex AI and in KFP v2 compatible mode, and I think that is the bigger issue that needs to be addressed now. I’d rather we “downgrade” such types today and revert the change in the future than ship samples that don’t work out of the box.

3 reactions
chensun commented on Jul 3, 2021

> There are many reasons why computer languages introduce support for typing. C++ is not using void* for every pointer even though that would have been possible.

I don’t think that’s a good comparison. We aren’t creating a programming language, but a DSL for a specialized application: a pipeline that runs containerized apps. It’s not our goal to reach parity with a general-purpose programming language.

> Same with the artifacts. Artifact is just a file (or directory of files), but nobody is proposing to remove all artifact typing and replace everything with Blob or File.

An artifact is not just a file; it also carries the metadata associated with it. Different artifact types have different sets of operations (source). The types do matter.

In comparison, some of the “types” you mentioned above are just aliases for the string type. For example, when you define a component input with type Date (which, by the way, is neither a defined type in the KFP DSL nor a defined type in Python), the component never gets a datetime object or similar, but always a string value whose content is meant to be a date representation. Whether the content is actually a valid date doesn’t even matter, and there are no date-related operations for such a type. The same goes for CSV and URL: they are arbitrary names that carry no meaning from our system’s point of view. One component author may write the type as URL while another writes Url, and they are viewed as different types, and thus incompatible with each other.
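The name-mismatch point can be reduced to a tiny sketch: if parameter “types” are bare names, then type compatibility amounts to string comparison. This is illustrative only, not actual KFP code:

```python
# Illustrative sketch, not actual KFP code: v1 parameter "types" are
# bare names, so compatibility checking is exact string equality.
def types_compatible(expected: str, actual: str) -> bool:
    # No normalization or semantics: "URL" and "Url" are distinct names,
    # hence incompatible, even though both just carry a string value.
    return expected == actual

print(types_compatible("URL", "URL"))  # True
print(types_compatible("URL", "Url"))  # False
```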

> But this is exactly what’s happening here. The v2 compiler arbitrarily takes half of the KFP types and destroys the type information, forcing them to become untyped strings.

Since KFP v1 “types” can be arbitrary names, we can’t quantify “half” here.

> This is especially apparent when you look at MLMD: In KFP v1 all outputs are properly recorded as MLMD artifacts. They’re recorded as CSV, Date, URL, JsonObject, XGBoostModel etc, not String, String, String, String, String.

As said above, CSV, Date, and URL are not real types but aliases for the string type. I tend to agree that using such “types” may improve code readability over plain String, although in some cases they also look like an overuse of typing. Allowing arbitrary user-provided names as types can be troublesome: with our current DSL syntax, there’s no good way to decide whether an arbitrary name should be a parameter type or an artifact type. I recall you also agreed that 1) arbitrary unknown types should be treated as artifact types by default for maximum compatibility; and 2) it’s not a good idea to whitelist some arbitrary names as parameter types.

JsonObject is supported as an alias for the dict type, which is treated as a parameter type. Users can keep using JsonObject as the type, and with the new v2 @component decorator they can pass a Python dict object instead of a serialized string. XGBoostModel should really be an artifact type, which is the current behavior: it is treated as a generic artifact. Supporting custom artifact types is on our roadmap, but supporting user-defined parameter types is not.
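The classification rules described above can be sketched as a small lookup. The alias table and function below are assumptions for illustration, not the actual KFP compiler logic:

```python
# Illustrative sketch of the v2 classification rules described above
# (not the actual KFP compiler). Known aliases map to Python parameter
# types; any arbitrary unknown name falls back to a generic artifact.
PARAMETER_TYPE_ALIASES = {
    "String": str,
    "Integer": int,
    "Float": float,
    "Boolean": bool,
    "JsonObject": dict,  # alias of dict, still a parameter
    "JsonArray": list,
}

def classify(type_name: str) -> str:
    """Return 'parameter' for known aliases, else 'artifact'."""
    return "parameter" if type_name in PARAMETER_TYPE_ALIASES else "artifact"

print(classify("String"))        # parameter
print(classify("JsonObject"))    # parameter
print(classify("XGBoostModel"))  # artifact (generic)
print(classify("GCPProjectID"))  # artifact, which is why samples should use String
```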

>> The issue of keeping the usage of these types is that they would break right now on Vertex AI and KFP v2 compatible mode.

> As demonstrated by the Managed Pipeline Runner (May 2020) and the recent fix PRs (#5478) the architectural limitations of the Cloud IR can be worked around, so that they do not frustrate the users.

I don’t know enough about the “Managed Pipeline Runner (May 2020)”, but #5478 does not fix the issue discussed here. This has been discussed several times, both internally and externally: it does special handling for certain cases but cannot address all cases uniformly. Our team has decided not to move forward with this PR.

>> I’d rather we “downgrade” such types

> We cannot really downgrade the user content. Instead of forcing 1000 people to downgrade their code, we might want to upgrade our code instead.

“Downgrade” is the word you used. I quoted it, and argued there’s little loss of functionality as of today. The features you mentioned, such as the project picker, are still up in the air; I haven’t even seen a proposal, let alone a timeline. Again, my philosophy here is that it’s more important to make our samples work today than to preserve the incompatible types in the hope that they could become compatible and useful in the future.

This issue reflects not only my own opinion but the consensus of the team.


Top Results From Across the Web

Introducing Kubeflow Pipelines SDK v2
Kubeflow Pipelines SDK v2 compatibility mode lets you use the new pipeline semantics and gain the benefits of logging your metadata to ML ...

Kubeflow Pipelines v2 - Go Packages
Read v2 sample test documentation for more details. Update licenses. Note, this is currently outdated instructions for v2 compatible mode. We ...

kfp.dsl package — Kubeflow Pipelines documentation
Can be used to update the container configurations. Example: import kfp.dsl as dsl from kubernetes.client.models import V1EnvVar ...

Create, upload, and use a pipeline template | Vertex AI
Specify quickstart-kfp-repo as the repository name. Under Format, select Kubeflow Pipelines. Under Location Type, select Region. In the Region drop- ...

Scalable ML Workflows using PyTorch on Kubeflow Pipelines ...
Vertex Pipelines requires v2 of the KFP SDK. It is now possible to use the KFP v2 ‘compatibility mode’ to run KFP V2...
