Update KFP samples to use types that are compatible in v2.
See original GitHub issue

KFP SDK v2 distinguishes between artifacts and parameters, and the distinction is decided by the input/output type annotation. See the docs for more details: https://www.kubeflow.org/docs/components/pipelines/sdk/v2/component-development/#designing-a-pipeline-component

Some of our existing components and samples use types that are intended to be parameters but would be compiled as artifacts in v2. One example is the type `GCPProjectID`: it is meant to be a string parameter, not an artifact. We should update our components and samples to change such types to `String` instead.
/area samples /area components
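As an illustration of the proposed change, here is a hypothetical `component.yaml` fragment (the input name and layout are invented for this sketch, not taken from an actual component in the repo); the fix is typically a one-line edit to the `type` field:

```yaml
# Before: an arbitrary v1 type name, which KFP v2 treats as an artifact.
inputs:
  - {name: project_id, type: GCPProjectID}

# After: a known parameter type, which KFP v2 treats as a string parameter.
inputs:
  - {name: project_id, type: String}
```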
Issue Analytics
- State:
- Created 2 years ago
- Comments: 8 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
I’m not denying there’s potential value in custom parameter types, but they are often an overuse of typing. `GCPProjectID` and `GCSPath` can be replaced with `String` with little loss of functionality as of today. The `open_schema_validator` for `GCSPath`, if I’m not mistaken, only helps with values present at compile time, not values produced at runtime; and the GCP project picker is an interesting idea I hadn’t heard before. Do you have a bug number or doc for this proposal?

The issue with keeping these types is that they break right now on Vertex AI and in KFP v2 compatible mode, and I think that is the bigger issue to address now. I’d rather “downgrade” such types and revert the change in the future than have samples that don’t work out of the box.
I don’t think it’s a good comparison. We aren’t creating a programming language but a DSL for a specialized application: a pipeline that runs containerized apps. It’s not our goal to reach parity with a general-purpose programming language.

An artifact is not just a file but also the metadata associated with it. Different artifact types have different sets of operations (source). Those types do matter.

In comparison, some of the “types” you mentioned above are just aliases for the string type. For example, when you define a component input with type `Date` (which, by the way, is neither a defined type in the KFP DSL nor a defined type in Python), the component never gets a `datetime` object or anything similar, but always a string value whose content is meant to be a date representation. Whether the content is actually a valid date doesn’t even matter, and there are no date-related operations for such a type. The same goes for `CSV` and `URL`: they are arbitrary names that carry no meaning from our system’s point of view. One component author may write the type as `URL` while another writes `Url`, and they are viewed as different types, thus incompatible with each other. Since KFP v1 “types” can be arbitrary names, we can’t quantify “half” here.
As said above, `CSV`, `Date`, and `URL` are not real types but aliases for the string type. I tend to agree that using such “types” may improve code readability over using `String`, although in some cases they also seem to be an overuse of typing. Allowing arbitrary user-provided names as types can be troublesome: with our current DSL syntax, there’s no good way to decide whether an arbitrary name should be a parameter type or an artifact type. I recall you also agreed that 1) arbitrary unknown types should be treated as artifact types by default for maximum compatibility, and 2) it’s not a good idea to whitelist some arbitrary names as parameter types.

`JsonObject` is supported as an alias for the `dict` type, which is treated as a parameter type. Users can keep using `JsonObject` as the type, and with the new v2 `@component` decorator they can pass a Python dict object instead of a serialized string. `XGBoostModel` should really be an artifact type, which is the current behavior: it is treated as a generic artifact. Supporting custom artifact types is on our roadmap, but supporting user-defined parameter types is not.

I don’t know enough about “Managed Pipeline Runner (May 2020)”, but #5478 does not fix the issue discussed here. This has been discussed several times internally and externally; it does special handling for certain cases but cannot address all cases uniformly. Our team has decided not to move forward with this PR.
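The practical difference for `JsonObject` can be shown with plain Python. These are hypothetical component bodies written for illustration (not real KFP code and not using the KFP SDK): with v1-style string-serialized JSON the component must deserialize the input itself, while a `dict`-typed v2 parameter arrives ready to use:

```python
import json

# v1 style: a JsonObject input arrives as a serialized string,
# so the component body has to deserialize it itself.
def train_v1(config_json: str) -> int:
    config = json.loads(config_json)
    return config["epochs"]

# v2 style: a dict-typed parameter arrives as a Python dict directly.
def train_v2(config: dict) -> int:
    return config["epochs"]

print(train_v1('{"epochs": 10}'))  # 10
print(train_v2({"epochs": 10}))    # 10
```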
“Downgrade” is the word you used. I quoted it, and argued that there is little loss of functionality as of today. The features you mentioned, such as the project picker, are still up in the air; I haven’t even seen the proposal, let alone a timeline. Again, my philosophy here is that it’s more important to make our samples work today than to preserve the incompatible types in the hope that they could become compatible and useful in the future.

This issue reflects not only my own opinion but the consensus of the team.