Custom component properties can break Elyra

Describe the issue

While reverse engineering the current pipeline file structure for typed pipelines, I noticed that component properties do not appear to be scoped safely. This can result in corrupted pipeline files if a component property name is identical to an Elyra/reserved property name.

Excerpt from a file. Note that app_data stores reserved properties and component properties as siblings.

       {
          "id": "5d121620-6873-4aa1-a85f-2a864485e194",
          "type": "execution_node",
          "op": "filter-text-using-shell-and-grep-ptitzler",
          "app_data": {
            "component_source": "/opt/anaconda3/envs/components/lib/python3.7/site-packages/elyra/pipeline/resources/filter-text-local.yaml",
            "runtime_image": "quay.io/ptitzler/alpine",
            "component_source_type": "filename", # <------ reserved property
            "elyra_path_text": "",
            "pattern": "Apache",       # <------ component property
            "ui_data": {

Unless I am missing something, we need to store the component properties in a separate scope.
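
The collision can be reproduced with a few lines of Python. This is a minimal sketch, not Elyra code: `RESERVED_KEYS` is an assumed set based on the excerpt above, and `find_collisions` and `build_flat_app_data` are hypothetical helpers introduced only for illustration.

```python
# Assumed reserved keys, based on the excerpt above; not Elyra's actual list.
RESERVED_KEYS = {"component_source", "component_source_type", "ui_data"}


def find_collisions(component_property_names):
    """Return component property names that clash with a reserved app_data key."""
    return sorted(set(component_property_names) & RESERVED_KEYS)


def build_flat_app_data(reserved, component_properties):
    """Mimic the flat layout: reserved and component properties share one dict."""
    app_data = dict(reserved)
    # A component property with a reserved name silently overwrites the reserved value.
    app_data.update(component_properties)
    return app_data


reserved = {
    "component_source": "filter-text-local.yaml",
    "component_source_type": "filename",
}
# Hypothetical component that happens to define an input named like a reserved key.
props = {"pattern": "Apache", "component_source_type": "csv"}

collisions = find_collisions(props)
flat = build_flat_app_data(reserved, props)
print(collisions)
print(flat["component_source_type"])
```

In the flat layout the reserved value is lost without any error, which is the corruption described above.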

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

ptitzler commented, Jun 22, 2021

A couple of them, considering that the pipeline file in essence only contains some of the information required to materialize a component (component specification (YAML) + parameters) for processing:

  • Sooner or later component specifications are going to change, raising the potential for a mismatch between component-specification-v<X+1> and the component-properties-v<X> in the pipeline file. How would we detect mismatches?
  • As is, a pipeline can only be processed if the component specification is accessible at runtime. Is that an acceptable prerequisite, or should the specification also be embedded to make the pipeline file self-contained?
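
One possible way to detect the first kind of mismatch is to compare the property names stored in the pipeline file with the input names declared by the current component specification. The sketch below is an assumption about how such a check could look, not an existing Elyra feature; `diff_properties` and the renamed input `regex` are hypothetical.

```python
def diff_properties(stored_properties, spec_input_names):
    """Compare stored property names against the inputs a specification declares.

    Returns the names that are stored but no longer declared ("unknown")
    and the names that are declared but not stored ("missing").
    """
    stored = set(stored_properties)
    declared = set(spec_input_names)
    return {
        "unknown": sorted(stored - declared),
        "missing": sorted(declared - stored),
    }


# Properties saved with specification v<X>.
stored = {"pattern": "Apache", "elyra_path_text": ""}
# Hypothetical specification v<X+1> in which "pattern" was renamed to "regex".
spec_inputs = ["regex", "elyra_path_text"]

mismatch = diff_properties(stored, spec_inputs)
print(mismatch)
```

A name-based comparison only catches added, removed, or renamed inputs. It would not notice a change in an input's type or meaning.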

kevin-bates commented, Jun 21, 2021

I agree with @ptitzler’s assessment. I think introducing a scope/namespace within app_data would be sufficient. The only sibling-level attributes at that level would be component_source and component_source_type. All other attributes would be embedded within an object-valued (dict) attribute named component_properties.

Using the following two-node pipeline snippet as an example, what today looks like this…

        {
          "id": "ca95fd8a-776b-43ae-a73e-e99906b80935",
          "type": "execution_node",
          "op": "serve-pytorch-model-seldon-core",
          "app_data": {
            "component_source": "https://raw.githubusercontent.com/kubeflow/pipelines/master/components/ibm-components/ffdl/serve/component.yaml",
            "runtime_image": "docker.io/aipipeline/ffdl-serve:latest",
            "component_source_type": "url",
            "model_id": "dddd",
            "deployment_name": "dddd",
            "model_class_name": "ModelClass",
            "model_class_file": "model_class.py",
            "serving_image": "aipipeline/seldon-pytorch:0.1",
            "ui_data": {
              "label": "Serve PyTorch Model - Seldon Core",
              "description": "Serve PyTorch Models remotely as web service using Seldon Core"
            }
          },
          "inputs": [
          ],
          "outputs": [
          ]
        },
        {
          "id": "4fcb7b56-7450-454f-9f4a-be454847c6cf",
          "type": "execution_node",
          "op": "execute-notebook-node",
          "app_data": {
            "label": "",
            "filename": "elyra/my_pipeline/node1.ipynb",
            "runtime_image": "continuumio/anaconda3:2020.07",
            "cpu": "",
            "gpu": "",
            "memory": "",
            "outputs": [],
            "env_vars": [], 
            "dependencies": [],
            "include_subdirectories": false,
            "ui_data": {
              "label": "node1.ipynb",
              "description": "Notebook"
            }
          },
          "inputs": [
          ],
          "outputs": [
          ]
        }

would look something like…

        {
          "id": "ca95fd8a-776b-43ae-a73e-e99906b80935",
          "type": "execution_node",
          "op": "serve-pytorch-model-seldon-core",
          "app_data": {
            "component_source": "https://raw.githubusercontent.com/kubeflow/pipelines/master/components/ibm-components/ffdl/serve/component.yaml",
            "component_source_type": "url",
            "component_properties": {
              "model_id": "dddd",
              "deployment_name": "dddd",
              "model_class_name": "ModelClass",
              "model_class_file": "model_class.py",
              "serving_image": "aipipeline/seldon-pytorch:0.1"
            },
            "ui_data": {
              "label": "Serve PyTorch Model - Seldon Core",
              "description": "Serve PyTorch Models remotely as web service using Seldon Core"
            }
          },
          "inputs": [
          ],
          "outputs": [
          ]
        },
        {
          "id": "4fcb7b56-7450-454f-9f4a-be454847c6cf",
          "type": "execution_node",
          "op": "execute-notebook-node",
          "app_data": {
            "component_source": "elyra",
            "component_source_type": "elyra",
            "component_properties": {
              "label": "",
              "filename": "elyra/my_pipeline/node1.ipynb",
              "runtime_image": "continuumio/anaconda3:2020.07",
              "cpu": "",
              "gpu": "",
              "memory": "",
              "outputs": [],
              "env_vars": [],
              "dependencies": [],
              "include_subdirectories": false
            },
            "ui_data": {
              "label": "node1.ipynb",
              "description": "Notebook"
            }
          },
          "inputs": [
          ],
          "outputs": [
          ]
        }

Items worth noting:

  • The only properties directly under app_data, as siblings of component_properties, are component_source and component_source_type. (Note the absence of runtime_image in the “serve-pytorch-model” properties.)
  • Values of "elyra" (or some well-known placeholder) imply system-defined or built-in components (currently Notebook, Python-script, and R-script). These nodes will always have the complete set of system-defined properties.
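
The restructuring described above could be sketched as a small conversion from the flat layout to the scoped one. This is an illustration under stated assumptions, not Elyra's migration code: `SIBLING_KEYS` and `scope_app_data` are hypothetical names, ui_data is kept as a sibling because the example does so, and the sketch neither drops runtime_image nor adds the "elyra" placeholder values for built-in nodes.

```python
import copy

# Keys assumed to stay directly under app_data, following the example above.
SIBLING_KEYS = ("component_source", "component_source_type", "ui_data")


def scope_app_data(app_data):
    """Move every non-sibling key of a flat app_data dict into component_properties."""
    scoped = {}
    properties = {}
    for key, value in app_data.items():
        target = scoped if key in SIBLING_KEYS else properties
        target[key] = copy.deepcopy(value)  # do not share nested objects with the input
    scoped["component_properties"] = properties
    return scoped


flat_app_data = {
    "component_source": "https://raw.githubusercontent.com/kubeflow/pipelines/master/components/ibm-components/ffdl/serve/component.yaml",
    "component_source_type": "url",
    "model_id": "dddd",
    "deployment_name": "dddd",
    "ui_data": {"label": "Serve PyTorch Model - Seldon Core"},
}

scoped_app_data = scope_app_data(flat_app_data)
print(sorted(scoped_app_data))
print(scoped_app_data["component_properties"])
```

Because component properties now live in their own dict, a component input named component_source_type would no longer overwrite the reserved key.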

Should we determine the need for system-defined properties that correspond to non-elyra components, I think it would be best to add those outside the component_properties stanza. For example, env_vars may be something we could apply to all component types. If we decided to use those in a general manner, we could move them from the elyra-only component_properties to the “parent scope”. One issue with that, however, is that the front-end would need to know which are component properties and which are system properties (e.g., “component_source”). To address that, we should probably introduce an additional “namespace stanza” named something like “generic_properties” or “common_properties” (although the latter conflicts with the canvas terminology).

          "app_data": {
            "component_source": "https://raw.githubusercontent.com/kubeflow/pipelines/master/components/ibm-components/ffdl/serve/component.yaml",
            "component_source_type": "url",
            "component_properties": {
              "model_id": "dddd",
              "deployment_name": "dddd",
              "model_class_name": "ModelClass",
              "model_class_file": "model_class.py",
              "serving_image": "aipipeline/seldon-pytorch:0.1"
            },
            "generic_properties": {
              "env_vars": ["my_env1", "my_env2=default2"]
            },
            "ui_data": {
              "label": "Serve PyTorch Model - Seldon Core",
              "description": "Serve PyTorch Models remotely as web service using Seldon Core"
            }
          },

Thoughts?
