Custom component properties can break Elyra

Describe the issue

While reverse engineering the current pipeline file structure for typed pipelines, I noticed that component properties do not appear to be scoped safely. This can result in corrupted pipeline files if a component property name is identical to an Elyra (reserved) property name.

Excerpt from a file. Note that app_data stores reserved properties and component properties as siblings.

       {
          "id": "5d121620-6873-4aa1-a85f-2a864485e194",
          "type": "execution_node",
          "op": "filter-text-using-shell-and-grep-ptitzler",
          "app_data": {
            "component_source": "/opt/anaconda3/envs/components/lib/python3.7/site-packages/elyra/pipeline/resources/filter-text-local.yaml",
            "runtime_image": "quay.io/ptitzler/alpine",
            "component_source_type": "filename", # <------ reserved property
            "elyra_path_text": "",
            "pattern": "Apache",       # <------ component property
            "ui_data": {

Unless I am missing something, we need to store the component properties in a separate scope.
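
To make the failure mode concrete, here is a minimal sketch of how a flat layout can lose a reserved value. The helper names and the `RESERVED_KEYS` set are assumptions for illustration only; they are not taken from the Elyra code base.

```python
# Hypothetical set of reserved keys, based only on the excerpt above.
RESERVED_KEYS = {"component_source", "component_source_type", "ui_data"}


def build_flat_app_data(reserved: dict, component_props: dict) -> dict:
    """Store reserved and component properties as siblings (the flat layout)."""
    app_data = dict(reserved)
    # A component property with the same name silently replaces the reserved one.
    app_data.update(component_props)
    return app_data


def find_collisions(component_props: dict) -> set:
    """Return component property names that shadow a reserved key."""
    return set(component_props) & RESERVED_KEYS


reserved = {
    "component_source": "filter-text-local.yaml",
    "component_source_type": "filename",
}
# A made-up component whose own input happens to be named like a reserved key.
props = {"pattern": "Apache", "component_source_type": "regex"}

flat = build_flat_app_data(reserved, props)
collisions = find_collisions(props)
print(flat["component_source_type"])  # no longer the reserved value
print(collisions)
```

A check like `find_collisions` would only report the clash; it would not remove it, which is why a separate scope seems necessary.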

Top GitHub Comments

ptitzler commented, Jun 22, 2021

A couple of thoughts, considering that the pipeline file in essence contains only some of the information required to materialize a component for processing (component specification (YAML) + parameters):

Sooner or later component specifications are going to change, raising the potential for a mismatch between component-specification-v<X+1> and the component-properties-v<X> in the pipeline file. How would we detect mismatches?
As is, a pipeline can only be processed if the component specification is accessible at runtime. Is that an acceptable prerequisite, or should the specification also be embedded to make the pipeline file self-contained?
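
One possible way to detect such a mismatch is to compare the property names stored in the pipeline file with the input names declared by the current specification. The sketch below assumes the specification has already been parsed into a dict with an `inputs` list of `{"name": ...}` entries and that names match exactly; `diff_properties` and the sample values are hypothetical.

```python
def diff_properties(spec: dict, stored_props: dict) -> dict:
    """Compare stored property names with the inputs a specification declares."""
    spec_names = {inp["name"] for inp in spec.get("inputs", [])}
    stored_names = set(stored_props)
    return {
        # Declared by the (newer) specification but absent from the pipeline file.
        "missing": sorted(spec_names - stored_names),
        # Present in the pipeline file but no longer declared by the specification.
        "unknown": sorted(stored_names - spec_names),
    }


# Made-up "v<X+1>" specification and "v<X>" stored properties.
spec_v2 = {"inputs": [{"name": "pattern"}, {"name": "ignore_case"}]}
stored = {"pattern": "Apache", "elyra_path_text": ""}

report = diff_properties(spec_v2, stored)
print(report)
```

This only catches added or removed names. A change in the meaning or type of an existing property would need something more, for example a version or hash of the specification stored alongside the properties.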

kevin-bates commented, Jun 21, 2021

I agree with @ptitzler’s assessment. I think introducing a scope/namespace within app_data would be sufficient. The only sibling-level attributes at that level would be component_source and component_source_type. All other attributes would be embedded within a component_properties object-valued (dict) attribute.

Using the following two-node pipeline snippet as an example, what today looks like this…

        {
          "id": "ca95fd8a-776b-43ae-a73e-e99906b80935",
          "type": "execution_node",
          "op": "serve-pytorch-model-seldon-core",
          "app_data": {
            "component_source": "https://raw.githubusercontent.com/kubeflow/pipelines/master/components/ibm-components/ffdl/serve/component.yaml",
            "runtime_image": "docker.io/aipipeline/ffdl-serve:latest",
            "component_source_type": "url",
            "model_id": "dddd",
            "deployment_name": "dddd",
            "model_class_name": "ModelClass",
            "model_class_file": "model_class.py",
            "serving_image": "aipipeline/seldon-pytorch:0.1",
            "ui_data": {
              "label": "Serve PyTorch Model - Seldon Core",
              "description": "Serve PyTorch Models remotely as web service using Seldon Core"
            }
          },
          "inputs": [
          ],
          "outputs": [
          ]
        },
        {
          "id": "4fcb7b56-7450-454f-9f4a-be454847c6cf",
          "type": "execution_node",
          "op": "execute-notebook-node",
          "app_data": {
            "label": "",
            "filename": "elyra/my_pipeline/node1.ipynb",
            "runtime_image": "continuumio/anaconda3:2020.07",
            "cpu": "",
            "gpu": "",
            "memory": "",
            "outputs": [],
            "env_vars": [], 
            "dependencies": [],
            "include_subdirectories": false,
            "ui_data": {
              "label": "node1.ipynb",
              "description": "Notebook"
            }
          },
          "inputs": [
          ],
          "outputs": [
          ]
        }

would look something like…

        {
          "id": "ca95fd8a-776b-43ae-a73e-e99906b80935",
          "type": "execution_node",
          "op": "serve-pytorch-model-seldon-core",
          "app_data": {
            "component_source": "https://raw.githubusercontent.com/kubeflow/pipelines/master/components/ibm-components/ffdl/serve/component.yaml",
            "component_source_type": "url",
            "component_properties": {
              "model_id": "dddd",
              "deployment_name": "dddd",
              "model_class_name": "ModelClass",
              "model_class_file": "model_class.py",
              "serving_image": "aipipeline/seldon-pytorch:0.1"
            },
            "ui_data": {
              "label": "Serve PyTorch Model - Seldon Core",
              "description": "Serve PyTorch Models remotely as web service using Seldon Core"
            }
          },
          "inputs": [
          ],
          "outputs": [
          ]
        },
        {
          "id": "4fcb7b56-7450-454f-9f4a-be454847c6cf",
          "type": "execution_node",
          "op": "execute-notebook-node",
          "app_data": {
            "component_source": "elyra",
            "component_source_type": "elyra",
            "component_properties": {
              "label": "",
              "filename": "elyra/my_pipeline/node1.ipynb",
              "runtime_image": "continuumio/anaconda3:2020.07",
              "cpu": "",
              "gpu": "",
              "memory": "",
              "outputs": [],
              "env_vars": [],
              "dependencies": [],
              "include_subdirectories": false
            },
            "ui_data": {
              "label": "node1.ipynb",
              "description": "Notebook"
            }
          },
          "inputs": [
          ],
          "outputs": [
          ]
        }

Items worth noting:

The only properties at the app_data level that are siblings of component_properties are component_source and component_source_type. (Note the absence of runtime_image in the “serve-pytorch-model” properties.)
Values of "elyra" (or some well-known placeholder) imply system-defined or built-in components (currently Notebook, Python-script, and R-script). These nodes will always have the complete set of system-defined properties.
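
As a rough illustration of the proposed layout, the following sketch converts a flat app_data dict into the scoped form. The function name, the `TOP_LEVEL_KEYS` tuple, and the `drop_keys` parameter are assumptions made for this example; the "elyra" default mirrors the placeholder suggested above.

```python
# Keys that stay at the app_data level in the proposed layout (assumed list).
TOP_LEVEL_KEYS = ("component_source", "component_source_type", "ui_data")


def scope_app_data(flat: dict, drop_keys: tuple = ()) -> dict:
    """Move everything except the top-level keys into component_properties."""
    scoped = {
        # Built-in nodes have no source today, so fall back to the placeholder.
        "component_source": flat.get("component_source", "elyra"),
        "component_source_type": flat.get("component_source_type", "elyra"),
        "component_properties": {
            key: value
            for key, value in flat.items()
            if key not in TOP_LEVEL_KEYS and key not in drop_keys
        },
    }
    if "ui_data" in flat:
        scoped["ui_data"] = flat["ui_data"]
    return scoped


flat_custom = {
    "component_source_type": "url",
    "runtime_image": "docker.io/aipipeline/ffdl-serve:latest",
    "model_id": "dddd",
    "ui_data": {"label": "Serve PyTorch Model - Seldon Core"},
}
flat_notebook = {
    "filename": "elyra/my_pipeline/node1.ipynb",
    "runtime_image": "continuumio/anaconda3:2020.07",
}

# runtime_image is dropped for the custom component, as in the example above.
scoped_custom = scope_app_data(flat_custom, drop_keys=("runtime_image",))
scoped_notebook = scope_app_data(flat_notebook)
```

A real migration would also need to decide how to treat a flat file in which a collision has already overwritten a reserved value; this sketch does not attempt that.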

Should we determine the need for system-defined properties that apply to non-elyra components, I think it would be best to add those outside the component_properties stanza. For example, env_vars may be something we could apply to all component types. If we decided to use those in a general manner, we could move them from the elyra-only component_properties to the “parent scope”. One issue with that, however, is that the front-end would need to know which are component properties and which are system properties (e.g., “component_source”). To address that, we should probably introduce an additional “namespace stanza” named something like “generic_properties” or “common_properties” (although the latter conflicts with the canvas terminology).

          "app_data": {
            "component_source": "https://raw.githubusercontent.com/kubeflow/pipelines/master/components/ibm-components/ffdl/serve/component.yaml",
            "component_source_type": "url",
            "component_properties": {
              "model_id": "dddd",
              "deployment_name": "dddd",
              "model_class_name": "ModelClass",
              "model_class_file": "model_class.py",
              "serving_image": "aipipeline/seldon-pytorch:0.1"
            },
            "generic_properties": {
              "env_vars": ["my_env1", "my_env2=default2"]
            },
            "ui_data": {
              "label": "Serve PyTorch Model - Seldon Core",
              "description": "Serve PyTorch Models remotely as web service using Seldon Core"
            }
          },
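
With two stanzas, a consumer can look a property up by scope, so a component that declares its own env_vars input would no longer clash with the generic one. The sketch below is only an illustration of that idea: `get_property` is a hypothetical helper, and the component-level env_vars value is made up.

```python
# Stanza names follow the proposal above; "generic_properties" is one candidate name.
SCOPES = ("component_properties", "generic_properties")


def get_property(app_data: dict, name: str, scope: str):
    """Look up a property inside one named stanza of app_data."""
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope}")
    return app_data.get(scope, {}).get(name)


app_data = {
    "component_source_type": "url",
    "component_properties": {
        "model_id": "dddd",
        # Made-up component input that shares its name with a generic property.
        "env_vars": "component-specific value",
    },
    "generic_properties": {
        "env_vars": ["my_env1", "my_env2=default2"],
    },
}

component_env = get_property(app_data, "env_vars", "component_properties")
generic_env = get_property(app_data, "env_vars", "generic_properties")
```

The front-end could use the same stanza names to decide which editor section a property belongs to, instead of keeping a hard-coded list of system property names.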

Thoughts?