How to write a custom component

Hello,

I am trying to figure out how to write my own component, but I am struggling to understand all the abstraction concepts like Channel, ChannelParameter, ExecutionParameter, Artifact, etc.

Is there any documentation on this?

In the end I just want to execute my own Python function inside TFX, basically overriding the Do method of the Executor, but I cannot get the inputs/outputs to work.

Here is my very first attempt:

from typing import Any, Dict, List, Optional, Text

from tfx.components.base.base_component import BaseComponent, ComponentSpec, ExecutionParameter, ChannelParameter
from tfx.components.base.base_executor import BaseExecutor
from tfx.utils import channel
from tfx.utils import types


class MyCustomComponentSpec(ComponentSpec):
    COMPONENT_NAME = 'MyCustomComponent'
    PARAMETERS = {}
    INPUTS = {
        'input_examples': ChannelParameter(type_name='ExamplesPath'),
    }
    OUTPUTS = {
        'output_examples': ChannelParameter(type_name='ExamplesPath'),
    }


class MyCustomExecutor(BaseExecutor):

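    def Do(self, input_dict: Dict[Text, List[types.TfxArtifact]],
           output_dict: Dict[Text, List[types.TfxArtifact]],
           exec_properties: Dict[Text, Any]) -> None:
        # input_dict / output_dict map each key declared in INPUTS / OUTPUTS to a
        # list of artifacts; exec_properties holds the values declared in PARAMETERS.
        print(input_dict)
        print(output_dict)
        print(exec_properties)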


class MyCustomComponent(BaseComponent):
    SPEC_CLASS = MyCustomComponentSpec
    EXECUTOR_CLASS = MyCustomExecutor

    def __init__(self,
                 input_examples: str,
                 output_examples: Optional[channel.Channel] = None,
                 name: Optional[Text] = None):

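        # Wrap the raw input path in an artifact so it can travel through a Channel.
        input_artifact = types.TfxArtifact('ExamplesPath')
        input_artifact.uri = input_examples

        input_channel = channel.Channel(
            type_name='ExamplesPath',
            artifacts=[input_artifact])

        # Honour a caller-supplied output channel; otherwise create a fresh one.
        output_channel = output_examples
        if output_channel is None:
            output_channel = channel.Channel(
                type_name='ExamplesPath',
                artifacts=[types.TfxArtifact('ExamplesPath')])

        spec = MyCustomComponentSpec(
            input_examples=channel.as_channel(input_channel),
            output_examples=output_channel)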
        super(MyCustomComponent, self).__init__(spec=spec,
                                                name=name)
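
For comparison, here is a minimal sketch of what a working Do() body might look like once the inputs and outputs arrive: read from the input artifact's uri and write to the output artifact's uri. To keep it runnable without TFX installed, StubArtifact and CopyExamplesExecutor are hypothetical stand-ins (StubArtifact only models the uri and type_name attributes); the key names match the spec above.

```python
import os
import shutil
import tempfile
from typing import Any, Dict, List, Text


class StubArtifact:
    """Hypothetical stand-in for a TFX artifact; only `type_name` and `uri` are modelled."""

    def __init__(self, type_name: Text, uri: Text = '') -> None:
        self.type_name = type_name
        self.uri = uri


class CopyExamplesExecutor:
    """Sketch of a Do() body: copy files from the input artifact's URI to the output's."""

    def Do(self, input_dict: Dict[Text, List[StubArtifact]],
           output_dict: Dict[Text, List[StubArtifact]],
           exec_properties: Dict[Text, Any]) -> None:
        # Each key maps to a *list* of artifacts; this component expects exactly one.
        src = input_dict['input_examples'][0].uri
        dst = output_dict['output_examples'][0].uri
        os.makedirs(dst, exist_ok=True)
        # Copy every regular file from the input directory into the output directory.
        for name in os.listdir(src):
            path = os.path.join(src, name)
            if os.path.isfile(path):
                shutil.copy(path, os.path.join(dst, name))
```

In this sketch the executor never builds artifacts itself; it only reads the URIs it is handed, which is the part the component wiring is responsible for.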

Top GitHub Comments

ruoyu90 commented, Jul 11, 2019

@rummens hopefully you’ll find the following explanation useful 😃

In TFX, we model an execution as several parts:

Artifacts: inputs/outputs of an execution. Artifacts are produced by upstream components and consumed by downstream components, e.g. an examples file, a model, etc.
Execution properties: other parameters used by an execution. The impact of execution properties stays within the execution; they are used to describe and distinguish an execution.
Execution: an execution takes input artifacts, processes them according to any execution properties, and produces output artifacts.
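
To make the three roles concrete, here is a small plain-Python sketch of that data model. The classes and the execute() helper are hypothetical illustrations, not TFX classes: artifacts flow in and out, while execution properties only configure the single run they belong to.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical plain-Python model of the three concepts; these are not TFX classes.


@dataclass
class Artifact:
    """Input/output of an execution, e.g. an examples file or a trained model."""
    type_name: str
    uri: str


@dataclass
class Execution:
    """One run of a component: input artifacts + execution properties -> output artifacts."""
    inputs: Dict[str, List[Artifact]]
    exec_properties: Dict[str, Any]
    outputs: Dict[str, List[Artifact]] = field(default_factory=dict)


ExecutorFn = Callable[[Dict[str, List[Artifact]], Dict[str, Any]], Dict[str, List[Artifact]]]


def execute(fn: ExecutorFn, inputs: Dict[str, List[Artifact]],
            exec_properties: Dict[str, Any]) -> Execution:
    """Run `fn` and record what it consumed, how it was configured, and what it produced."""
    execution = Execution(inputs=inputs, exec_properties=exec_properties)
    execution.outputs = fn(inputs, exec_properties)
    return execution


def train(inputs: Dict[str, List[Artifact]], props: Dict[str, Any]) -> Dict[str, List[Artifact]]:
    # `train_steps` only shapes this run; it is never handed to a downstream component.
    steps = props['train_steps']
    return {'model': [Artifact('Model', f"/models/steps_{steps}")]}
```

For example, execute(train, {'examples': [Artifact('ExamplesPath', '/data/examples')]}, {'train_steps': 100}) returns an Execution whose only output is a Model artifact; the train_steps value stays attached to that Execution record rather than travelling downstream.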

By using this data model, we enable lineage tracking of artifacts and executions, which will in turn enable more advanced features like continuous training (under development). As you can tell, inputs/outputs are quite different from the other parameters of an execution.
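
A rough sketch of why separating artifacts from properties enables lineage: if every execution records which artifacts it consumed and produced, you can walk back from any artifact to everything it was derived from. TFX stores this kind of record in ML Metadata; the LineageStore class below is a hypothetical toy for illustration only, not the ML Metadata API.

```python
from collections import defaultdict
from typing import Dict, List, Set


class LineageStore:
    """Toy lineage log (hypothetical): which execution consumed/produced which artifact URIs."""

    def __init__(self) -> None:
        self.produced_by: Dict[str, str] = {}                     # artifact uri -> execution id
        self.consumed: Dict[str, List[str]] = defaultdict(list)  # execution id -> input uris

    def record(self, execution_id: str, inputs: List[str], outputs: List[str]) -> None:
        self.consumed[execution_id].extend(inputs)
        for uri in outputs:
            self.produced_by[uri] = execution_id

    def ancestors(self, uri: str) -> Set[str]:
        """All artifact URIs that `uri` was (transitively) derived from."""
        result: Set[str] = set()
        stack = [uri]
        while stack:
            current = stack.pop()
            execution_id = self.produced_by.get(current)
            if execution_id is None:
                continue  # an external input: nothing upstream of it
            for parent in self.consumed[execution_id]:
                if parent not in result:
                    result.add(parent)
                    stack.append(parent)
        return result
```

Recording ('example_gen', inputs=['raw.csv'], outputs=['examples']) and then ('trainer', inputs=['examples'], outputs=['model']) makes ancestors('model') return {'examples', 'raw.csv'}. Execution properties never appear in this walk, which is one reason they are modelled separately from inputs/outputs.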

robertlugg commented, Aug 15, 2019

I see you can use the convenience function in the channel module:

output_channel = channel.as_channel([output_instance])

and that's fine.
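
A simplified stand-in for that convenience function, to show what it does conceptually: pass an existing Channel through unchanged, or wrap a non-empty list of artifacts in a new Channel whose type_name is taken from the artifacts. The Artifact, Channel and as_channel definitions below are hypothetical re-implementations modelled loosely on tfx.utils.channel as we understand it from TFX releases of that era; check the real module for exact behaviour and error types.

```python
from typing import Iterable, List, Union


class Artifact:
    """Hypothetical minimal artifact: only carries its type name."""

    def __init__(self, type_name: str) -> None:
        self.type_name = type_name


class Channel:
    """Hypothetical minimal channel: a typed collection of artifacts."""

    def __init__(self, type_name: str, artifacts: Iterable[Artifact] = ()) -> None:
        self.type_name = type_name
        self.artifacts: List[Artifact] = list(artifacts)


def as_channel(source: Union[Channel, Iterable[Artifact]]) -> Channel:
    """Return `source` if it is already a Channel, else wrap the artifacts in one."""
    if isinstance(source, Channel):
        return source  # already a channel: pass it through unchanged
    artifacts = list(source)
    if not artifacts:
        raise ValueError('Cannot convert an empty artifact collection into a Channel')
    if not isinstance(artifacts[0], Artifact):
        raise ValueError('Invalid source to be a channel: {}'.format(source))
    # The channel's type is inferred from the first artifact.
    return Channel(type_name=artifacts[0].type_name, artifacts=artifacts)
```

With this sketch, as_channel([Artifact('ExamplesPath')]) yields a Channel of type 'ExamplesPath', and calling as_channel on that Channel again returns the very same object, which is why wrapping an existing channel in as_channel is harmless.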

I also believe we may be using ‘upstream’ and ‘downstream’ differently. I consider a component which is ‘downstream’ to be the one which consumes some output from its ‘upstream’ “node”. I believe this is the more common convention.
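
Under that convention, "upstream" and "downstream" fall straight out of which component produces a channel and which consumes it. The sketch below uses a hypothetical pipeline description (component name mapped to the channel names it consumes and produces) and Python's standard graphlib.TopologicalSorter (Python 3.9+) to show that upstream producers are ordered before their downstream consumers.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Hypothetical pipeline description: component -> (channels consumed, channels produced).
Pipeline = Dict[str, Tuple[List[str], List[str]]]


def upstream_of(pipeline: Pipeline) -> Dict[str, Set[str]]:
    """Map each component to the components whose outputs it consumes (its upstream nodes)."""
    producer = {ch: name for name, (_, outs) in pipeline.items() for ch in outs}
    return {name: {producer[ch] for ch in ins if ch in producer}
            for name, (ins, _) in pipeline.items()}


def run_order(pipeline: Pipeline) -> List[str]:
    """Order components so every upstream node runs before its downstream consumers."""
    # TopologicalSorter expects node -> predecessors, which is exactly upstream_of().
    return list(TopologicalSorter(upstream_of(pipeline)).static_order())


pipeline: Pipeline = {
    'ExampleGen': ([], ['examples']),
    'MyCustomComponent': (['examples'], ['output_examples']),
    'Trainer': (['output_examples'], ['model']),
}
```

Here MyCustomComponent is downstream of ExampleGen because it consumes 'examples', and run_order(pipeline) gives ['ExampleGen', 'MyCustomComponent', 'Trainer'].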