Passing custom parameters to Trainer and Transform
A common need from TFX users in my organization (using TFX on KFP) is the ability to pass custom parameters to their module files for TFT and Trainer, to make iteration and experimentation easier. While the Trainer component has an `hparams` object in the `trainer_fn` you supply, there doesn't seem to be support for adding anything to it from outside the TFX DSL (for example, if you wanted to change `num_dnn_layers` as in the public taxi example's `module_file`).
Similarly for Transform, there are a number of parameters that users would like to adjust as they explore and compare experiment runs, but there is no way to do this other than changing them in the `module_file` and re-compiling the pipeline with the new file each time (as opposed to, say, adjusting these parameters through the KFP UI so that you can compare runs, or otherwise do hyperparameter tuning more easily).
To support this, we've essentially created copy-paste versions of the TFX components in our own internal repo, with a little additional code to support passing these parameters as component-level inputs (`dict` inputs whose entries are added to the `hparams` object; in the case of Transform, an `hparams` argument is added to the `preprocessing_fn`). This makes it difficult to keep the components up to date as TFX releases new versions and fixes.
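The extra code is small. Here is a minimal sketch of the merge step, assuming the `tf.contrib.training.HParams` object that `trainer_fn` already receives (its `add_hparam` method is real); the helper name and wiring are illustrative, not actual TFX internals:

```python
def _merge_custom_hparams(base_hparams, custom_params):
  """Copies user-supplied key/value pairs onto the hparams object the
  executor already builds, so trainer_fn sees both the stock values
  and the custom ones.

  Hypothetical helper: base_hparams is assumed to be a
  tf.contrib.training.HParams instance.
  """
  for key, value in (custom_params or {}).items():
    base_hparams.add_hparam(key, value)
  return base_hparams
```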
Which leads me to my question: is this something TFX would incorporate if we were to submit a PR? If not, what is the philosophy behind not including such functionality?
Appendix: example code of how it functions.
In your component specification:
```python
from internal_lib import custom_components

...

trainer = custom_components.Trainer(
    module_file=taxi_pipeline_utils,
    train_files=transform_training.outputs.output,
    eval_files=transform_eval.outputs.output,
    schema=infer_schema.outputs.output,
    tf_transform_dir=transform_training.outputs.output,
    train_steps=10000,
    eval_steps=5000,
    warm_starting=True,
    hparams=dict(hidden_layers=6))
```
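Everything else matches the stock `Trainer` signature; the `hparams=dict(...)` argument is the only addition our forked component introduces.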
Later, in the `module_file`:
```python
def trainer_fn(hparams, schema):
  # hparams can now reference user-defined variables from the input dict.
  first_dnn_layer_size = 100
  num_dnn_layers = hparams.hidden_layers
  dnn_decay_factor = 0.7
  ...
  # Later, hparams also still carries all the previously expected values.
  estimator = _build_estimator(
      tf_transform_dir=hparams.tf_transform_dir,
      # Construct layer sizes with exponential decay.
      hidden_units=[
          max(2, int(first_dnn_layer_size * dnn_decay_factor**i))
          for i in range(num_dnn_layers)
      ],
      config=run_config,
      warm_start_from=hparams.warm_start_from)
  ...
```
Similarly, for Transform (only including 3 hparams for demonstration/readability purposes):
```python
transform_training = components.Transform(
    input_data=examples_gen.outputs.training_examples,
    schema=infer_schema.outputs.output,
    module_file=taxi_pipeline_utils,
    name='transform-training',
    hparams=dict(
        VOCAB_FEATURE_KEYS=["payment_type", "company"],
        VOCAB_SIZE=1000,
        OOV_SIZE=10))
```
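These keys mirror constants that are otherwise hard-coded at the top of the taxi example's module file, which is exactly what currently forces a pipeline re-compile per experiment.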
And then used in the transform `module_file`:
```python
def preprocessing_fn(inputs, hparams=None):
  ...
  for key in hparams.VOCAB_FEATURE_KEYS:
    # Build a vocabulary for this feature.
    outputs[transformed_name(key)] = tft.compute_and_apply_vocabulary(
        fill_in_missing(inputs[key]),
        top_k=hparams.VOCAB_SIZE,
        num_oov_buckets=hparams.OOV_SIZE,
    )
  ...
```
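On the component side, the change is similarly small. Here is a minimal sketch of how our forked Transform executor could bind the dict to the user function; TFX's `import_func_from_source` utility is real, while the `SimpleNamespace` wrapper (which provides the `hparams.VOCAB_SIZE`-style attribute access used above) and the `custom_params` plumbing are hypothetical:

```python
import functools
from types import SimpleNamespace

from tfx.utils import import_utils


def _make_preprocessing_fn(module_file, custom_params):
  """Loads the user's preprocessing_fn and pre-binds the custom dict to
  its hparams argument, so the rest of the executor can call it with
  inputs alone. Hypothetical sketch, not actual TFX executor code."""
  preprocessing_fn = import_utils.import_func_from_source(
      module_file, 'preprocessing_fn')
  return functools.partial(
      preprocessing_fn, hparams=SimpleNamespace(**custom_params))
```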
From the discussion thread:

It would also be nice to be able to pass `custom_config` in a consistent manner to custom `FileBasedExampleGen` components. `Trainer` can take a `dict` as `custom_config`, while `FileBasedExampleGen` requires an `example_gen_pb2.CustomConfig` object, which cannot be easily instantiated.

@rclough we'll update the doc to reflect that it is not used specifically by CMLE. Will keep this thread open as a FR for that.
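To illustrate the inconsistency described in that comment, a sketch of the two call sites; the dict contents and the `my_config_pb2.MyConfig` proto are made-up placeholders, and the exact layout of `example_gen_pb2.CustomConfig` (a single `google.protobuf.Any` field) is my understanding rather than a documented guarantee:

```python
from tfx.components import Trainer
from tfx.components.example_gen.component import FileBasedExampleGen
from tfx.proto import example_gen_pb2

# Trainer: a plain dict is accepted directly.
trainer = Trainer(
    # ... usual args elided ...
    custom_config={'num_dnn_layers': 4})

# FileBasedExampleGen: the payload must be a proto packed into an Any,
# wrapped in example_gen_pb2.CustomConfig. my_config_pb2 stands in for
# a user-defined proto module (hypothetical).
custom = example_gen_pb2.CustomConfig()
custom.custom_config.Pack(my_config_pb2.MyConfig(num_dnn_layers=4))
example_gen = FileBasedExampleGen(
    # ... usual args elided ...
    custom_config=custom)
```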