Passing custom parameters to Trainer and Transform
A common need from TFX users in my organization (using TFX on KFP) is the ability to pass custom parameters to their module files for TFT and Trainer, to make iteration and experimentation easier. While the Trainer component has an `hparams` object in the `trainer_fn` you supply, there doesn't seem to be support for adding anything to it from outside the TFX DSL (for example, if you wanted to change `num_dnn_layers` as in the public taxi example's `module_file`).
Similarly for Transform, there are a number of parameters that users would like to adjust as they explore and compare experiment runs, but there is no way to do this other than changing them in the `module_file` and re-compiling the pipeline with the new file each time (as opposed to, say, adjusting these parameters through the KFP UI so that you can compare runs, or otherwise do hyperparameter tuning more easily).
To support this, we've essentially created copy-paste versions of the TFX components in our own internal repo, with a little additional code to support passing these parameters as component-level inputs (`dict` inputs whose entries are added to the `hparams` object; in the case of Transform, an `hparams` argument is added to the `preprocessing_fn`). This makes it difficult to keep the components up to date as TFX releases new versions and fixes.
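The extra code is small. Here is a minimal sketch of the merge step, assuming the `tf.contrib.training.HParams` object that `trainer_fn` already receives (its `add_hparam` method is real); the helper name and wiring are illustrative, not actual TFX internals:

```python
def _merge_custom_hparams(base_hparams, custom_params):
  """Copies user-supplied key/value pairs onto the hparams object the
  executor already builds, so trainer_fn sees both the stock values
  and the custom ones.

  Hypothetical helper: base_hparams is assumed to be a
  tf.contrib.training.HParams instance.
  """
  for key, value in (custom_params or {}).items():
    base_hparams.add_hparam(key, value)
  return base_hparams
```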
Which leads me to my question: is this something TFX would incorporate if we were to submit a PR? If not, what is the philosophy behind not including such functionality?
Appendix: example code of how it functions.
In your component specification:
```python
from internal_lib import custom_components

...

trainer = custom_components.Trainer(
    module_file=taxi_pipeline_utils,
    train_files=transform_training.outputs.output,
    eval_files=transform_eval.outputs.output,
    schema=infer_schema.outputs.output,
    tf_transform_dir=transform_training.outputs.output,
    train_steps=10000,
    eval_steps=5000,
    warm_starting=True,
    hparams=dict(hidden_layers=6))
```
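Everything else matches the stock `Trainer` signature; the `hparams=dict(...)` argument is the only addition our forked component introduces.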
Later, in the `module_file`:
```python
def trainer_fn(hparams, schema):
  # hparams can now reference user-defined variables from the input dict.
  first_dnn_layer_size = 100
  num_dnn_layers = hparams.hidden_layers
  dnn_decay_factor = 0.7
  ...
  # Later, hparams also still carries all the previously expected values.
  estimator = _build_estimator(
      tf_transform_dir=hparams.tf_transform_dir,
      # Construct layer sizes with exponential decay.
      hidden_units=[
          max(2, int(first_dnn_layer_size * dnn_decay_factor**i))
          for i in range(num_dnn_layers)
      ],
      config=run_config,
      warm_start_from=hparams.warm_start_from)
  ...
```
Similarly, for Transform (only including 3 hparams for demonstration/readability purposes):
```python
transform_training = components.Transform(
    input_data=examples_gen.outputs.training_examples,
    schema=infer_schema.outputs.output,
    module_file=taxi_pipeline_utils,
    name='transform-training',
    hparams=dict(
        VOCAB_FEATURE_KEYS=["payment_type", "company"],
        VOCAB_SIZE=1000,
        OOV_SIZE=10))
```
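These keys mirror constants that are otherwise hard-coded at the top of the taxi example's module file, which is exactly what currently forces a pipeline re-compile per experiment.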
And then used in the transform `module_file`:
```python
def preprocessing_fn(inputs, hparams=None):
  ...
  for key in hparams.VOCAB_FEATURE_KEYS:
    # Build a vocabulary for this feature.
    outputs[transformed_name(key)] = tft.compute_and_apply_vocabulary(
        fill_in_missing(inputs[key]),
        top_k=hparams.VOCAB_SIZE,
        num_oov_buckets=hparams.OOV_SIZE,
    )
  ...
```
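On the component side, the change is similarly small. Here is a minimal sketch of how our forked Transform executor could bind the dict to the user function; TFX's `import_func_from_source` utility is real, while the `SimpleNamespace` wrapper (which provides the `hparams.VOCAB_SIZE`-style attribute access used above) and the `custom_params` plumbing are hypothetical:

```python
import functools
from types import SimpleNamespace

from tfx.utils import import_utils


def _make_preprocessing_fn(module_file, custom_params):
  """Loads the user's preprocessing_fn and pre-binds the custom dict to
  its hparams argument, so the rest of the executor can call it with
  inputs alone. Hypothetical sketch, not actual TFX executor code."""
  preprocessing_fn = import_utils.import_func_from_source(
      module_file, 'preprocessing_fn')
  return functools.partial(
      preprocessing_fn, hparams=SimpleNamespace(**custom_params))
```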
From the discussion thread:

It would also be nice to be able to pass `custom_config` in a consistent manner to custom `FileBasedExampleGen` components. `Trainer` can take a `dict` as `custom_config`, while `FileBasedExampleGen` requires an `example_gen_pb2.CustomConfig` object, which cannot be easily instantiated.

@rclough we'll update the doc to reflect that it is not used specifically by CMLE. Will keep this thread open as a FR for that.
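To illustrate the inconsistency described in that comment, a sketch of the two call sites; the dict contents and the `my_config_pb2.MyConfig` proto are made-up placeholders, and the exact layout of `example_gen_pb2.CustomConfig` (a single `google.protobuf.Any` field) is my understanding rather than a documented guarantee:

```python
from tfx.components import Trainer
from tfx.components.example_gen.component import FileBasedExampleGen
from tfx.proto import example_gen_pb2

# Trainer: a plain dict is accepted directly.
trainer = Trainer(
    # ... usual args elided ...
    custom_config={'num_dnn_layers': 4})

# FileBasedExampleGen: the payload must be a proto packed into an Any,
# wrapped in example_gen_pb2.CustomConfig. my_config_pb2 stands in for
# a user-defined proto module (hypothetical).
custom = example_gen_pb2.CustomConfig()
custom.custom_config.Pack(my_config_pb2.MyConfig(num_dnn_layers=4))
example_gen = FileBasedExampleGen(
    # ... usual args elided ...
    custom_config=custom)
```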