Latest Beam version causes error under TFX/TFT 0.15
We run a few pipelines frequently, and we recently noticed a failure in the Transform component. The pipeline worked fine until the latest execution (we re-created the underlying container, and we currently run the pipeline in interactive mode before exporting it to KFP). There were no code changes or dependency updates on our side.
However, a new Apache Beam version, 2.19, was released a few days ago, and it seems to be the culprit for our pipeline errors. When we downgrade to Apache Beam 2.18 and PyArrow 0.14, everything works again.
Checking the tfx dependencies, we noticed that 2.19 is allowed in combination with tfx==0.15 and is therefore installed in newly created pipelines.
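Since the upgrade happened silently when the container was rebuilt, a small start-up check can surface it before the Transform component runs. The following is only a sketch: `KNOWN_GOOD_PREFIXES` and `find_mismatches` are hypothetical names, and the expected versions are simply the ones reported as working in this issue (Apache Beam 2.18 and PyArrow 0.14), not an official compatibility list.

```python
from importlib import metadata

# Assumption: the versions reported as working in this issue.
# Adjust these to whatever combination you have verified yourself.
KNOWN_GOOD_PREFIXES = {"apache-beam": "2.18", "pyarrow": "0.14"}


def find_mismatches(get_version=metadata.version, expected=KNOWN_GOOD_PREFIXES):
    """Return a list of human-readable problems; an empty list means all good.

    `get_version` maps a distribution name to its installed version string.
    It defaults to importlib.metadata.version and can be swapped for testing.
    """
    problems = []
    for name, prefix in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        # Accept "2.18" itself or any patch release such as "2.18.0".
        if installed != prefix and not installed.startswith(prefix + "."):
            problems.append(f"{name} {installed} installed, expected {prefix}.x")
    return problems
```

Calling `find_mismatches()` at the top of the notebook and raising if the list is non-empty would turn the confusing TypeError into an explicit version message.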
Currently, TFT stops with this warning and error message:
WARNING:apache_beam.utils.interactive_utils:Failed to alter the label of a transform with the ipython prompt metadata. Cannot figure out the pipeline that the given pvalueish {'_schema': feature {
  name: "text"
  type: BYTES
  presence {
    min_fraction: 1.0
    min_count: 1
  }
  shape {
    dim {
      size: 1
    }
  }
}
feature {
  name: "label"
  type: INT
  bool_domain {
  }
  presence {
    min_fraction: 1.0
    min_count: 1
  }
  shape {
    dim {
      size: 1
    }
  }
}
} belongs to. Thus noop.
...
TypeError: not all arguments converted during string formatting
Again, the pipeline works fine if we downgrade Apache Beam to 2.18 and PyArrow to 0.14. Maybe the dependency constraints need to be more restrictive. Currently, they allow version 2.19:
'apache-beam[gcp]>=2.18,<3',
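To illustrate why the constraint above lets 2.19 through, here is a minimal sketch of a range check. The helpers `parse_version` and `satisfies` are hypothetical and only understand plain numeric versions with the operators `>=`, `<=`, `==`, `>` and `<`; real tooling such as pip implements the full version specifier rules. The tighter range `>=2.18,<2.19` is just an example of a more restrictive pin, not what TFX actually ships.

```python
import operator
import re

# Comparison operators supported by this simplified checker.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn '2.19.0' into (2, 19, 0); only plain numeric releases are handled."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, specifier):
    """Return True if `version` meets every comma-separated clause in `specifier`."""
    candidate = parse_version(version)
    for clause in specifier.split(","):
        match = re.fullmatch(r"\s*(>=|<=|==|>|<)\s*([0-9.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound_text = match.groups()
        bound = parse_version(bound_text)
        # Pad with zeros so that (2, 19, 0) and (3,) compare component-wise.
        width = max(len(candidate), len(bound))
        left = candidate + (0,) * (width - len(candidate))
        right = bound + (0,) * (width - len(bound))
        if not _OPS[op](left, right):
            return False
    return True


allowed_by_current_range = satisfies("2.19.0", ">=2.18,<3")
allowed_by_tighter_range = satisfies("2.19.0", ">=2.18,<2.19")
```

With the current range, a freshly built environment is free to resolve to 2.19.0, whereas an upper bound below 2.19 would keep it on the 2.18 series.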
Issue Analytics
- State:
- Created 4 years ago
- Comments:7 (2 by maintainers)
If you are using tfx==0.15, then TFX supports apache-beam versions >=2.16 and <3, as shown here
'apache-beam[gcp]>=2.16,<3'
but if you are using tfx==0.21, then TFX supports apache-beam versions >=2.17 and <2.18 as shown here
'apache-beam[gcp]>=2.17,<2.18'
I think you are using TFX 2.1. Please confirm @hanneshapke. Thanks!
Closing this issue since version 0.15 is outdated by now.