Latest Beam version causes error under TFX/TFT 0.15

Several of our frequently run pipelines recently started failing in the Transform component. They worked fine until the latest execution, for which we re-created the underlying container (we currently execute the pipeline in interactive mode before exporting it to KFP). There were no code changes or dependency updates on our side.

However, we noticed that a new Apache Beam version, 2.19, was released a few days ago, and it seems to be the culprit for our pipeline errors. When we downgrade Apache Beam to 2.18 and PyArrow to 0.14, everything works again.

Checking the tfx dependencies, we noticed that Beam 2.19 is allowed in combination with tfx==0.15 and is therefore installed in newly created pipelines.

Currently, TFT stops with this warning and error message:

WARNING:apache_beam.utils.interactive_utils:Failed to alter the label of a transform with the ipython prompt metadata. Cannot figure out the pipeline that the given pvalueish {'_schema': feature {
  name: "text"
  type: BYTES
  presence {
    min_fraction: 1.0
    min_count: 1
  }
  shape {
    dim {
      size: 1
    }
  }
}
feature {
  name: "label"
  type: INT
  bool_domain {
  }
  presence {
    min_fraction: 1.0
    min_count: 1
  }
  shape {
    dim {
      size: 1
    }
  }
}
} belongs to. Thus noop.
...
TypeError: not all arguments converted during string formatting
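
For readers unfamiliar with this TypeError: Python raises it when a %-style format string is given more arguments than it has placeholders, which commonly happens when a tuple is passed where a single value was intended. The sketch below only reproduces the message in isolation. The `format_label` helper is hypothetical, and nothing here is taken from Beam's code; the log above does not show where in Beam 2.19 the error originates.

```python
def format_label(template, args):
    """Apply %-formatting and return the error text instead of raising.

    Hypothetical helper for illustration only; not part of Apache Beam.
    """
    try:
        return template % args
    except TypeError as exc:
        return "TypeError: %s" % exc


# One placeholder, but a 2-tuple is unpacked into two arguments.
print(format_label("label: %s", ("a", "b")))

# Wrapping the value in a 1-tuple formats the whole tuple as one argument.
print("label: %s" % (("a", "b"),))
```

The second call shows the usual fix on the caller's side: wrap the value in a one-element tuple so it is treated as a single argument.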

Again, the pipeline works fine if we downgrade Apache Beam to 2.18 and PyArrow to 0.14. Maybe the dependency constraints need to be more restrictive. Currently, they allow version 2.19:

'apache-beam[gcp]>=2.18,<3',
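
To see why a fresh container picks up Beam 2.19, it can help to check a version against the constraint by hand. The following is a minimal stdlib-only sketch: the `parse_version` and `satisfies` helpers are hypothetical, handle only plain numeric versions, and are not pip's actual resolver (for real projects, `packaging.specifiers.SpecifierSet` is the usual tool). The tighter spec `>=2.18,<2.19` is an assumed example of a more restrictive pin, not something TFX shipped.

```python
import re

# Comparison operators supported by this simplified checker.
_OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def parse_version(text, width=3):
    """Turn '2.19' into (2, 19, 0); numeric components only."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, spec):
    """Return True if version meets every comma-separated clause in spec."""
    v = parse_version(version)
    for clause in spec.split(","):
        match = re.fullmatch(r"\s*(>=|<=|==|>|<)\s*([0-9.]+)\s*", clause)
        if match is None:
            raise ValueError("unsupported clause: %r" % clause)
        op, bound = match.groups()
        if not _OPS[op](v, parse_version(bound)):
            return False
    return True


# The constraint quoted above admits 2.19.
print(satisfies("2.19.0", ">=2.18,<3"))
# A tighter upper bound (assumed example) would exclude it.
print(satisfies("2.19.0", ">=2.18,<2.19"))
```

Until the upstream constraint changes, pinning the working versions explicitly in the container's own requirements would have the same effect as the tighter spec.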

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

3 reactions
gowthamkpr commented, Feb 4, 2020

If you are using tfx==0.15, then TFX supports apache-beam versions >=2.16 and <3, as shown here: 'apache-beam[gcp]>=2.16,<3'

But if you are using tfx==0.21, then TFX supports apache-beam versions >=2.17 and <2.18, as shown here: 'apache-beam[gcp]>=2.17,<2.18'

I think you are using TFX 2.1. Please confirm @hanneshapke. Thanks!

0 reactions
hanneshapke commented, Apr 8, 2020

Closing this issue since version 0.15 is outdated by now.
