Dataflow workers not able to install tfx from requirements file due to `no-binary` option from beam stager

When no Beam packaging arguments are provided by the user, TFX generates a requirements file with the tfx package inside.

This ends up failing on Dataflow, because the Beam stager uses pip’s --no-binary flag: https://github.com/apache/beam/blob/v2.15.0/sdks/python/apache_beam/runners/portability/stager.py#L483.

Indeed, in a fresh virtualenv (Python 3.6.3):

pip download tfx==0.14.0 --no-binary :all:
Collecting tfx==0.14.0
  ERROR: Could not find a version that satisfies the requirement tfx==0.14.0 (from versions: none)
ERROR: No matching distribution found for tfx==0.14.0

Whereas if I remove the --no-binary flag, it works just fine.
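
The two pip invocations being compared can be sketched in Python as follows. The output above, with no versions found once wheels are excluded, is consistent with this release having no source distribution available, though that is an inference and not something confirmed in the issue. `build_pip_download_cmd` and the destination directory are hypothetical names used only for illustration; the pip options themselves (`download`, `--dest`, `--no-binary :all:`) are standard.

```python
import sys


def build_pip_download_cmd(requirement, dest_dir, no_binary=True):
    """Build a `pip download` command line for a single requirement.

    With no_binary=True the command rejects wheels and accepts only source
    distributions, which is the behaviour that fails in the report above.
    """
    cmd = [sys.executable, "-m", "pip", "download", requirement, "--dest", dest_dir]
    if no_binary:
        # Same flag pair the report attributes to the Beam stager.
        cmd += ["--no-binary", ":all:"]
    return cmd


# To reproduce (needs network access; the destination path is an assumption):
#   import subprocess
#   subprocess.run(build_pip_download_cmd("tfx==0.14.0", "/tmp/pip-cache"))
#   subprocess.run(build_pip_download_cmd("tfx==0.14.0", "/tmp/pip-cache", no_binary=False))
```

Running both variants side by side in a fresh virtualenv should show whether the failure depends only on the `--no-binary` flag.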

I’m not all that knowledgeable about Python packaging, but is this because TFX is built as a wheel? Is there some Beam option I can pass to make this work?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 7
  • Comments: 18 (6 by maintainers)

Top GitHub Comments

5 reactions
andrewsmartin commented, Oct 17, 2019

Hi @tejaslodaya - glad you found a workaround, but it is just that - a workaround. That said, I’m going to keep this open.

3 reactions
tejaslodaya commented, Oct 16, 2019

Hi @andrewsmartin and @charlesccychen

I managed to solve this issue by doing these steps:

  1. Go to site-packages inside your virtual environment and open the apache_beam/runners/portability/stager.py file.
  2. In the _populate_requirements_cache function, remove these two lines: '--no-binary' and ':all:'.
  3. Reload the package inside your Jupyter notebook / main call.

In my case, I had created a conda environment named tfx_test and changed this file: ~/miniconda3/envs/tfx_test/lib/python3.7/site-packages/apache_beam/runners/portability/stager.py

This solves the issue.
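
The steps above boil down to locating the installed stager.py and dropping two entries from the argument list it passes to pip. Here is a minimal sketch of both pieces, assuming the stager builds its pip command as a flat list of strings. `find_module_file` and `strip_no_binary` are hypothetical helper names, not part of Beam, and the sketch only illustrates the effect of the manual edit without patching Beam itself.

```python
import importlib.util


def find_module_file(module_name):
    """Return the source path of an installed module, or None if it is missing."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. apache_beam) is not installed.
        return None
    return spec.origin if spec else None


def strip_no_binary(cmd_args):
    """Return a copy of cmd_args without '--no-binary' and the value after it."""
    cleaned = []
    skip_next = False
    for arg in cmd_args:
        if skip_next:
            # This is the value that followed '--no-binary' (e.g. ':all:').
            skip_next = False
            continue
        if arg == "--no-binary":
            skip_next = True
            continue
        cleaned.append(arg)
    return cleaned


# Prints the file to edit in the active environment, or None if Beam is absent.
print(find_module_file("apache_beam.runners.portability.stager"))
```

Printing the path first helps confirm that the file being edited belongs to the environment the notebook or pipeline actually runs in.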
