
--build_python_zip fails when runfiles include file containing '=' character (affects pyspark==2.4.6)


Description of the problem / feature request:

A Bazel Python ‘zipapp’ cannot be built using --build_python_zip when the underlying py_binary target depends on pyspark==2.4.6. I think this is because pyspark contains files that include the “=” character in their filenames, which breaks the PythonZipper action that --build_python_zip runs.

Example Error:

INFO: Analyzed 2 targets (22 packages loaded, 708 targets configured).
INFO: Found 2 targets...
ERROR: /Users/jonathon/work/reproduce_zipapp_bug/spark_hello_world/BUILD:4:10: PythonZipper spark_hello_world/main.zip failed (Exit 255): zipper failed: error executing command external/bazel_tools/tools/zip/zipper/zipper cC bazel-out/darwin-fastbuild/bin/spark_hello_world/main.zip @bazel-out/darwin-fastbuild/bin/spark_hello_world/main.zip-0.params

Use --sandbox_debug to see verbose messages from the sandbox zipper failed: error executing command external/bazel_tools/tools/zip/zipper/zipper cC bazel-out/darwin-fastbuild/bin/spark_hello_world/main.zip @bazel-out/darwin-fastbuild/bin/spark_hello_world/main.zip-0.params

Use --sandbox_debug to see verbose messages from the sandbox
File kittens/date=2018-01/not-image.txt=external/pypi/pypi__pyspark/pyspark/data/mllib/images/partitioned/cls=kittens/date=2018-01/not-image.txt does not seem to exist.
INFO: Elapsed time: 2.814s, Critical Path: 0.35s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully
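The mangled path in the error line supports that guess. Assume the zipper reads each params-file entry as `zip_path=file_path` and splits it at the first `=` (this is an assumption about the parser, not something confirmed in this issue). Under that assumption, a `cls=kittens/...` path is cut in the middle, and everything after the first `=` is treated as the on-disk file. The sketch below rebuilds one such entry in Python. `file_path` is copied from the error message; the `runfiles/...` prefix of `zip_path` and the `naive_parse` helper are hypothetical.

```python
from typing import Tuple

# On-disk path copied from the error message above.
file_path = (
    "external/pypi/pypi__pyspark/pyspark/data/mllib/images/partitioned/"
    "cls=kittens/date=2018-01/not-image.txt"
)
# Hypothetical destination inside the zip; the real prefix may differ.
zip_path = (
    "runfiles/pypi__pyspark/pyspark/data/mllib/images/partitioned/"
    "cls=kittens/date=2018-01/not-image.txt"
)
# Assumed params-file entry format: "<path in zip>=<path on disk>".
entry = f"{zip_path}={file_path}"


def naive_parse(line: str) -> Tuple[str, str]:
    """Split an entry at the FIRST '=' (assumed zipper behaviour)."""
    in_zip, _, on_disk = line.partition("=")
    return in_zip, on_disk


parsed_zip, parsed_file = naive_parse(entry)
print(parsed_zip)   # ends in ".../partitioned/cls"
print(parsed_file)  # starts with "kittens/date=2018-01/not-image.txt=external/pypi/"
```

The value of `parsed_file` is exactly the path the zipper reports as missing. Splitting at the last `=` would not help either, because the on-disk path also contains `=`. An unambiguous entry format would need some form of escaping.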

Bugs: what’s the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I have a full public reproduction over in this repo: https://github.com/thundergolfer/bazel-build_python_zip-bug-reproduction (instructions in the README)

What operating system are you running Bazel on?

macOS Catalina 10.15.7

What’s the output of bazel info release?

release 3.7.2


What’s the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

git remote get-url origin ; git rev-parse main ; git rev-parse HEAD
git@github.com:thundergolfer/bazel-build_python_zip-bug-reproduction.git
ff5d23b14ade117b74494ecd3a0ed5666b8f224e
ff5d23b14ade117b74494ecd3a0ed5666b8f224e

Have you found anything relevant by searching the web?


👋 I can look further into this and submit a fix + test, when time permits.

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 15 (10 by maintainers)

Top GitHub Comments

benjaminRomano commented, Jan 12, 2022

To add to that note, the executable zip generated by --build_python_zip after I applied my patch has terrible cold-start performance, because Bazel’s implementation does not cache the extracted files after the first run.

I ended up abandoning this approach for distributing my python code.
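For reference, the caching described in this comment could look roughly like the following. The zip is extracted into a directory keyed by its SHA-256 digest, and extraction is skipped when that directory is already complete. `extract_once` and the cache layout are hypothetical. This is not Bazel's implementation, and it does not handle two processes extracting concurrently on the first run.

```python
import hashlib
import os
import zipfile


def extract_once(zip_path: str, cache_root: str) -> str:
    """Extract zip_path into cache_root/<sha256 of zip>/, reusing it on later calls.

    Hypothetical helper illustrating extraction caching; not Bazel's behaviour.
    """
    digest = hashlib.sha256()
    with open(zip_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    target = os.path.join(cache_root, digest.hexdigest())
    marker = os.path.join(target, ".extracted")
    if not os.path.exists(marker):
        os.makedirs(target, exist_ok=True)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(target)
        # Write the marker last so a partially extracted tree is never reused.
        open(marker, "w").close()
    return target
```

Because the key is the zip's content hash, rebuilding the zip produces a fresh directory, while repeated runs of the same zip skip extraction entirely.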

danieljanes commented, Nov 23, 2021

Is there any progress on this? This issue breaks our build since we introduced a new dependency that itself has a sub-dependency on pyarrow (as described by @thundergolfer and @benjaminRomano).
