--build_python_zip fails when runfiles include file containing '=' character (affects pyspark==2.4.6)
Description of the problem / feature request:
A Bazel Python ‘zipapp’ cannot be built using --build_python_zip when the underlying py_binary target depends on pyspark==2.4.6. I think this is because pyspark contains files with the “=” character in their filenames, which appears to break the PythonZipper action (the zipper invocation in the error below).
Example Error:
INFO: Analyzed 2 targets (22 packages loaded, 708 targets configured).
INFO: Found 2 targets...
ERROR: /Users/jonathon/work/reproduce_zipapp_bug/spark_hello_world/BUILD:4:10: PythonZipper spark_hello_world/main.zip failed (Exit 255): zipper failed: error executing command external/bazel_tools/tools/zip/zipper/zipper cC bazel-out/darwin-fastbuild/bin/spark_hello_world/main.zip @bazel-out/darwin-fastbuild/bin/spark_hello_world/main.zip-0.params
Use --sandbox_debug to see verbose messages from the sandbox zipper failed: error executing command external/bazel_tools/tools/zip/zipper/zipper cC bazel-out/darwin-fastbuild/bin/spark_hello_world/main.zip @bazel-out/darwin-fastbuild/bin/spark_hello_world/main.zip-0.params
Use --sandbox_debug to see verbose messages from the sandbox
File kittens/date=2018-01/not-image.txt=external/pypi/pypi__pyspark/pyspark/data/mllib/images/partitioned/cls=kittens/date=2018-01/not-image.txt does not seem to exist.
INFO: Elapsed time: 2.814s, Critical Path: 0.35s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully
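The path in the error message suggests how the failure happens. The zipper params file appears to pair each archive path with its source file as `archive_path=source_file`, and each entry seems to be split at the first `=`. The runfiles path for `cls=kittens/...` contains its own `=`, so the split lands inside it. Everything after that `=` is then treated as the source file, which does not exist.

The sketch below reproduces that mis-parse. It is not Bazel's zipper code. The entry format is inferred from the error text, the archive path in the usage example is hypothetical, and the backslash-escaping scheme (`escape_zip_path` / `escaped_split`) is one possible way to make such entries unambiguous, not Bazel's actual fix.

```python
"""Hypothetical sketch: why an '=' in a runfiles path breaks 'zip_path=file_path' parsing."""
from __future__ import annotations


def naive_split(entry: str) -> tuple[str, str]:
    """Split at the first '=' (the behaviour the error message points to; assumed)."""
    zip_path, _, file_path = entry.partition("=")
    return zip_path, file_path


def escape_zip_path(path: str) -> str:
    """Hypothetical escaping: prefix every backslash and '=' with a backslash."""
    return path.replace("\\", "\\\\").replace("=", "\\=")


def escaped_split(entry: str) -> tuple[str, str]:
    """Split at the first unescaped '=', un-escaping the archive path as we go.

    The source path after the separator may contain '=' freely.
    """
    zip_path = []
    i = 0
    while i < len(entry):
        ch = entry[i]
        if ch == "\\" and i + 1 < len(entry):
            zip_path.append(entry[i + 1])  # keep the escaped character literally
            i += 2
            continue
        if ch == "=":
            return "".join(zip_path), entry[i + 1:]
        zip_path.append(ch)
        i += 1
    raise ValueError(f"no unescaped '=' in entry: {entry!r}")


if __name__ == "__main__":
    # Hypothetical archive path; the source path is taken from the error message.
    zip_path = "runfiles/pyspark/data/mllib/images/partitioned/cls=kittens/date=2018-01/not-image.txt"
    file_path = ("external/pypi/pypi__pyspark/pyspark/data/mllib/images/"
                 "partitioned/cls=kittens/date=2018-01/not-image.txt")
    entry = zip_path + "=" + file_path
    print("naive source:  ", naive_split(entry)[1])  # starts with 'kittens/date=2018-01/...'
    print("escaped source:", escaped_split(escape_zip_path(zip_path) + "=" + file_path)[1])
```

The naive split yields exactly the bogus source path shown in the error, `kittens/date=2018-01/not-image.txt=external/...`. Escaping only the archive side is enough here, because the parser stops at the first unescaped `=` and takes the rest of the entry verbatim.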
Bugs: what’s the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
I have a full public reproduction over in this repo: https://github.com/thundergolfer/bazel-build_python_zip-bug-reproduction (instructions in the README)
What operating system are you running Bazel on?
macOS Catalina 10.15.7
What’s the output of bazel info release?
release 3.7.2
If bazel info release returns “development version” or “(@non-git)”, tell us how you built Bazel.
What’s the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD?
git remote get-url origin ; git rev-parse main ; git rev-parse HEAD
git@github.com:thundergolfer/bazel-build_python_zip-bug-reproduction.git
ff5d23b14ade117b74494ecd3a0ed5666b8f224e
ff5d23b14ade117b74494ecd3a0ed5666b8f224e
Have you found anything relevant by searching the web?
- GitHub issues: https://github.com/bazelbuild/rules_docker/issues/1254 seems relevant.
👋 I can look further into this and submit a fix + test, when time permits.
Issue Analytics
- Created 3 years ago
- Comments: 15 (10 by maintainers)
To add to that note: after I applied my patch, the executable zip generated by --build_python_zip had terrible cold-start performance, because Bazel’s implementation does not cache the extracted files after the first run. I ended up abandoning this approach for distributing my Python code.
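The cold-start cost comes from unpacking the zip again on every run. Caching the extraction would mean paying that cost once per distinct zip and reusing the directory afterwards.

The sketch below shows one way to do that. It is not what Bazel's Python stub does. The function name `cached_extract`, the SHA-256 cache key, and the default cache location under the system temp directory are all assumptions made for illustration.

```python
"""Hypothetical sketch: extract a Python zip once, then reuse the extracted directory."""
from __future__ import annotations

import hashlib
import os
import shutil
import tempfile
import zipfile
from pathlib import Path


def cached_extract(zip_file: str | os.PathLike,
                   cache_root: str | os.PathLike | None = None) -> Path:
    """Return a directory holding the zip's contents, extracting only on first use.

    The cache key is the SHA-256 of the zip bytes, so a rebuilt zip gets a fresh directory.
    """
    digest = hashlib.sha256(Path(zip_file).read_bytes()).hexdigest()
    root = Path(cache_root) if cache_root else Path(tempfile.gettempdir()) / "pyzip-cache"
    root.mkdir(parents=True, exist_ok=True)
    target = root / digest
    if target.is_dir():
        return target  # warm start: reuse the earlier extraction
    # Cold start: extract into a private staging dir, then rename it into place so
    # other processes never see a half-extracted tree.
    staging = Path(tempfile.mkdtemp(dir=root))
    with zipfile.ZipFile(zip_file) as zf:
        zf.extractall(staging)
    try:
        os.rename(staging, target)
    except OSError:
        # Another process created the target first; keep its copy and drop ours.
        shutil.rmtree(staging, ignore_errors=True)
    return target
```

Hashing the whole zip on every start still reads the file, but that is usually far cheaper than writing out thousands of files. Keying on file size and modification time would be cheaper still, but less robust.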
Is there any progress on this? This issue breaks our build since we introduced a new dependency that itself has a sub-dependency on pyarrow (as described by @thundergolfer and @benjaminRomano).