Parallel pex creation fails due to possible race conditions in package installation

We’re using pex with Makefiles and parallel jobs in CI and are seeing spurious errors like the one below:

pex --find-links out/wheels -o out/pexes/pex1.pex ./pex1 -e pex1:main
pex --find-links out/wheels -o out/pexes/pex2.pex -r pex2/requirements.txt --python-script pex2/entrypoint.py
Could not find the installation RECORD for zipp 3.10.0 under /root/.pex/installed_wheels/4fcb6f278987a6605757302a6e40e896257570d11c51628968ccb2a47e80c6c1/zipp-3.10.0-py3-none-any.whl.workdir

The other pex built fine, and a retry succeeds. The problem is probably made worse by having all wheels cached: no time is spent downloading them, which makes races during installation more likely.

This would probably need some locking around the installed_wheels folder, or more fine-grained locking or atomic operations for the real installation folder.
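To make the suggestion concrete, here is a minimal sketch of "lock, then atomically rename" for a shared cache directory. It is not Pex's implementation: `exclusive_lock` and `install_once` are hypothetical names, the `.lck` lock-file suffix is an assumption, and `fcntl.flock` limits it to POSIX systems.

```python
import fcntl
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock (POSIX flock) on lock_path."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other holders release
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def install_once(target_dir, populate):
    """Populate target_dir at most once, even with concurrent callers.

    Returns True if this call performed the install, False if the
    directory was already in place. `populate` is a hypothetical callback
    that writes the installation into the directory it is given.
    """
    target = os.path.abspath(target_dir)
    parent = os.path.dirname(target)
    os.makedirs(parent, exist_ok=True)
    with exclusive_lock(target + ".lck"):
        if os.path.isdir(target):
            return False  # another caller finished the install first
        # Each caller gets its own uniquely named work dir, so nobody
        # ever observes (or deletes) a half-written directory.
        workdir = tempfile.mkdtemp(
            prefix=os.path.basename(target) + ".", suffix=".workdir", dir=parent
        )
        try:
            populate(workdir)
            os.rename(workdir, target)  # atomic on the same filesystem
        except BaseException:
            shutil.rmtree(workdir, ignore_errors=True)
            raise
        return True
```

The lock serializes installers, and the rename means readers only ever see a missing directory or a complete one. Readers that skip the lock still get a consistent view for that reason.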

It’s not urgent for us, as I can avoid building those in parallel or ensure separate pexdirs, but it was a somewhat nasty issue to debug, and reporting it might save someone else the time and frustration in the future.
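The "separate pexdirs" workaround could look roughly like the sketch below, which gives each parallel job its own Pex cache via the `PEX_ROOT` environment variable. The helper names `isolated_pex_env` and `build_pex` and the `out/pex-roots` location are assumptions for illustration, not part of the original report.

```python
import os
import subprocess


def isolated_pex_env(job_name, base_dir, environ=None):
    """Return a copy of the environment with a per-job PEX_ROOT.

    With a private cache root, parallel pex invocations no longer
    share an installed_wheels directory.
    """
    env = dict(os.environ if environ is None else environ)
    env["PEX_ROOT"] = os.path.abspath(os.path.join(base_dir, job_name))
    return env


def build_pex(job_name, pex_args, base_dir="out/pex-roots"):
    """Run `pex` with the given arguments under an isolated cache root."""
    env = isolated_pex_env(job_name, base_dir)
    return subprocess.run(["pex", *pex_args], env=env, check=True)
```

For example, `build_pex("pex1", ["--find-links", "out/wheels", "-o", "out/pexes/pex1.pex", "./pex1", "-e", "pex1:main"])` mirrors the first command above. The trade-off is that jobs no longer share installed wheels, so each one repeats the installation work.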

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 11 (7 by maintainers)

Top GitHub Comments

4 reactions
jsirois commented, Nov 2, 2022

I repro - which is great. Thank you for setting that gist up. I should be able to get to the bottom of this pretty quickly.

1 reaction
jsirois commented, Nov 2, 2022

Actually - one question presents itself. Are your errors confined to missing files in paths that are prefixed by /root/.pex/installed_wheels/? If so, that might be a special case I can look at. I think that is the oldest use of atomic_directory and it may not use the locking at all, just an atomic rename. Your examples indicate this happens when Pex is calling pip install (Pex dogfoods itself and runs Pip as a venv PEX in there).
