Ensure 'pip wheel' can create .so artifacts deterministically


What’s the problem this feature will solve?

A major selling point of the Bazel build system is its support for both local and remote caching.

In order for that caching to work though, Bazel targets must be built deterministically so that the same target always has the same content-addressable hash.

Currently pip wheel is non-deterministic, so our Python Bazel targets miss the cache whenever they depend on something built with pip wheel.
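As a minimal illustration of this point, using only Python's standard hashlib module: a content-addressable cache keys each artifact on a digest of its bytes (SHA-256, as in the Bazel execution log excerpt in this issue), so two builds that differ only in an embedded temporary path get different keys. The byte strings below are made-up stand-ins for real .so files, and the helper name digest is hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the hex SHA-256 digest, the same kind of content hash the Bazel log reports."""
    return hashlib.sha256(data).hexdigest()


# Made-up stand-ins for two builds of the same extension module that differ
# only in the random temporary build directory embedded in them.
build_a = b"\x7fELF<machine code>\x00/tmp/pip-wheel-aaaaaaaa/pyyaml\x00"
build_b = b"\x7fELF<machine code>\x00/tmp/pip-wheel-bbbbbbbb/pyyaml\x00"

# Identical bytes always give identical digests, so a cache hit is possible ...
print(digest(build_a) == digest(bytes(build_a)))  # True
# ... but a single differing path changes the digest, so the cache misses.
print(digest(build_a) == digest(build_b))  # False
```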

Describe the solution you’d like

Note: The following is an excerpt from a Bazel execution log. It is not output of the pip wheel command itself, but it shows the relevant information.

inputs {
  path: "external/pypi__PyYAML_5_1/PyYAML-5.1.dist-info/LICENSE"
  digest {
    hash: "a2adb9c959b797494a0ef80bdf60e22db2749ee3e0c0908556e3eb548f967c56"
    size_bytes: 1101
    hash_function_name: "SHA-256"
  }
}
inputs {
  path: "external/pypi__PyYAML_5_1/PyYAML-5.1.dist-info/METADATA"
  digest {
    hash: "df7bc0c7cbd2ce350c5c61ceda3a74bbcb6f82446a7c01f7f8e1034a98df231f"
    size_bytes: 1704
    hash_function_name: "SHA-256"
  }
}
inputs {
  path: "external/pypi__PyYAML_5_1/PyYAML-5.1.dist-info/RECORD"
  digest {
    hash: "6fe803b74ab4fcab1f23e96060cf062d12779598af7e72692c492c2dd7cad0ed"
    size_bytes: 1701
    hash_function_name: "SHA-256"
  }
}
inputs {
  path: "external/pypi__PyYAML_5_1/PyYAML-5.1.dist-info/WHEEL"
  digest {
    hash: "cdf2c8f141bc498ae490a88870d655dd174abe3db8c1f57562224b168930c624"
    size_bytes: 104
    hash_function_name: "SHA-256"
  }
}
inputs {
  path: "external/pypi__PyYAML_5_1/PyYAML-5.1.dist-info/top_level.txt"
  digest {
    hash: "ae98f42153138ac02387fd6f1b709c7fdbf98e9090c00cfa703d48554e597614"
    size_bytes: 11
    hash_function_name: "SHA-256"
  }
}
inputs {
  path: "external/pypi__PyYAML_5_1/_yaml.cpython-36m-x86_64-linux-gnu.so"
  digest {
    hash: "a7f3774015f839ccee5e2281bbfdf22a42e0e1dafaac33ef4c91db83a07210d9"
    size_bytes: 1133288
    hash_function_name: "SHA-256"
  }
}
inputs {
  path: "external/pypi__PyYAML_5_1/yaml/__init__.py"
  digest {
    hash: "2af8b6dbcb1df5c63597f215421cad02f2317e291061b181b0f7bbf4f71ac0dd"
    size_bytes: 12012
    hash_function_name: "SHA-256"
  }
}

The log above shows a subset of the build outputs of the PyYAML package. Of these outputs, only the RECORD file and the _yaml.cpython-36m-x86_64-linux-gnu.so shared object file have hashes that change from build to build. I inspected the RECORD file and found that it contains the hash of the .so file, so it is non-deterministic because of the .so file, and I believe only because of that.
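To pinpoint which inputs change between two builds, one could diff two execution log excerpts in the text format shown in this issue. This is a minimal sketch that assumes exactly that layout (a path: line followed by a digest block whose first field is hash:); it is not a general protobuf text-format parser, and the function names digests and changed_inputs are hypothetical.

```python
from __future__ import annotations

import re

# One `inputs { path: "..." digest { hash: "..." ... } }` entry, matched only
# for the text layout shown in the log excerpt (an assumption, not a spec).
ENTRY = re.compile(r'path:\s*"([^"]+)"\s*digest\s*\{\s*hash:\s*"([0-9a-f]+)"')


def digests(log_text: str) -> dict[str, str]:
    """Map each input path to its hash in one execution log excerpt."""
    return dict(ENTRY.findall(log_text))


def changed_inputs(log_a: str, log_b: str) -> list[str]:
    """Return paths whose digests differ between the logs or appear in only one."""
    a, b = digests(log_a), digests(log_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

Given the logs of two otherwise identical builds, the observation above implies this would report only the RECORD and .so paths.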

So the problem is the .so file.
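The RECORD file inherits the .so file's non-determinism because, per the wheel format, each RECORD line carries the file's path, its SHA-256 digest as URL-safe base64 without padding (prefixed with sha256=), and its size in bytes. A minimal sketch of forming such a line follows; the helper name record_entry and the sample bytes are hypothetical.

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build a wheel RECORD line: path, sha256=<unpadded urlsafe base64>, size."""
    raw = hashlib.sha256(data).digest()
    b64 = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return f"{path},sha256={b64},{len(data)}"


# Made-up bytes for two builds of the same module that differ only in the temp path.
name = "_yaml.cpython-36m-x86_64-linux-gnu.so"
so_a = b"<machine code>\x00/tmp/pip-wheel-aaaaaaaa/pyyaml\x00"
so_b = b"<machine code>\x00/tmp/pip-wheel-bbbbbbbb/pyyaml\x00"

# Different .so bytes give different RECORD lines, hence a different RECORD hash.
print(record_entry(name, so_a) == record_entry(name, so_b))  # False
```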

I ran the strings program on the .so file and found this printable string: /tmp/pip-wheel-_bd8v3f2/pyyaml. That is coming from here:

https://github.com/pypa/pip/blob/6af9de97bbd2427f82942e476c590a2db22ea1ff/src/pip/_internal/wheel.py#L649

So while I found other differences between separate builds of _yaml.cpython-36m-x86_64-linux-gnu.so, this leaked temporary directory path is by itself sufficient to break determinism.
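The strings check described above can be approximated in Python to scan a built extension for leaked pip temporary directories. This is a rough sketch: it treats runs of at least 4 printable ASCII bytes as strings (the default minimum length of GNU strings) and ignores strings' other options; find_strings is a hypothetical helper name.

```python
from __future__ import annotations

import re

# Runs of 4 or more printable ASCII bytes, roughly what `strings` reports.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def find_strings(blob: bytes, needle: bytes = b"/tmp/pip-wheel-") -> list[str]:
    """Return the printable runs in `blob` that contain `needle`."""
    return [
        m.group().decode("ascii")
        for m in PRINTABLE_RUN.finditer(blob)
        if needle in m.group()
    ]
```

For example, reading the shared object with open("_yaml.cpython-36m-x86_64-linux-gnu.so", "rb") and passing its contents to find_strings should list any /tmp/pip-wheel-... paths it contains.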

Additional context

rules_python issue discussing this problem: https://github.com/bazelbuild/rules_python/issues/154
rules_python repo: https://github.com/bazelbuild/rules_python

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 12
  • Comments: 19 (9 by maintainers)

Top GitHub Comments

manthey commented, Aug 30, 2019 (5 reactions)

In a simple test, I was able to get consistent builds by exporting CFLAGS=-g0 before building the wheel. This prevents any debug information from being added to the generated libraries, which is where the TempDirectory path was being pulled in. I also have SOURCE_DATE_EPOCH set. I don't know how universal this is (and, of course, you lose debugging symbols).
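The workaround in the comment above can be scripted roughly as follows. This is a sketch under several assumptions: a GCC-compatible compiler, a build backend that honors CFLAGS from the environment, and -g0 taking effect because it is appended after any earlier -g. The function names reproducible_env and build_wheel are hypothetical. Note that pip wheel only compiles when no compatible prebuilt wheel is available; pip's --no-binary option can force a source build.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def reproducible_env(epoch: int, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the environment, disabling debug info and pinning SOURCE_DATE_EPOCH."""
    env = dict(os.environ if base is None else base)
    # Appended last, on the assumption that the compiler honors the final -g level.
    env["CFLAGS"] = (env.get("CFLAGS", "") + " -g0").strip()
    # Reproducible-builds convention for a fixed timestamp.
    env["SOURCE_DATE_EPOCH"] = str(epoch)
    return env


def build_wheel(requirement: str, out_dir: str, epoch: int) -> None:
    """Run `pip wheel` for one requirement into out_dir with the settings above."""
    subprocess.run(
        [sys.executable, "-m", "pip", "wheel", "--no-deps", "-w", out_dir, requirement],
        env=reproducible_env(epoch),
        check=True,
    )
```

Building the same requirement twice into different directories, e.g. build_wheel("PyYAML==5.1", "wheels-a", 315532800) and again into "wheels-b", and comparing the resulting files' SHA-256 digests is one way to check whether the workaround holds for a given package.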

uranusjr commented, Jul 17, 2020 (3 reactions)

It is slightly different, since building in the source tree does not necessarily mean the built artifacts are in the source tree. It is only by tradition that the most popular back-end (setuptools) does this. Having in-tree builds would happen to solve the immediate problem, but IMO the ultimate solution would be to introduce a flag to PEP 517 that tells the back-end where it must generate the artifact, and to create a flag in pip that lets the user provide that information.


Top Results From Across the Web

Wheel caching and non-deterministic builds - Packaging
So, we have a non-deterministic build process. However, pip always caches the built wheel. The user later attempts to load the missing library ......

cibuildwheel 2.0.0a2 - PyPI
It helps ensure that the library can run without any dependencies outside of the pip toolchain. This is similar to static linking, so...

What is the difference between a "pip wheel -e" and "pip install
We get an entire build directory with lots of artifacts. Editable project location is empty. # pip wheel -e . --no-cache-dir # ...

Advanced Usage of Pipenv - Python Packaging Authority
Dependencies of wheels provided in a Pipfile will not be captured by $ pipenv lock . There are some known issues with using...

Why pipenv > venv - ActiveState
You can simply pip install pipenv to get started, and then pipenv myvenv to ... required for a build AND uses hashes to...
