PIP cache should cache the installed packages as well

Description: Currently, setup-python caches only the ~/.cache/pip directory to avoid redownloads. However, it doesn’t cache the installed packages. As some packages have lengthy installation steps, this slows down builds.

You can see the current behaviour, for example, in https://github.com/crabhi/setup-python-cache-test/actions/runs/1789016634 (or in the attached build.txt): the pip install output shows “Collecting” and “Installing” instead of “Requirement already satisfied” for all packages.

Justification: For example, installing the ansible package takes well over a minute even if it’s already downloaded.

Are you willing to submit a PR? Yes, I can try.

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 17
  • Comments: 23 (9 by maintainers)

Top GitHub Comments

9 reactions
Axeln78 commented, Jul 13, 2022

Sorry, @dhvcc, if I didn’t make myself clear. actions/setup-python@v4 uses actions/cache@v3 under the hood, so users do not need to call actions/cache@v3 themselves in an example such as:

    - uses: actions/checkout@v3
    - name: Set up Python 3.10 and caches
      id: setup_and_cache
      uses: actions/setup-python@v4
      with:
        python-version: '3.10'
        cache: 'pip'

It would be great if the installed packages could be cached as well through actions/setup-python@v4, which is the purpose of this issue (#330).

8 reactions
rashidnhm commented, May 14, 2022

Ok, nice. The code seemed ok, so that was strange. I’d only advise you to maybe not run pip install if the cache was hit. You don’t want to modify the cache in any way on a hit, to avoid corruption.

So I have done quite a deep dive into the venv corruption issue, and I believe I know what happened, and how to avoid it as well.

The Python version changed between when my cache was created and when it was restored, and I had a generic restore key that matched the old cache key. See the detailed explanation below.

This is how my YAML file looked when I hit this error:

# BAD CONFIG DO NOT USE (Illustrative purposes only)

- uses: actions/checkout@v3

- id: setup_python
  uses: actions/setup-python@v3
  with:
    python-version: 3.7

- id: python_cache
  uses: actions/cache@v3
  with:
    path: venv
    key: pip-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}
    # The generic pip- restore key below was the cause of the issue
    restore-keys: |
      pip-${{ steps.setup_python.outputs.python-version }}-
      pip-

- if: steps.python_cache.outputs.cache-hit != 'true'
  run: |
    python3 -m venv venv

- run: |
    venv/bin/python3 -m pip install -r requirements.txt

When this workflow initially ran and saved the venv to the cache, the latest release of Python 3.7 was 3.7.12, meaning the venv it created had symlinks to 3.7.12.

However, a few days later when the workflow ran again, the latest release of Python 3.7 was 3.7.13.

Notice that my workflow doesn’t pin the Python patch version, so actions/setup-python downloaded the latest available patch release of Python 3.7 (as expected).

However, my restore key pip- matched the old cache, which restored the old venv created for Python 3.7.12, meaning all the symlinks inside were now broken! I had set up Python 3.7.13 but was trying to use a venv with symlinks to 3.7.12. That is why, when I tried to call the python executable from the venv, it could not find the file.
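
One way to catch this kind of mismatch before using a restored venv is to compare the version recorded in the venv's pyvenv.cfg file (the standard venv module writes a "version = X.Y.Z" line there) with the version that was just set up. The following is a minimal sketch, not part of the original workflow; the function names venv_recorded_version and venv_matches_version are hypothetical.

```shell
# Print the interpreter version recorded in <venv>/pyvenv.cfg.
# Returns non-zero if the file does not exist.
venv_recorded_version() {
  local cfg="$1/pyvenv.cfg"
  [ -f "$cfg" ] || return 1
  # pyvenv.cfg holds "key = value" lines; print the value of the "version" key.
  sed -n 's/^version[[:space:]]*=[[:space:]]*//p' "$cfg" | head -n 1
}

# Succeed only if the venv was created for exactly the expected version,
# including the patch level (for example 3.7.12 vs 3.7.13).
venv_matches_version() {
  local venv_dir="$1" expected="$2" recorded
  recorded="$(venv_recorded_version "$venv_dir")" || return 1
  [ "$recorded" = "$expected" ]
}
```

In a run step this could be used as venv_matches_version venv "$EXPECTED_VERSION" || rm -rf venv, where EXPECTED_VERSION is an assumed environment variable holding the full version reported by setup-python.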

The resolution is to ensure that the output of setup-python is always part of the cache key, and that no restore key omits it. That way, any change in the Python version (even a patch version bump) produces a new cache key instead of matching an old cache.
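
To make the idea concrete outside of the workflow expression syntax, here is a rough local analogue of such a key. It is only a sketch: venv_cache_key is a hypothetical helper, sha256sum is assumed to be available (as on typical Linux runners), and the digest is not claimed to equal what hashFiles produces.

```shell
# Build a key of the form pip-<full python version>-<sha256 of requirements>.
# $1: path to the Python interpreter, $2: path to the requirements file.
venv_cache_key() {
  local python_bin="$1" requirements="$2" version digest
  # platform.python_version() returns the full version, e.g. "3.7.13".
  version="$("$python_bin" -c 'import platform; print(platform.python_version())')" || return 1
  # sha256sum prints "<digest>  <file>"; keep only the digest.
  digest="$(sha256sum "$requirements")" || return 1
  digest="${digest%% *}"
  printf 'pip-%s-%s\n' "$version" "$digest"
}
```

Because the patch version is part of the key, a move from 3.7.12 to 3.7.13 yields a different key even when requirements.txt is unchanged.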

This is the code I have now, and it has been working well without any issues. I have updated the workflow with the advice @dhvcc gave in the above comment: the venv is not touched if there is a cache hit.

- uses: actions/checkout@v3

- id: setup_python
  uses: actions/setup-python@v3
  with:
    python-version: 3.7

- id: python_cache
  uses: actions/cache@v3
  with:
    path: venv
    key: pip-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('requirements.txt') }}

- if: steps.python_cache.outputs.cache-hit != 'true'
  run: |
    # Check if venv exists (restored from secondary keys if any, and delete)
    # You might not need this line if you only have one primary key for the venv caching
    # I kept it in my code as a fail-safe
    if [ -d "venv" ]; then rm -rf venv; fi
    
    # Re-create the venv
    python3 -m venv venv

    # Install dependencies
    venv/bin/python3 -m pip install -r requirements.txt
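
The fail-safe in the step above only runs on a cache miss. A complementary check, sketched here under the assumption of a POSIX-style venv layout (bin/python3 inside the venv directory), removes a venv whose interpreter link no longer resolves so that it can be re-created. The function names venv_is_usable and reset_venv_if_broken are hypothetical.

```shell
# Succeed if the venv has a config file and an interpreter that resolves.
venv_is_usable() {
  local venv_dir="$1"
  # -x follows symlinks, so a dangling bin/python3 link fails this test.
  [ -f "$venv_dir/pyvenv.cfg" ] && [ -x "$venv_dir/bin/python3" ]
}

# Delete the venv directory if it exists but is not usable.
reset_venv_if_broken() {
  local venv_dir="$1"
  if [ -d "$venv_dir" ] && ! venv_is_usable "$venv_dir"; then
    rm -rf -- "$venv_dir"
  fi
}
```

Calling reset_venv_if_broken venv before the install step leaves a healthy venv untouched and clears a broken one.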
