test_patchset_download failing in CI due to download of gzip file failing

Description

The CI has been failing recently on

https://github.com/scikit-hep/pyhf/blob/e8a182d68fc268ab8b0f5a553317f6bf1c962047/tests/test_scripts.py#L539-L549

with

=================================== FAILURES ===================================
_ test_patchset_download[inprocess-https://www.hepdata.net/record/resource/1408476?view=true] _

datadir = local('/tmp/pytest-of-runner/pytest-0/test_patchset_download_inproce0')
script_runner = <ScriptRunner inprocess>
archive = 'https://www.hepdata.net/record/resource/1408476?view=true'

    @pytest.mark.parametrize(
        "archive",
        [
            "https://www.hepdata.net/record/resource/1408476?view=true",
            "https://doi.org/10.17182/hepdata.89408.v1/r2",
        ],
    )
    def test_patchset_download(datadir, script_runner, archive):
        command = f'pyhf contrib download {archive} {datadir.join("likelihoods").strpath}'
        ret = script_runner.run(*shlex.split(command))
>       assert ret.success
E       assert False
E        +  where False = <pytest_console_scripts.RunResult object at 0x7fb224482250>.success

tests/test_scripts.py:549: AssertionError
----------------------------- Captured stdout call -----------------------------
# Running console script: pyhf contrib download https://www.hepdata.net/record/resource/1408476?view=true /tmp/pytest-of-runner/pytest-0/test_patchset_download_inproce0/likelihoods
# Script return code: 1
# Script stdout:

# Script stderr:
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/home/runner/work/pyhf/pyhf/src/pyhf/contrib/cli.py", line 67, in download
    utils.download(archive_url, output_directory, force, compress)
  File "/home/runner/work/pyhf/pyhf/src/pyhf/contrib/utils.py", line 62, in download
    mode="r|gz", fileobj=BytesIO(response.content)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/tarfile.py", line 1603, in open
    stream = _Stream(name, filemode, comptype, fileobj, bufsize)
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/tarfile.py", line 377, in __init__
    self._init_read_gz()
  File "/opt/hostedtoolcache/Python/3.7.10/x64/lib/python3.7/tarfile.py", line 482, in _init_read_gz
    raise ReadError("not a gzip file")
tarfile.ReadError: not a gzip file

but it isn’t clear why this is happening, as I’m unable to reproduce the Issue locally with

$ docker run --rm -ti python:3.7 /bin/bash
root@08a8cf8dd4ca:/# pip --quiet install --upgrade pip setuptools wheel
root@08a8cf8dd4ca:/# pip --quiet install pyhf[contrib]
root@08a8cf8dd4ca:/# pyhf contrib download https://www.hepdata.net/record/resource/1408476?view=true /tmp/likelihoods
root@08a8cf8dd4ca:/# ls -lR /tmp/likelihoods/
/tmp/likelihoods/:
total 60332
-rw-r--r-- 1 1000 1000  4436904 May  7  2020 BkgOnly.json
-rw-r--r-- 1 1000 1000     1378 May 30  2020 README.md
-rw-r--r-- 1 1000 1000 57332112 May 31  2020 patchset.json

or in tests

$ python -m pytest -sx tests/test_scripts.py -k test_patchset_download
...
============================================================================== 2 passed, 58 deselected, 1 warning in 22.04s

@lukasheinrich @kratsg have any ideas?
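
One way to narrow down a `tarfile.ReadError: not a gzip file` like the one above is to inspect what was actually downloaded before handing it to `tarfile`. The traceback shows the error is raised in `_init_read_gz`, which rejects input that does not begin with the gzip magic bytes. The following is a minimal sketch under that assumption: `open_gzipped_tar` and `make_tar_gz` are hypothetical helpers for illustration, not part of pyhf, and whether the server really returned a non-gzip body in CI is not confirmed anywhere in this Issue.

```python
import io
import tarfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def open_gzipped_tar(payload: bytes) -> tarfile.TarFile:
    """Open an in-memory .tar.gz, or fail with a descriptive error.

    Hypothetical helper for illustration; it is not part of pyhf.
    """
    if not payload.startswith(GZIP_MAGIC):
        # Report the start of the payload so CI logs show what was served,
        # for example an HTML error page instead of the archive.
        raise ValueError(
            f"expected gzip data, got {len(payload)} bytes "
            f"starting with {payload[:40]!r}"
        )
    return tarfile.open(mode="r|gz", fileobj=io.BytesIO(payload))


def make_tar_gz(files: dict) -> bytes:
    """Build a small .tar.gz in memory to exercise the helper."""
    buffer = io.BytesIO()
    with tarfile.open(mode="w:gz", fileobj=buffer) as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

With a check like this, a failing CI run would log the first bytes of the response instead of only `not a gzip file`, which may help distinguish a server-side hiccup from a problem in the test setup.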

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments:11 (8 by maintainers)

Top GitHub Comments

matthewfeickert commented, Jun 9, 2021

In the scenario I envisioned, where running multiple tests in parallel could cause test files to be overwritten on the fly, mocking the response from HEPData is not going to help. Using tempfile will.

Yup that’s clear. 👍 Though my understanding is that the tmpdir fixture should keep everything nicely separated so that there’s no opportunity for test files to be overwritten.
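
To make the isolation argument concrete, here is a small sketch using only the standard library. It assumes nothing about pyhf's test suite: `fake_download` and `run_in_parallel` are hypothetical stand-ins, and the file name is borrowed from the listing earlier in this Issue purely for illustration. Each call writes to the same file name, but inside its own `tempfile.TemporaryDirectory`, so concurrent callers never share a path.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def fake_download(content: bytes) -> bytes:
    """Stand-in for a download: write into a private directory, read it back."""
    # TemporaryDirectory creates a uniquely named directory per call and
    # removes it on exit, so concurrent callers never share a path.
    with tempfile.TemporaryDirectory() as workdir:
        target = Path(workdir) / "patchset.json"
        target.write_bytes(content)
        return target.read_bytes()


def run_in_parallel(payloads):
    """Run one fake download per payload on a small thread pool."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(fake_download, payloads))
```

pytest's `tmpdir` and `tmp_path` fixtures give a comparable guarantee, since each test function receives its own temporary directory.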

matthewfeickert commented, Jun 9, 2021

If my guess turns out to be correct, then it would be nondeterministic behaviour, and it could happen from time to time

Yeah, things are working again now for no clear reason. So I’m not sure. We should still do the mocking to not have to deal with this as much.

Yeah, I don’t know why datadir was used here. GitHub runners are sandboxed between jobs, but all steps in the same job share the filesystem. I’m not sure what would cause this to be an issue now and not in the past.

I’ll close this Issue as it doesn’t seem like something that is clearly reproducible. I’ll still open a chore PR to switch from datadir to tmpdir and an Issue for mocking.
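
For reference, mocking the download in a test could look roughly like the sketch below. It is not pyhf's implementation: `download`, `make_tar_gz`, and `check_download_offline` are hypothetical, the URL is a placeholder, and `urllib.request` is used only to keep the example free of third-party dependencies (the traceback's `response.content` suggests the real code uses a different HTTP client). The idea is to replace the network call with canned bytes so the test no longer depends on the remote server being reachable.

```python
import io
import tarfile
import tempfile
import urllib.request
from pathlib import Path
from unittest import mock


def download(url: str, output_dir: str) -> None:
    """Hypothetical downloader: fetch a .tar.gz and unpack its regular files."""
    with urllib.request.urlopen(url) as response:
        payload = response.read()
    with tarfile.open(mode="r|gz", fileobj=io.BytesIO(payload)) as archive:
        for member in archive:
            if member.isfile():
                # Keep only the base name so entries cannot escape output_dir.
                target = Path(output_dir) / Path(member.name).name
                target.write_bytes(archive.extractfile(member).read())


def make_tar_gz(name: str, data: bytes) -> bytes:
    """Build a one-file .tar.gz in memory to serve as the canned response."""
    buffer = io.BytesIO()
    with tarfile.open(mode="w:gz", fileobj=buffer) as archive:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def check_download_offline() -> list:
    """Run download() with the network call replaced by canned bytes."""
    canned = make_tar_gz("README.md", b"mocked archive")
    fake_response = mock.MagicMock()
    # urlopen() is used as a context manager, so configure __enter__.
    fake_response.__enter__.return_value.read.return_value = canned
    with mock.patch("urllib.request.urlopen", return_value=fake_response):
        with tempfile.TemporaryDirectory() as workdir:
            download("https://example.invalid/archive", workdir)
            return sorted(path.name for path in Path(workdir).iterdir())
```

A test built this way exercises the unpacking logic only; a separate, less frequent check against the live service would still be needed to notice changes on the server side.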
