ERROR: Encountered corrupt package tarball
Hi,
My step works flawlessly in local mode, but when I tried it with batch mode it failed with the message:
/bin/sh: 1: metaflow_CleanFlow_linux-64_54816c55859cfd0f8c3c9b2e51678ce87bc33a38/bin/python: not found
To understand what might be the reason, I ran a Docker image (python3:6) locally and executed all the commands that run on the Batch side.
I noticed that while conda was creating the environment, some of the packages inside the pkgs folder were failing to install. I dug deeper and found that a few tarballs had not been fully copied (~80%) to the S3 bucket in the first place and were therefore incomplete.
I manually downloaded those tarballs and they all started to work fine.
What might be the reason for these incomplete tarball uploads to s3 bucket?
Computer: Mac OSX: 10.15.1
Conda: Anaconda 4.7.12
Metaflow: 2.0.1
List of the tarballs that have failed: chardet-3.0.4-py36_1003.tar.bz2, six-1.14.0-py36_0.tar.bz2, setuptools-45.1.0-py36_0.tar.bz2, pip-20.0.2-py36_1.tar.bz2
Example error message: ERROR: Encountered corrupt package tarball at /root/.aws/metaflow/conda/pkgs/setuptools-45.1.0-py36_0.tar.bz2. Conda has left it in place. Please report this to the maintainers of your package. For the defaults channel, please report to https://github.com/continuumio/anaconda-issues
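For anyone who wants to find out which tarballs in a pkgs folder are truncated before conda trips over them, here is a minimal sketch using only the Python standard library. It is not part of Metaflow or conda; the function name find_corrupt_tarballs is made up for illustration, and the check (decompress the whole bz2 stream, then walk the tar members) only approximates whatever conda itself verifies.

```python
import bz2
import tarfile
from pathlib import Path


def find_corrupt_tarballs(pkgs_dir):
    """Return the names of *.tar.bz2 files in pkgs_dir that cannot be fully read.

    Hypothetical helper for illustration; not a Metaflow or conda API.
    """
    corrupt = []
    for path in sorted(Path(pkgs_dir).glob("*.tar.bz2")):
        try:
            # Step 1: decompress the entire bz2 stream. A file that was cut
            # short raises EOFError; invalid data raises OSError.
            with bz2.open(path, "rb") as stream:
                while stream.read(1 << 20):
                    pass
            # Step 2: make sure the decompressed data parses as a tar archive.
            with tarfile.open(path, "r:bz2") as archive:
                archive.getmembers()
        except (tarfile.TarError, EOFError, OSError):
            corrupt.append(path.name)
    return corrupt
```

Calling find_corrupt_tarballs("/root/.aws/metaflow/conda/pkgs") inside the container (the path is taken from the error message above) would return the file names that fail either step.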
@abaspinar @jasobrown @benjaminbluhm I just opened a PR #118 which should fix this issue.
Let me elaborate on the issue per se - Conda has two packaging formats - .conda and .tar.bz2. Currently, we can only install .tar.bz2 packages offline - so we coax conda into downloading only .tar.bz2 dependencies for linux-64 architectures (even when you execute on macOS). We ship these dependencies onto s3 and download and install them on AWS Batch and execute user code within the environment thus created.
To reliably generate the installation manifest as well as the specific packages we need to ship, we inspect a bunch of metadata files that conda generates while creating the environment on your laptop. For some very specific packages, in certain specific scenarios, conda fails to update the metadata information correctly, leading us to the error you have been observing. PR #118 should fix this issue.
You will have to delete the .metaflow/ directory and delete the s3://bucket/metaflow/conda s3 directory to evict the corrupt packages. If you have a sizeable deployment of metaflow and/or are unwilling to delete the s3 folder, I can point out steps to surgically isolate the affected dependencies - please let us know.
@savingoyal is the below error related to this issue? This popped up today for me and another person working on two different Flows. Is it because something got updated and somehow messed with the packaging? My fix was to go into ./metaflow/FlowName/conda.dependencies and wipe it. Then on rerunning, packages were reinstalled and everything worked.
2020-03-13 12:43:27.430 [155/start/634 (pid 97)] [6431d58b-53f6-4b9b-b881-12960b0dd1a4] File "/metaflow/metaflow/datatools/s3.py", line 640, in _s3op_with_retries
2020-03-13 12:43:27.431 [155/start/634 (pid 97)] [6431d58b-53f6-4b9b-b881-12960b0dd1a4] raise MetaflowS3NotFound(err_out)
2020-03-13 12:43:27.438 [155/start/634 (pid 97)] [6431d58b-53f6-4b9b-b881-12960b0dd1a4] metaflow.datatools.s3.MetaflowS3NotFound: s3op failed:
2020-03-13 12:43:27.438 [155/start/634 (pid 97)] [6431d58b-53f6-4b9b-b881-12960b0dd1a4] URL not found: s3://metaflow-metaflows3bucket-17wkzp8g2oi27/metaflow/conda/conda.anaconda.org/conda-forge/linux-64/_openmp_mutex-4.5-1_llvm.tar.bz2/fa8d764c883a53d22ce622bea830c818/_openmp_mutex-4.5-1_llvm.tar.bz2
2020-03-13 12:43:27.439 [155/start/634 (pid 97)] Batch error:
2020-03-13 12:43:27.558 [155/start/634 (pid 97)] Task crashed. This could be a transient error. Use @retry to retry.
2020-03-13 12:43:27.559 [155/start/634 (pid 97)]
2020-03-13 12:43:28.814 [155/start/634 (pid 97)] Task failed.