continue to ignore `info` in `load_files()` or just use `--ignore-missing-files`?
The following error broke our nightlies a few nights ago:
$ conda pack --format tar.gz -j -1
Collecting packages...
CondaPackError:
Files managed by conda were found to have been deleted/overwritten in the
following packages:
- python='3.8.5'
This is usually due to `pip` uninstalling or clobbering conda managed files,
resulting in an inconsistent environment. Please check your environment for
conda/pip conflicts using `conda list`, and fix the environment by ensuring
only one version of each package is installed (conda preferred).
Debugging `core.py`, I found that the following entry in `paths.json` is the root cause:
$ pwd
/export/home/ops/conda/pkgs/python-3.8.5-h7579374_1/info
$ diff paths.json.orig paths.json
900,905d899
< "_path": "info/info_json.d/security.json",
< "path_type": "hardlink",
< "sha256": "1eba042cbb28d2403ca77fbdd8fb7ca518d65b0522a13730255ffdef694a826a",
< "size_in_bytes": 1773
< },
< {
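To make the failure mode concrete, here is a heavily simplified, purely illustrative sketch of the kind of check that trips over such an entry. The function names, the `ignore` tuple, and the walking logic are assumptions, not conda-pack's actual code; only the general idea (compare the `paths.json` manifest against files actually present in the environment) comes from the error and diff above.

```python
# Purely illustrative sketch -- NOT conda-pack's actual code. All names
# (managed_files, walk_env_files) and the exact walking logic are assumptions.
import json
import os

def managed_files(pkg_info_dir):
    """Prefix-relative paths conda considers part of the package (info/paths.json)."""
    with open(os.path.join(pkg_info_dir, "paths.json")) as f:
        return {entry["_path"] for entry in json.load(f)["paths"]}

def walk_env_files(prefix, ignore=("info",)):
    """Paths actually present in the environment, skipping ignored top-level dirs."""
    found = set()
    for root, dirs, files in os.walk(prefix):
        if root == prefix:
            # If "info" is in the ignore list, the walker never descends
            # into <prefix>/info at all.
            dirs[:] = [d for d in dirs if d not in ignore]
        for name in files:
            found.add(os.path.relpath(os.path.join(root, name), prefix))
    return found

# A malformed package lists "info/info_json.d/security.json" in paths.json and
# installs it under the environment's info/ directory, but the walker skips
# info/, so the entry shows up in `managed - found` and is reported as
# deleted/overwritten -- exactly the CondaPackError shown above.
```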
There are two ways to solve the error:
- Use `--ignore-missing-files` (full command shown below)
- Remove `info` from `load_files`: https://github.com/conda/conda-pack/blob/master/conda_pack/core.py#L585
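For reference, the first option is just the flag added to the same nightly command that failed above:

$ conda pack --format tar.gz -j -1 --ignore-missing-files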
What do you all recommend?
Top GitHub Comments
I’ve decided your idea #2 is the right one. Conda environments composed entirely of well-formed conda packages should have no content in the `info/` directory. Hence ignoring `info`, as the current version of conda-pack does, is a no-op. But this cuts both ways: ignoring `info/` does nothing, and not ignoring `info/` does nothing, either.

However, a small number of malformed conda packages have cropped up that do install a file in this subdirectory, leading to this missing files exception. So now ignoring the `info/` directory is actually causing problems.

Looking at the commit history, it looks like `info` has been included in the ignore list for at least 3 years, and I don’t see a compelling reason to ignore it. That said, I’m still looking. If I find a good reason, I’ll revert this change in favor of another approach I’m considering: specifically ignoring `info/` files found in the `paths.json` manifest of conda packages.

OK, I might be leaning towards #2 now that I’ve had to wrestle with this in tests 😃
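For completeness, a hypothetical sketch of the alternative approach mentioned in the comment above: drop `info/` entries while reading a package's `paths.json` manifest, so they are never treated as managed files that must exist in the environment. The function name and layout here are illustrative, not conda-pack's real API.

```python
import json
import os

def load_managed_paths(pkg_info_dir):
    # Illustrative only: read the package's paths.json manifest and skip any
    # entry that points back into the package's own info/ tree, so such files
    # are never checked for presence in the environment.
    with open(os.path.join(pkg_info_dir, "paths.json")) as f:
        entries = json.load(f)["paths"]
    return [e for e in entries if not e["_path"].startswith("info/")]
```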