Continue to ignore `info` in `load_files()` or just use `--ignore-missing-files`?


The following error broke our nightlies a few nights ago:

$ conda pack --format tar.gz -j -1
Collecting packages...
CondaPackError:
Files managed by conda were found to have been deleted/overwritten in the
following packages:

- python='3.8.5'

This is usually due to `pip` uninstalling or clobbering conda managed files,
resulting in an inconsistent environment. Please check your environment for
conda/pip conflicts using `conda list`, and fix the environment by ensuring
only one version of each package is installed (conda preferred).

While debugging core.py, I found that the following entry in paths.json is the root cause:

$ pwd
/export/home/ops/conda/pkgs/python-3.8.5-h7579374_1/info
$ diff paths.json.orig paths.json
900,905d899
<       "_path": "info/info_json.d/security.json",
<       "path_type": "hardlink",
<       "sha256": "1eba042cbb28d2403ca77fbdd8fb7ca518d65b0522a13730255ffdef694a826a",
<       "size_in_bytes": 1773
<     },
<     {
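
For anyone who wants to check a package cache for the same problem, here is a minimal sketch that lists the manifest entries under info/. It assumes paths.json keeps its entries under a top-level "paths" key, each with a "_path" field as in the diff above. The helper name info_entries and the sample manifest are made up for illustration and are not part of conda-pack.

```python
import json


def info_entries(paths_json_text):
    """Return manifest paths that live under the top-level info/ directory."""
    manifest = json.loads(paths_json_text)
    # Assumption: entries are listed under a top-level "paths" key.
    return [
        entry["_path"]
        for entry in manifest.get("paths", [])
        if entry["_path"].split("/", 1)[0] == "info"
    ]


# Hypothetical two-entry manifest; a real one would be read from
# <pkgs dir>/<package>/info/paths.json.
sample = json.dumps({
    "paths": [
        {"_path": "bin/python3.8", "path_type": "hardlink"},
        {"_path": "info/info_json.d/security.json", "path_type": "hardlink"},
    ],
    "paths_version": 1,
})

print(info_entries(sample))
```

A well-formed package should produce an empty list here. Any entry that does show up is a candidate for the missing-files error.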

There are two ways to solve the error:

  1. Use `--ignore-missing-files`
  2. Stop ignoring `info` in `load_files()`: https://github.com/conda/conda-pack/blob/master/conda_pack/core.py#L585

What do you all recommend?

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
mcg1969 commented, Sep 20, 2020

I’ve decided your idea #2 is the right one. Conda environments composed entirely of well-formed conda packages should have no content in the info/ directory. Hence ignoring info as the current version of conda-pack does is a no-op. But this cuts both ways: ignoring info/ does nothing, and not ignoring info/ does nothing, either.

However, a small number of malformed conda packages have cropped up that do install a file in this subdirectory, leading to this missing files exception. So now ignoring the info/ directory is actually causing problems.

Looking at the commit history, it looks like info has been included in the ignore list for at least 3 years, and I don’t see a compelling reason to ignore it. That said, I’m still looking. If I find a good reason, I’ll revert this change in favor of another approach I’m considering: specifically ignoring info/ files found in the paths.json manifest of conda packages.
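
The alternative approach mentioned above (skipping info/ entries from the manifest when checking for missing files) could look roughly like the sketch below. This is not conda-pack's actual code: missing_files, its skip_info parameter, and the temporary directory layout are hypothetical and only illustrate the idea.

```python
import os
import tempfile


def missing_files(prefix, manifest_paths, skip_info=True):
    """Return manifest paths that do not exist under prefix.

    Hypothetical helper, not conda-pack's implementation.
    """
    missing = []
    for rel in manifest_paths:
        # Optionally skip anything under the top-level info/ directory.
        if skip_info and rel.split("/", 1)[0] == "info":
            continue
        if not os.path.lexists(os.path.join(prefix, *rel.split("/"))):
            missing.append(rel)
    return missing


manifest_paths = ["bin/python3.8", "info/info_json.d/security.json"]

with tempfile.TemporaryDirectory() as prefix:
    # Simulate an environment where only the non-info file was installed.
    os.makedirs(os.path.join(prefix, "bin"))
    open(os.path.join(prefix, "bin", "python3.8"), "w").close()

    strict = missing_files(prefix, manifest_paths, skip_info=False)
    lenient = missing_files(prefix, manifest_paths, skip_info=True)

print(strict)   # the info/ entry is reported as missing
print(lenient)  # nothing is reported once info/ entries are skipped
```

With skip_info=False the info/ entry is flagged, which mirrors the error in the report. With skip_info=True the same environment passes the check.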

1 reaction
mcg1969 commented, Sep 14, 2020

OK, I might be leaning towards #2 now that I’ve had to wrestle with this in tests 😃

