continue to ignore `info` in `load_files()` or just use `--ignore-missing-files`?
The following error broke our nightlies a few nights ago:
$ conda pack --format tar.gz -j -1
Collecting packages...
CondaPackError:
Files managed by conda were found to have been deleted/overwritten in the
following packages:
- python='3.8.5'
This is usually due to `pip` uninstalling or clobbering conda managed files,
resulting in an inconsistent environment. Please check your environment for
conda/pip conflicts using `conda list`, and fix the environment by ensuring
only one version of each package is installed (conda preferred).
Debugging `core.py`, I found that the following entry in `paths.json` is the root cause:
$ pwd
/export/home/ops/conda/pkgs/python-3.8.5-h7579374_1/info
$ diff paths.json.orig paths.json
900,905d899
< "_path": "info/info_json.d/security.json",
< "path_type": "hardlink",
< "sha256": "1eba042cbb28d2403ca77fbdd8fb7ca518d65b0522a13730255ffdef694a826a",
< "size_in_bytes": 1773
< },
< {
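To make the failure mode concrete, here is a heavily simplified, purely illustrative sketch of the kind of check that trips over such an entry. The function names, the `ignore` tuple, and the walking logic are assumptions, not conda-pack's actual code; only the general idea (compare the `paths.json` manifest against files actually present in the environment) comes from the error and diff above.

```python
# Purely illustrative sketch -- NOT conda-pack's actual code. All names
# (managed_files, walk_env_files) and the exact walking logic are assumptions.
import json
import os

def managed_files(pkg_info_dir):
    """Prefix-relative paths conda considers part of the package (info/paths.json)."""
    with open(os.path.join(pkg_info_dir, "paths.json")) as f:
        return {entry["_path"] for entry in json.load(f)["paths"]}

def walk_env_files(prefix, ignore=("info",)):
    """Paths actually present in the environment, skipping ignored top-level dirs."""
    found = set()
    for root, dirs, files in os.walk(prefix):
        if root == prefix:
            # If "info" is in the ignore list, the walker never descends
            # into <prefix>/info at all.
            dirs[:] = [d for d in dirs if d not in ignore]
        for name in files:
            found.add(os.path.relpath(os.path.join(root, name), prefix))
    return found

# A malformed package lists "info/info_json.d/security.json" in paths.json and
# installs it under the environment's info/ directory, but the walker skips
# info/, so the entry shows up in `managed - found` and is reported as
# deleted/overwritten -- exactly the CondaPackError shown above.
```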
There are two ways to solve the error:
- Use `--ignore-missing-files` (full command shown below)
- Remove `info` from `load_files`: https://github.com/conda/conda-pack/blob/master/conda_pack/core.py#L585
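For reference, the first option is just the flag added to the same nightly command that failed above:

$ conda pack --format tar.gz -j -1 --ignore-missing-files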
What do you all recommend?
Top GitHub Comments
I’ve decided your idea #2 is the right one. Conda environments composed entirely of well-formed conda packages should have no content in the `info/` directory. Hence ignoring `info`, as the current version of conda-pack does, is a no-op. But this cuts both ways: ignoring `info/` does nothing, and not ignoring `info/` does nothing, either.

However, a small number of malformed conda packages have cropped up that do install a file in this subdirectory, leading to this missing files exception. So now ignoring the `info/` directory is actually causing problems.

Looking at the commit history, it looks like `info` has been included in the ignore list for at least 3 years, and I don’t see a compelling reason to ignore it. That said, I’m still looking. If I find a good reason, I’ll revert this change in favor of another approach I’m considering: specifically ignoring `info/` files found in the `paths.json` manifest of conda packages.

OK, I might be leaning towards #2 now that I’ve had to wrestle with this in tests 😃
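For completeness, a hypothetical sketch of the alternative approach mentioned in the comment above: drop `info/` entries while reading a package's `paths.json` manifest, so they are never treated as managed files that must exist in the environment. The function name and layout here are illustrative, not conda-pack's real API.

```python
import json
import os

def load_managed_paths(pkg_info_dir):
    # Illustrative only: read the package's paths.json manifest and skip any
    # entry that points back into the package's own info/ tree, so such files
    # are never checked for presence in the environment.
    with open(os.path.join(pkg_info_dir, "paths.json")) as f:
        entries = json.load(f)["paths"]
    return [e for e in entries if not e["_path"].startswith("info/")]
```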