RFC: conda-forge epochs for solver accuracy, speed & debuggability?


Conda's and mamba's solvers take into account every package ever published when trying to resolve an environment (with some accelerations, i.e. checking first if things are resolvable with repodata_current.json).

This can sometimes lead the solver astray and force it into very weird contortions, where very old packages are picked just because they seemingly satisfy the constraints (though realistically, this is almost always an error in our metadata). There are many examples of this; here are a few that came up recently:

While this definitely also has some advantages (fewer rebuilds, old packages stay installable), it can also run into inevitable problems where old packages haven't been rebuilt for modern dependencies (e.g. no run-exports), aren't aware of ABI breaks that were unknown at the time, noarch vs. yesarch, etc.

So it would be nice to give users a way to enforce an option that says “I only want comparatively recent packages” or, in other words, “please don’t do unexpected/unintended/crazy things while trying to resolve my environment”.

I was thinking about how this could be done in a way that wouldn’t require constant rebuilds (i.e. say, if a “conda-forge epoch” were to be defined as equal to a calendar year, nothing would be installable in January until all common packages have been rebuilt).

My current idea looks as follows:

  • There’s an empty metapackage __conda-forge-epoch that gets built every day (or week, or month), and versioned accordingly, i.e. 2022.12.19.
  • All outputs gain an automatic run-constraint
    run_constrained:
       {% set epoch = datetime.date.today().strftime('%Y.%m.%d') %}
       - __conda-forge-epoch <={{ epoch }}
    
    • note the <=, which is the other way around from e.g. our usual run-exports.
    • implementing this (without having to modify every recipe) probably needs support from conda-build, but for now I’m assuming this is possible.
  • By default, __conda-forge-epoch does not get installed, and therefore the constraints don’t get triggered.
    • This also means we wouldn’t have to rebuild stuff more often than we already do, as the proposed default is effectively the same as the status quo.
    • In other words, there are no hard “epoch breaks” (like we had once upon a time for going from the old compilers to the new ones).
  • If a user wants to avoid certain solver errors, or simply enforce recent builds, they can add __conda-forge-epoch>=yyyy.mm.dd to their environment specs (now we have the >=). This would force the solver to only take into account packages built on or after that date.
  • Perhaps even more importantly, it would allow users (& conda-forge members) to more easily debug solver errors, by forcing the solver to only consider a more recent subset of packages, without getting lost in the weeds of the past.
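
To make the interplay of the two inequalities concrete, here is a minimal Python sketch of the filtering the proposal is aiming for. It does not call conda or mamba; the `eligible_builds` helper, the `epoch_constraint` field, and the package names are all hypothetical. The assumption is that each build carries a constraint `__conda-forge-epoch <=<build date>`, so it can only coexist with a user's `__conda-forge-epoch >=<date>` spec if its build date is on or after the requested one.

```python
from datetime import date


def parse_epoch(version: str) -> date:
    """Parse a 'YYYY.MM.DD' epoch version string into a date."""
    year, month, day = (int(part) for part in version.split("."))
    return date(year, month, day)


def eligible_builds(builds, requested_epoch=None):
    """Return the builds compatible with `__conda-forge-epoch >=requested_epoch`.

    Each build is assumed to carry `__conda-forge-epoch <=epoch_constraint`.
    With no requested epoch the metapackage is not installed, so nothing
    is filtered out (the proposed default, i.e. the status quo).
    """
    if requested_epoch is None:
        return list(builds)
    floor = parse_epoch(requested_epoch)
    return [b for b in builds if parse_epoch(b["epoch_constraint"]) >= floor]


# Hypothetical builds, purely for illustration.
builds = [
    {"name": "libfoo", "build": "h123_0", "epoch_constraint": "2019.03.02"},
    {"name": "libfoo", "build": "h456_3", "epoch_constraint": "2022.11.30"},
    {"name": "libbar", "build": "h789_1", "epoch_constraint": "2022.12.19"},
]

recent = eligible_builds(builds, "2022.06.01")
print([b["build"] for b in recent])
```

A real solver would additionally need one published version of the metapackage that satisfies every selected build's constraint at once, which this sketch glosses over.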

I think just the debugging capabilities of this would make this worth considering, but maybe I’m just not very good at debugging resolver errors. 😅

Would be interested to hear people’s thoughts.

Issue Analytics

  • State: open
  • Created 9 months ago
  • Reactions: 1
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

3 reactions
chrisburr commented, Dec 19, 2022

Rather than having a metapackage I think this could be an install flag as the build timestamps are already included in the repodata.
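
As a rough illustration of that alternative, the sketch below filters repodata records by build timestamp in plain Python. It is not an existing conda or mamba flag. The `built_on_or_after` helper and the sample records are hypothetical, and it assumes each record's `timestamp` field is in milliseconds since the Unix epoch and that some older records may lack the field entirely.

```python
from datetime import datetime, timezone


def built_on_or_after(repodata, cutoff):
    """Yield (filename, record) pairs whose build timestamp is >= cutoff.

    Assumes `timestamp` is in milliseconds since the Unix epoch.
    Records without a timestamp are skipped.
    """
    cutoff_ms = cutoff.timestamp() * 1000
    for section in ("packages", "packages.conda"):
        for filename, record in repodata.get(section, {}).items():
            timestamp = record.get("timestamp")
            if timestamp is not None and timestamp >= cutoff_ms:
                yield filename, record


def ms(year, month, day):
    """Helper for the sample data: midnight UTC as epoch milliseconds."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp() * 1000)


# Hypothetical repodata excerpt, purely for illustration.
repodata = {
    "packages": {
        "libfoo-1.0-h123_0.tar.bz2": {"name": "libfoo", "timestamp": ms(2019, 3, 2)},
        "oldpkg-0.1-0.tar.bz2": {"name": "oldpkg"},  # no timestamp recorded
    },
    "packages.conda": {
        "libfoo-1.2-h456_3.conda": {"name": "libfoo", "timestamp": ms(2022, 11, 30)},
    },
}

cutoff = datetime(2022, 1, 1, tzinfo=timezone.utc)
recent = [filename for filename, _ in built_on_or_after(repodata, cutoff)]
print(recent)
```

How records without a timestamp should be treated (dropped, as here, or always kept) is a design choice the flag would have to settle.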

1 reaction
h-vetinari commented, Dec 21, 2022

"I think it is important to not ignore packages that didn't need rebuilding for a long time."

This is in fact explicitly what I’d like to be able to do (not by default of course). Packages that haven’t been rebuilt in a while are often subtly incompatible (compare the recent libxml2 issues), and figuring out which feedstocks among a given set of dependencies haven’t been rebuilt in a while is a useful tool for chasing down resolver errors.
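
One way to approximate "which dependencies haven't been rebuilt in a while" without any solver support is to rank package records by build age. The sketch below is only an illustration: `stale_packages` and the sample records are hypothetical, and it assumes the records (for instance taken from repodata or from an environment's conda-meta JSON files) expose a `timestamp` in milliseconds since the Unix epoch.

```python
from datetime import datetime, timezone


def stale_packages(records, now, max_age_days):
    """Return (name, age_in_days) for records older than max_age_days, oldest first.

    Records without a timestamp are skipped, so very old builds that
    never recorded one will not show up here.
    """
    stale = []
    for record in records:
        timestamp = record.get("timestamp")
        if timestamp is None:
            continue
        built = datetime.fromtimestamp(timestamp / 1000, tz=timezone.utc)
        age = (now - built).days
        if age > max_age_days:
            stale.append((record["name"], age))
    return sorted(stale, key=lambda item: item[1], reverse=True)


def ms(year, month, day):
    """Helper for the sample data: midnight UTC as epoch milliseconds."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp() * 1000)


# Hypothetical records, purely for illustration.
records = [
    {"name": "libfoo", "timestamp": ms(2021, 12, 21)},
    {"name": "libbar", "timestamp": ms(2020, 12, 21)},
    {"name": "libbaz", "timestamp": ms(2022, 12, 1)},
]

now = datetime(2022, 12, 21, tzinfo=timezone.utc)
report = stale_packages(records, now, max_age_days=180)
print(report)
```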
