
Gaps between distributed and dask releases to Anaconda’s main channels result in incompatible environments


What happened:

Twice in the last two months, LightGBM’s continuous integration has been broken by the following situation:

  • distributed changes in a way that makes it incompatible with older versions of dask
  • the newest release of distributed is published to Anaconda’s main channels several days before the corresponding dask release
  • running something like conda install -y dask distributed then produces an environment with incompatible versions of dask and distributed
  • any tests involving Dask fail
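One way a CI job could fail fast here, instead of surfacing the problem as an ImportError deep inside the test suite, is to compare the installed dask and distributed versions before any Dask tests run. This is a minimal sketch that assumes, as a heuristic not stated by either project, that the two packages should be installed at identical version strings. The helper names are hypothetical, and importlib.metadata needs Python 3.8+ (the importlib_metadata backport offers the same API on 3.7).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def dask_distributed_mismatch(
    dask_ver: Optional[str], distributed_ver: Optional[str]
) -> Optional[str]:
    """Return an error message if the versions differ, else None.

    Heuristic assumption: matching version strings mean a compatible pair.
    """
    if dask_ver is None or distributed_ver is None:
        return None  # nothing to compare if either package is missing
    if dask_ver != distributed_ver:
        return (
            f"dask {dask_ver} and distributed {distributed_ver} differ; "
            "install matching releases before running Dask tests"
        )
    return None
```

A CI step could call dask_distributed_mismatch(installed_version("dask"), installed_version("distributed")) and exit non-zero whenever it returns a message.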

I’ve documented the most recent instance of this problem in https://github.com/microsoft/LightGBM/issues/4285.

We ended up with an environment like this:

dask-2021.4.0              |     pyhd3eb1b0_0           5 KB
dask-core-2021.4.0         |     pyhd3eb1b0_0         670 KB
distributed-2021.4.1       |   py37h06a4308_0         1.0 MB
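The conda package plan above already shows the problem: dask and dask-core are at 2021.4.0 while distributed is at 2021.4.1. The following is a minimal sketch that parses such a listing and flags the mismatch before the environment is used. It assumes the "name-version | build size" layout shown above (conda's output format may differ between versions), and parse_conda_listing and dask_versions_differ are hypothetical helpers.

```python
import re
from typing import Dict

# Matches plan lines such as "dask-core-2021.4.0   |   pyhd3eb1b0_0   670 KB"
# and captures the "name-version" token before the "|" separator.
PLAN_LINE = re.compile(r"^\s*(?P<spec>\S+)\s*\|")


def parse_conda_listing(text: str) -> Dict[str, str]:
    """Map package name to version from a conda package-plan listing."""
    packages: Dict[str, str] = {}
    for line in text.splitlines():
        match = PLAN_LINE.match(line)
        if not match:
            continue
        # Names can contain hyphens (dask-core), so split on the last one only.
        name, _, ver = match.group("spec").rpartition("-")
        # Skip header and separator rows, whose "version" part is not numeric.
        if name and ver[:1].isdigit():
            packages[name] = ver
    return packages


def dask_versions_differ(packages: Dict[str, str]) -> bool:
    """True if dask-core (or dask) and distributed are planned at different versions."""
    core = packages.get("dask-core", packages.get("dask"))
    dist = packages.get("distributed")
    return core is not None and dist is not None and core != dist
```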

And saw all Dask tests in that project fail with this error:

>       from distributed.protocol.core import dumps_msgpack
E       ImportError: cannot import name 'dumps_msgpack' from 'distributed.protocol.core' (/root/miniconda/envs/test-env/lib/python3.7/site-packages/distributed/protocol/core.py)

This is caused by the fact that distributed.protocol.core.dumps_msgpack() was removed in distributed 2021.4.1 (#4677), while dask 2021.4.0 still relies on it.
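The failure comes down to dask importing a name that the installed distributed no longer provides. A small smoke check can report that directly. This is a sketch only: has_symbol is a hypothetical helper built on the standard library's importlib, and applying it as has_symbol("distributed.protocol.core", "dumps_msgpack") is an illustration of how it might be used here, not something dask or distributed ship.

```python
import importlib


def has_symbol(module_name: str, attr: str) -> bool:
    """Return True if `module_name` can be imported and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError and failures raised while importing the module.
        return False
    return hasattr(module, attr)
```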

What you expected to happen:

I expected that since dask and distributed are so tightly connected to each other, new versions of these libraries would be published to the main anaconda channels at the same time.

Minimal Complete Verifiable Example:

It’s hard to create an MCVE for this since it relies on external state in a package manager, but as of 12 hours ago the steps at https://github.com/microsoft/LightGBM/issues/4285#issuecomment-841000102 could reproduce this issue.

If you need more details than that please let me know and I can try to produce a tighter reproducible example.

Anything else we need to know?:

Environment:

  • Dask version: 2021.4.0
  • Python version: 3.7
  • Operating System: Ubuntu 20.04
  • Install method (conda, pip, source): conda

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

3 reactions
jrbourbeau commented, May 14, 2021

Thanks for reporting @jameslamb! FWIW some folks also ran into this with the 2021.04.1 release on conda-forge (see the discussion starting here https://github.com/dask/community/issues/150#issuecomment-826844711). I think the core issue here is that we don’t specify maximum allowed versions for our dask and distributed dependencies.

Over in https://github.com/dask/community/issues/155#issuecomment-841278326 I’m proposing we start pinning dask and distributed more tightly to avoid these types of version inconsistency issues. If you have any thoughts on the topic, please feel free to engage over in that issue.
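The effect of adding a maximum allowed version can be illustrated by comparing calendar versions as integer tuples. This is a sketch under assumptions: the constraint values below are hypothetical, not the requirements dask or distributed actually declare, and real solvers such as conda apply their own match-spec rules. It shows why a lower bound alone lets a solver pair dask 2021.4.0 with distributed 2021.4.1, while an upper bound rules that pairing out.

```python
from typing import Optional, Tuple

CalVer = Tuple[int, ...]


def parse_calver(text: str) -> CalVer:
    """Parse a calendar version such as '2021.4.1' or '2021.04.1' into ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies(candidate: str, minimum: str, maximum: Optional[str] = None) -> bool:
    """True if minimum <= candidate and, when a maximum is given, candidate <= maximum."""
    v = parse_calver(candidate)
    if v < parse_calver(minimum):
        return False
    return maximum is None or v <= parse_calver(maximum)
```

With only minimum="2021.4.0", distributed 2021.4.1 is accepted; adding maximum="2021.4.0" rejects it, which is the effect a tighter pin would have on the solver. Parsing to integers also treats "2021.04.1" and "2021.4.1" as the same release.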

1 reaction
jrbourbeau commented, May 17, 2021

Closing as discussion moved over to the dask/community issue tracker and the relevant folks have been pinged here for visibility.
