Task dependency bug in worker
What happened:
The following code sometimes fails with a KeyError (3 out of 4 times). git bisect says the bug was introduced by https://github.com/dask/distributed/pull/4107.
distributed.worker - ERROR - "('split-simple-shuffle-b50994a1a48d47067bc463a19b005e75', 14, 55)"
Traceback (most recent call last):
  File "/home/nfs/mkristensen/repos/distributed/distributed/worker.py", line 1984, in gather_dep
    deps_ts = [self.tasks[key] for key in deps]
  File "/home/nfs/mkristensen/repos/distributed/distributed/worker.py", line 1984, in <listcomp>
    deps_ts = [self.tasks[key] for key in deps]
KeyError: "('split-simple-shuffle-b50994a1a48d47067bc463a19b005e75', 14, 55)"
What you expected to happen:
For some reason the worker calls gather_dep() with a set of deps that the task does not depend on. As far as I can see, the client and scheduler maintain task dependencies correctly.
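To make the failure mode concrete, here is a toy sketch that is independent of distributed's actual code. The names tasks, deps, lookup_strict and lookup_tolerant are hypothetical stand-ins, and the shortened keys are made up for illustration. It only shows why a list comprehension over a dict raises KeyError as soon as one requested key is absent from the table, and what a defensive lookup could look like. It is not the maintainers' fix.

```python
# Toy stand-in for a worker's task table; NOT distributed's real Worker class.
tasks = {
    ("split", 14, 54): "task-state-a",
    ("split", 14, 56): "task-state-b",
}

# Keys the gather step was asked to fetch; the middle one is not in the table.
deps = [("split", 14, 54), ("split", 14, 55), ("split", 14, 56)]


def lookup_strict(tasks, deps):
    """Same pattern as the failing line: raises KeyError on the first unknown key."""
    return [tasks[key] for key in deps]


def lookup_tolerant(tasks, deps):
    """Hypothetical defensive variant: keep known keys, report unknown ones."""
    found = [tasks[key] for key in deps if key in tasks]
    missing = [key for key in deps if key not in tasks]
    return found, missing


strict_error = None
try:
    lookup_strict(tasks, deps)
except KeyError as exc:
    strict_error = exc
    print("strict lookup failed on", exc.args[0])

found, missing = lookup_tolerant(tasks, deps)
print("found:", found)
print("missing:", missing)
```

Skipping unknown keys like this would only hide the symptom. The underlying question of why the worker is asked for keys it does not track remains open.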
Minimal Complete Verifiable Example:
import pandas as pd
import numpy as np
import dask.dataframe as dd
from dask.dataframe.shuffle import shuffle
from distributed import wait
import dask
from distributed import Client, LocalCluster

nparts = 100
max_branch = 100
data_size = nparts * max_branch


def main(client):
    df = pd.DataFrame({"x": np.arange(data_size)})
    ddf = dd.from_pandas(df, npartitions=nparts)
    ddf = ddf.persist(optimize_graph=False)
    wait(ddf)

    with dask.config.set({"optimization.fuse.active": False}):
        s = shuffle(
            ddf, ddf.x, shuffle="tasks", npartitions=nparts, max_branch=max_branch
        )
        s = s.persist()
        wait(s)


if __name__ == "__main__":
    with LocalCluster(scheduler_port=0, asynchronous=False, n_workers=5) as cluster:
        with Client(cluster, asynchronous=False) as client:
            main(client)
cc. @gforsyth
Top GitHub Comments
I can confirm the issue. Here’s an inelegant fix while I figure out the correct one:
I wanted to confirm that this was due to (I think) work stealing and not a request for the “wrong” dependencies. Tried to get a test that would reliably fail on this but didn’t quite get there.
Yes, I think this is the right way to do this. I’ll push up the PR now.