MemoizableIOManager doesn't handle "duplicated" run instances

See original GitHub issue

Summary

This might be more of a design bug or a feature request. As far as I understand, memoization (via MemoizableIOManager) depends on the solid's version and config, and decides whether an output already exists based only on that local information. Also, as far as I understand, Dagster currently supports running multiple instances of Dagit, or even running the same pipeline from two separate VMs (or by two developers). This means more than one instance of the same memoized solid can run at the same time, potentially creating a conflict when the output is written/materialised.
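A minimal sketch of the race described above, in plain Python rather than Dagster. `VersionedStore`, `exists`, `write` and `run_memoized_step` are hypothetical names, not Dagster's API. The assumption is that each run performs a "does an output for this version exist?" check and then writes, with nothing coordinating the two steps across runs. A `threading.Barrier` forces both runs to finish the check before either writes.

```python
import threading


class VersionedStore:
    """Hypothetical stand-in for a shared store keyed by output version."""

    def __init__(self):
        self._lock = threading.Lock()  # guards each call, NOT the check-then-write sequence
        self._data = {}
        self.writes = []  # every write, in order, including overwrites

    def exists(self, version):
        with self._lock:
            return version in self._data

    def write(self, version, value):
        with self._lock:
            self._data[version] = value
            self.writes.append((version, value))


def run_memoized_step(store, version, run_id, barrier):
    # Memoization check based only on local information (version + config).
    if store.exists(version):
        return "skipped"
    # Make both runs finish the check before either writes, i.e. force the race.
    barrier.wait()
    store.write(version, f"result-from-{run_id}")
    return "computed"


store = VersionedStore()
barrier = threading.Barrier(2)
results = {}


def worker(run_id):
    results[run_id] = run_memoized_step(store, "my_solid-v1", run_id, barrier)


threads = [threading.Thread(target=worker, args=(rid,)) for rid in ("dev-a", "dev-b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)            # both runs report "computed"
print(len(store.writes))  # 2: the second write overwrote the first
```

Each individual call is thread-safe, yet the step still runs twice, because the check and the write are separate operations and each run only consults its own view of the store.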

In other systems, as far as I understand, these problems are resolved by having a “global coordinator”:

  • Airflow has a global DAG bag
  • Luigi has a “global scheduler” process that resolves this conflict
  • Prefect requires registration with a global scheduler
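The global-coordinator idea can be sketched in-process as a single authority that grants at most one claim per output version; this is only loosely analogous to Luigi's central scheduler and is not how any of the listed tools is implemented. `Coordinator`, `claim`, `done_event` and `run_step` are hypothetical names. The first run to claim a version computes and writes it; any other run waits for that output instead of recomputing.

```python
import threading


class Coordinator:
    """Hypothetical central authority: grants at most one claim per output version."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owners = {}  # version -> run_id that owns the computation
        self._done = {}    # version -> Event set once the owner has written the output

    def claim(self, version, run_id):
        # Atomic check-and-set: only the first caller for a version becomes the owner.
        with self._lock:
            if version in self._owners:
                return False
            self._owners[version] = run_id
            self._done[version] = threading.Event()
            return True

    def done_event(self, version):
        with self._lock:
            return self._done[version]


coordinator = Coordinator()
store = {}        # stands in for the shared object store
computed_by = []  # which runs actually executed the step
barrier = threading.Barrier(2)


def run_step(version, run_id):
    barrier.wait()  # both runs reach the step at the same moment
    if coordinator.claim(version, run_id):
        computed_by.append(run_id)
        store[version] = f"result-from-{run_id}"
        coordinator.done_event(version).set()
    else:
        # Another run owns this version: wait for its output instead of recomputing.
        coordinator.done_event(version).wait()
    return store[version]


results = {}


def worker(run_id):
    results[run_id] = run_step("my_solid-v1", run_id)


threads = [threading.Thread(target=worker, args=(rid,)) for rid in ("dev-a", "dev-b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(computed_by)  # exactly one run id
print(results)      # both runs return the same result
```

The key design point is that the check and the claim happen in one atomic step owned by a single process. The sketch ignores failure handling: if the owner crashed before setting its event, the waiting run would block forever, so a real coordinator would also need timeouts or liveness checks.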

To reproduce, write a pipeline that uses MemoizableIOManager and writes its results to a global location, such as an object store, and have two developers start it at the same time with the same config.
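When the shared location happens to be a filesystem, one low-tech way for the second developer's run to back off is an atomic "claim file" created with `O_CREAT | O_EXCL`, which fails if the file already exists. This is a sketch under that assumption only: `try_claim` and the `<version>.claim` layout are hypothetical, exclusive create is reliable on local POSIX filesystems but may not be on every network filesystem, and object stores need their own conditional-write mechanism.

```python
import os
import tempfile


def try_claim(shared_dir, version, run_id):
    """Atomically create a claim file for `version`; True only for the first caller."""
    claim_path = os.path.join(shared_dir, f"{version}.claim")  # hypothetical layout
    try:
        # O_CREAT | O_EXCL raises FileExistsError if the file already exists.
        fd = os.open(claim_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(run_id)  # record which run owns this output version
    return True


shared_dir = tempfile.mkdtemp()
print(try_claim(shared_dir, "my_solid-v1", "dev-a"))  # True
print(try_claim(shared_dir, "my_solid-v1", "dev-b"))  # False
```

A run that gets `False` would then wait for, or simply read, the output written by the claim owner instead of materialising its own copy.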

I would also suggest documenting this limitation clearly in the memoization documentation.


Message from the maintainers:

Impacted by this bug? Give it a 👍. We factor engagement into prioritization.

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction · sryza commented, Apr 15, 2021
0 reactions · ravwojdyla commented, Apr 14, 2021

@sryza, see the high-level doc here: https://luigi.readthedocs.io/en/stable/central_scheduler.html. Luigi itself is a relatively small amount of code, so the code itself might actually be the best documentation, see: https://github.com/spotify/luigi/blob/master/luigi/scheduler.py. Hope this is helpful.
