MemoizableIOManager doesn't handle "duplicated" run instances
Summary
This might be more of a design bug or a feature request. As far as I understand, memoization via MemoizableIOManager depends on a solid’s version and config, and decides from that local information whether an output already exists. Also as far as I understand, dagster currently supports multiple instances of dagit, or even the same pipeline being run from two separate VMs (or by two developers). This means more than one instance of the same memoized solid can be running at the same time, potentially creating a conflict at “write”/materialization time of the output.
In other systems these problems are resolved by having a “global coordinator”, afaiu:
- Airflow has global dag bag
- Luigi has the “global scheduler” process to resolve this conflict
- Prefect requires registration to global scheduler
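To make the idea of coordination concrete, here is a minimal sketch of an exclusive "claim" step that a run could take before computing a memoized output. It is not how Airflow, Luigi, Prefect, or dagster work internally. The helper `try_claim` and the lock directory are hypothetical, and the sketch assumes a shared filesystem where exclusive file creation is atomic; an object store would need its own conditional-write mechanism.

```python
import os
import tempfile


def try_claim(lock_dir, version):
    """Hypothetical helper: return True if this caller won the claim for `version`.

    Assumes `lock_dir` is visible to every run and that exclusive file
    creation (O_CREAT | O_EXCL) is atomic on that filesystem.
    """
    path = os.path.join(lock_dir, f"{version}.lock")
    try:
        # Fails with FileExistsError if another run already created the file.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lock_dir:
        first = try_claim(lock_dir, "v1")   # first run claims version "v1"
        second = try_claim(lock_dir, "v1")  # second run finds the claim taken
        print(first, second)
```

A run that loses the claim could wait for the winner's output or skip the step; which is right depends on the pipeline, and the sketch does not handle stale claims left behind by crashed runs.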
To reproduce, write a pipeline that uses MemoizableIOManager and writes results to a global location, such as an object store, and have two developers start it at the same time with the same config.
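The race can be illustrated without dagster at all. The following is a toy simulation, not dagster's API: `ToyMemoizedStore` is a hypothetical stand-in for a shared output location, its `has_output`/`handle_output` methods only loosely mirror the check-then-write pattern described above, and a barrier forces both runs to finish their check before either one writes.

```python
import threading


class ToyMemoizedStore:
    """Hypothetical stand-in for a shared store keyed by a memoization version."""

    def __init__(self):
        self._data = {}
        self.writes = []  # log of (run_id, version) for every materialization

    def has_output(self, version):
        return version in self._data

    def handle_output(self, run_id, version, value):
        self.writes.append((run_id, version))
        self._data[version] = value


def run_solid(store, run_id, version, both_checked):
    # Each run decides from its own check whether the output already exists.
    already_there = store.has_output(version)
    # Force the problematic interleaving: both runs check before either writes.
    both_checked.wait()
    if not already_there:
        store.handle_output(run_id, version, f"result from {run_id}")


def simulate():
    store = ToyMemoizedStore()
    both_checked = threading.Barrier(2)
    threads = [
        threading.Thread(target=run_solid, args=(store, run_id, "v1", both_checked))
        for run_id in ("run_a", "run_b")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store


if __name__ == "__main__":
    store = simulate()
    # Both runs materialize the same version, so the same output is written twice.
    print(sorted(store.writes))
```

Because neither run sees the other's in-progress work, both compute and write the same version, and whichever write lands last wins.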
I would also suggest that this limitation should be well documented in the memoization documentation.
Message from the maintainers:
Impacted by this bug? Give it a 👍. We factor engagement into prioritization.
Issue Analytics
- Created 3 years ago
- Comments: 8 (4 by maintainers)
Top GitHub Comments
Thanks @ravwojdyla
@sryza see the high level doc here: https://luigi.readthedocs.io/en/stable/central_scheduler.html. Luigi itself is a relatively small amount of code, so the code itself might actually be the best documentation, see: https://github.com/spotify/luigi/blob/master/luigi/scheduler.py. Hope this is helpful.