Pipeline concurrency control
It would be nice if Dagster allowed controlling the maximum number of concurrent runs a pipeline can have. Something like:
@pipeline(max_concurrent_runs=3)
def my_pipeline():
    ...
We have a pipeline solid that ingests a feed of Thing items from a source and starts a child pipeline for each one. We need to do this because we want to visualize, control, and rate limit the processing of each Thing independently, but more importantly because we want to process each Thing as it’s ingested, as opposed to waiting until our solid collects them all and outputs them as a List. We therefore needed a way to “pause” our generator solid while it waits for some running pipelines to finish.
This feature might not be too hard to add because I’ve successfully implemented it in my client code:
import time

from dagster import DagsterInstance, solid
from dagster.core.storage.pipeline_run import PipelineRunsFilter, PipelineRunStatus


def wait_for_run_slot(pipeline_name, max_runs=1):
    instance = DagsterInstance.get()

    def get_active_runs(pipeline_name):
        # Count runs that are either running or waiting to start.
        started_runs = len(instance.get_runs(PipelineRunsFilter(pipeline_name=pipeline_name, status=PipelineRunStatus.STARTED)))
        not_started_runs = len(instance.get_runs(PipelineRunsFilter(pipeline_name=pipeline_name, status=PipelineRunStatus.NOT_STARTED)))
        return started_runs + not_started_runs

    # Poll every 2 seconds until fewer than max_runs are active.
    while get_active_runs(pipeline_name) >= max_runs:
        time.sleep(2)


@solid
def generator(context):
    batch_of_things = ...
    for thing in batch_of_things:
        wait_for_run_slot('process_thing')
        # ...launch pipeline...
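One weakness of the polling loop above is that it blocks forever if the active runs never finish. A minimal sketch of the same idea with an optional timeout follows. The run-count lookup is passed in as a callable so the loop can be exercised without a Dagster instance; `wait_for_slot`, `count_active`, `poll_interval`, and `timeout` are hypothetical names for illustration, not Dagster APIs.

```python
import time


def wait_for_slot(count_active, max_runs=1, poll_interval=2.0, timeout=None):
    """Block until count_active() < max_runs.

    count_active is any zero-argument callable returning the number of
    active runs (hypothetical; in the snippet above it would wrap
    instance.get_runs). Returns True once a slot is free, or False if
    the optional timeout (in seconds) expires first.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while count_active() >= max_runs:
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

The caller can then decide what to do when no slot opens up in time, for example skip the item or fail the solid, instead of hanging indefinitely.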
It would be great if this extra machinery could be avoided with a single argument to @pipeline. Also, the above code only prevents the generator solid from launching too many runs; you can still exceed the limit by launching runs manually via the web UI. If the setting were instead configured in the pipeline definition via @pipeline, Dagster could enforce it internally at a global level.
Top GitHub Comments
@johannkm FYI I’m moving this to Scheduled Runs project board, feel free to dupe this one to a primary issue for run queuing if you have a different one
@dbendelman we’re currently working on adding run queueing to Dagster which I think should help here; feel free to jump in our Slack and DM Johann if you have any questions!
Tag-based concurrency limits are now on master. These will be documented for the 0.10.0 release, but for now they can be seen at https://github.com/dagster-io/dagster/blob/master/python_modules/dagster/dagster/core/run_coordinator/queued_run_coordinator.py#L52-L64.
To achieve the functionality you have above, you can place a tag in the PipelineDefinition, then specify a limit for that tag on the Run Coordinator. I think that satisfies this feature request so I’m closing the issue. Feel free to re-open if that’s not the case, or reach out for any questions/comments!
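To illustrate the idea behind tag-based limits, here is a rough, self-contained sketch of how a queue might decide which queued runs can start without exceeding a per-tag limit. This is not Dagster's implementation; `select_launchable`, the run dictionaries, and the `pipeline` tag key are assumptions made up for the example, and only the general shape of a limit (a tag key, a value, and a limit) is taken from the comment above.

```python
from collections import Counter


def select_launchable(queued, in_progress, limits):
    """Return the queued runs that can start without exceeding any tag limit.

    queued / in_progress: lists of dicts like {"id": "r1", "tags": {"k": "v"}}
    limits: list of dicts like {"key": "k", "value": "v", "limit": 3}
    """
    def matching(run):
        # Indices of the limits whose key/value pair appears in the run's tags.
        return [i for i, lim in enumerate(limits)
                if run["tags"].get(lim["key"]) == lim["value"]]

    # Count slots already taken by runs that are in progress.
    used = Counter()
    for run in in_progress:
        used.update(matching(run))

    launchable = []
    for run in queued:
        hits = matching(run)
        # A run may start only if every limit it matches still has room.
        if all(used[i] < limits[i]["limit"] for i in hits):
            used.update(hits)
            launchable.append(run)
    return launchable


# Hypothetical example: at most 3 concurrent runs tagged pipeline=process_thing.
limits = [{"key": "pipeline", "value": "process_thing", "limit": 3}]
tagged = {"pipeline": "process_thing"}
in_progress = [{"id": "a", "tags": tagged}, {"id": "b", "tags": tagged}]
queued = [
    {"id": "q1", "tags": tagged},
    {"id": "q2", "tags": tagged},
    {"id": "q3", "tags": {}},
]
started = [run["id"] for run in select_launchable(queued, in_progress, limits)]
```

Because the limit lives in the queue rather than in the generator solid, it applies to every run carrying the tag, including runs launched manually from the web UI.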