Pipeline concurrency control

It would be nice if Dagster could allow controlling the max concurrent number of runs a pipeline can have. Something like:

@pipeline(max_concurrent_runs=3)
def my_pipeline():
    ...

We have a pipeline solid that ingests a feed of Thing items from a source and starts a child pipeline for each one. We need to do this because we want to visualize, control, and rate limit the processing of each Thing independently, but more importantly because we want to process each Thing as it’s ingested as opposed to waiting until our solid collects them all and outputs them as a List. We therefore needed to find a way to “pause” our generator solid while it waits for some running pipelines to finish.

This feature might not be too hard to add because I’ve successfully implemented it in my client code:

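import time

from dagster import DagsterInstance, solid
from dagster.core.storage.pipeline_run import PipelineRunsFilter, PipelineRunStatus

def wait_for_run_slot(pipeline_name, max_runs=1):
    """Block until fewer than max_runs runs of pipeline_name are active."""
    instance = DagsterInstance.get()

    def count_runs(status):
        runs_filter = PipelineRunsFilter(pipeline_name=pipeline_name, status=status)
        return len(instance.get_runs(runs_filter))

    def get_active_runs():
        # Active means already executing, or created but not yet started.
        return count_runs(PipelineRunStatus.STARTED) + count_runs(PipelineRunStatus.NOT_STARTED)

    # Poll every two seconds until a slot frees up.
    while get_active_runs() >= max_runs:
        time.sleep(2)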

@solid
def generator(context):
    batch_of_things = ...
    for thing in batch_of_things:
        # Wait for a free slot before launching the next child run.
        wait_for_run_slot('process_thing')
        # ... launch pipeline ...

It would be great if this extra machinery could be avoided with a single argument to @pipeline. Also, the above code only prevents the generator solid from launching too many runs; anyone can still exceed the limit by launching runs manually via the web UI. If the limit were instead configured in the pipeline definition via @pipeline, Dagster could enforce it internally at a global level.
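
The polling helper above waits indefinitely and can only be exercised against a live Dagster instance. As a rough sketch of the same wait-for-a-slot idea in plain Python, the version below takes the run counter as a callable and adds an optional timeout. The name wait_for_slot and the parameters count_active, poll_seconds, timeout_seconds, sleep, and clock are all hypothetical and are not part of Dagster.

```python
import time


def wait_for_slot(count_active, max_runs=1, poll_seconds=2.0,
                  timeout_seconds=None, sleep=time.sleep, clock=time.monotonic):
    """Poll count_active() until it drops below max_runs.

    Returns True once a slot is free, or False if timeout_seconds elapses first.
    count_active is any zero-argument callable returning the number of active runs
    (hypothetical interface, e.g. a wrapper around an instance query).
    """
    deadline = None if timeout_seconds is None else clock() + timeout_seconds
    while count_active() >= max_runs:
        # Give up instead of waiting forever when a deadline was requested.
        if deadline is not None and clock() >= deadline:
            return False
        sleep(poll_seconds)
    return True
```

In the generator solid, count_active could wrap the get_active_runs logic from the issue, and a False return could be turned into a failure or a retry, depending on what the caller prefers.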

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

natekupp commented, Nov 11, 2020 (2 reactions)

@johannkm FYI I’m moving this to Scheduled Runs project board, feel free to dupe this one to a primary issue for run queuing if you have a different one

@dbendelman we’re currently working on adding run queueing to Dagster which I think should help here; feel free to jump in our Slack and DM Johann if you have any questions!

johannkm commented, Dec 15, 2020 (1 reaction)

Tag-based concurrency limits are now on master. These will be documented for the 0.10.0 release, but for now they can be seen at https://github.com/dagster-io/dagster/blob/master/python_modules/dagster/dagster/core/run_coordinator/queued_run_coordinator.py#L52-L64.

To achieve the functionality you have above, you can place a tag in the PipelineDefinition, then specify a limit for that tag on the Run Coordinator. I think that satisfies this feature request so I’m closing the issue. Feel free to re-open if that’s not the case, or reach out for any questions/comments!
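
To illustrate the idea of a per-tag limit, here is a small self-contained sketch in plain Python of how a queue might decide which queued runs to start. It is not Dagster's implementation: the function select_runs_to_launch, the run dictionaries, and the limit field names key, value, and limit are assumptions chosen for this example.

```python
from collections import Counter


def select_runs_to_launch(queued, in_progress, tag_limits):
    """Return the queued runs that can start without exceeding any tag limit.

    queued / in_progress: lists of dicts, each with a "tags" dict (hypothetical shape).
    tag_limits: list of dicts with "key", "value", and "limit" (hypothetical shape).
    """
    def matching_limits(run):
        return [lim for lim in tag_limits
                if run["tags"].get(lim["key"]) == lim["value"]]

    # Count how many in-progress runs already occupy each limited tag.
    counts = Counter()
    for run in in_progress:
        for lim in matching_limits(run):
            counts[(lim["key"], lim["value"])] += 1

    launchable = []
    for run in queued:
        limits = matching_limits(run)
        # A run may start only if every limit it falls under still has room.
        if all(counts[(lim["key"], lim["value"])] < lim["limit"] for lim in limits):
            launchable.append(run)
            for lim in limits:
                counts[(lim["key"], lim["value"])] += 1
    return launchable
```

With a limit of 3 on a tag carried by every run of process_thing, one run in progress, and five queued, this sketch would release two more runs and hold the rest. Because the limit lives in the coordinator rather than in the generator solid, runs launched manually would be counted as well.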
