EcsRunLauncher tasks fail to start with CLI error

Dagster version

dagster, version 1.0.6

What’s the issue?

When I try to launch a run using the EcsRunLauncher class, the ECS tasks output an error from the command that Dagster injects. According to the ECS console, Dagster sends the command:

["dagster","api","execute_run","<large JSON string>"]

In the logs I see the Dagster CLI complaining about the input command:

2022-08-31 08:56:18 Usage: dagster [OPTIONS] COMMAND [ARGS]...
2022-08-31 08:56:18 CLI tools for working with Dagster.
2022-08-31 08:56:18 Options:
2022-08-31 08:56:18 -v, --version Show the version and exit.
2022-08-31 08:56:18 -h, --help Show this message and exit.
2022-08-31 08:56:18 Commands:
2022-08-31 08:56:18 asset Commands for working with Dagster assets.
2022-08-31 08:56:18 debug Commands for debugging Dagster job runs.
2022-08-31 08:56:18 instance Commands for working with the current Dagster instance.
...

If I copy the large JSON string and run the command locally in the container with docker run <image> dagster api execute_run <large JSON string>, it can at least start the task.

What did you expect to happen?

When the ECS task starts, I would expect the Dagster run to at least start rather than fail while the CLI parses the command.

How to reproduce?

As part of my dagster.yaml I have the EcsRunLauncher defined:

run_launcher:
  module: "dagster_aws.ecs"
  class: "EcsRunLauncher"
  config:
    task_definition: <task_definition_arn>
    container_name: <task_container_name>
    include_sidecars: true

I have a simple job that keeps a task busy for ~60s:

import time

from dagster import graph, op, repository


@op
def my_op():
    start = time.time()
    i = 0
    while time.time() - start < 60:
        i = i + 1
        if i % 1000 == 0:
            print(i)
    return True


@graph
def my_graph():
    my_op()


my_job = my_graph.to_job()


@repository
def busy():
    return [my_job]

I can start the job from the UI and it can run fine using the default run launcher, but when switching to the EcsRunLauncher I am getting errors starting the job.

I have two other containers running dagit and the dagster-daemon in a separate ECS task.

Deployment type

Other

Deployment details

This repo https://github.com/datarootsio/terraform-aws-ecs-dagster serves as the basis for our configuration currently which we are setting up to compare dagster against other job tools

Additional information

No response

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

1 reaction
clayton-cc commented, Sep 27, 2022

The entrypoint in the task definition seems to have been the culprit! I’ve run into this issue before in a Docker-specific context, which is why we tried running with both sh -c and /bin/bash -c, but I never thought to remove it 🤦

I think a note within this section about the entrypoint would be beneficial, as our standard terraform setup for ECS uses sh -c for all task definitions and a command syntax of /bin/bash -c \"${var.command}\"
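A quick way to check for this is to look for an entryPoint on each container in the task definition. The sketch below is only illustrative and is not part of Dagster: the helper name and the sample task definition are hypothetical, while the keys containerDefinitions, name, and entryPoint follow the ECS task definition format.

```python
from typing import Any, Dict, List


def containers_with_entrypoint(task_definition: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map container name -> entryPoint for every container that sets one."""
    found: Dict[str, List[str]] = {}
    for container in task_definition.get("containerDefinitions", []):
        entry_point = container.get("entryPoint")
        if entry_point:  # a missing or empty entryPoint is not a concern here
            found[container["name"]] = list(entry_point)
    return found


# Hypothetical task definition, shaped like the "taskDefinition" field
# returned by boto3's ecs.describe_task_definition().
example_task_definition = {
    "containerDefinitions": [
        {"name": "dagster", "image": "<image>", "entryPoint": ["sh", "-c"]},
        {"name": "sidecar", "image": "<sidecar_image>"},
    ]
}

print(containers_with_entrypoint(example_task_definition))
```

Against a real account, the input could come from boto3.client("ecs").describe_task_definition(taskDefinition="<task_definition_arn>")["taskDefinition"]. Any container name this reports is one where the launcher's command would be appended to an entrypoint rather than run on its own.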

1 reaction
jmsanders commented, Sep 27, 2022

Do you have a CMD or ENTRYPOINT defined in your task definition? You might be running into: https://aws.amazon.com/blogs/opensource/demystifying-entrypoint-cmd-docker/

So Dagster is indeed sending the command:

["dagster","api","execute_run","<large JSON string>"]

but ECS might then be array concatenating it with whatever is in CMD or ENTRYPOINT in your task definition to instead run:

["<another command>", "dagster", "api", "execute_run", "<large JSON string>"]

At least when we’ve seen similar issues in the past, that has been the culprit.
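To make the concatenation concrete, here is a minimal sketch that assumes the entrypoint is sh -c, as reported in the other comment. The helper names are hypothetical and the JSON argument is a placeholder; nothing here calls Dagster or ECS. It relies on two facts: the container runtime executes the entrypoint followed by the command as one argument list, and a shell invoked with -c treats only the first following argument as the command string (the rest become $0, $1, and so on).

```python
import shlex
from typing import List, Optional


def effective_argv(entrypoint: Optional[List[str]], command: List[str]) -> List[str]:
    """Return the argument list the container runs: entrypoint followed by command."""
    return list(entrypoint or []) + list(command)


def what_shell_c_executes(argv: List[str]) -> Optional[List[str]]:
    """For `sh -c <string> [args...]`, return the words the shell actually runs.

    Only argv[2] is the command string; later arguments are positional
    parameters and are not appended to it. Returns None for any other argv.
    """
    shells = {"sh", "/bin/sh", "bash", "/bin/bash"}
    if len(argv) >= 3 and argv[0] in shells and argv[1] == "-c":
        return shlex.split(argv[2])
    return None


# Placeholder for the command the launcher sends (the JSON is not real).
dagster_command = ["dagster", "api", "execute_run", '{"placeholder": "json"}']

with_shell = effective_argv(["sh", "-c"], dagster_command)
print(with_shell)
print(what_shell_c_executes(with_shell))  # the shell runs bare `dagster`

without_entrypoint = effective_argv(None, dagster_command)
print(without_entrypoint)
```

Under these assumptions the shell ends up running dagster with no subcommand, which is consistent with the usage text in the logs. Without an entrypoint, the command runs exactly as sent.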

I’m a little hesitant to change the default command behavior for the launcher without knowing if this is specific to your custom task definition but I’m curious to know what your task definition looks like so we can either provide better docs/error messages or make the launcher compatible with this kind of customization.
