ECS execute_command session hangs when log-output is above certain length

Describe the bug

I am trying to execute commands on ECS containers and have the logged output sent to CloudWatch (or S3). I have successfully configured everything (confirmed with CLI and SDK).
However, when using the SDK with logged output over a certain size (e.g. 10 full lines of text), the Session Manager session hangs and has to be terminated manually. Terminating it sends the output to CloudWatch, but the logged output is incomplete: typically only ~80% of it arrives. This is after waiting more than 10 minutes for the session to complete on its own. The CLI does not show this behavior: its output is consistently returned in full, regardless of output size. The SDK also does not hang when logged output is not configured to be sent to CloudWatch or S3.

Both the CLI and SDK are using the same configurations and are testing against the same containers in the same cluster with the same API Credentials.

Expected Behavior

When I use the code below and a cluster configured to send to CloudWatch/S3, the full logged-output should send to CloudWatch and the session (from Session Manager) should close automatically, identical to the behavior of aws ecs execute-command when using the CLI:

import boto3

client = boto3.client('ecs')

# Run a command that produces about 20 lines of output in the container
response = client.execute_command(
    cluster='default',
    container='nginx',
    interactive=True,
    task='2e7c615feee94568b86049139f579137',
    command='tail -n 20 /var/log/amazon/ssm/amazon-ssm-agent.log')

Current Behavior

When the logged output is over a certain “size” (e.g. 10 lines), the session does not complete on its own and has to be manually terminated in Session Manager. When the session is manually terminated, the logged output is incomplete.
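
The manual termination described above can also be scripted instead of being done in the Session Manager console. The following is a minimal sketch, assuming the `execute_command` response exposes the session ID under `response['session']['sessionId']` and that the caller is allowed to call `ssm:TerminateSession`. The helper name `terminate_exec_session` is hypothetical. It only works around the hang; it does not recover the missing output.

```python
def terminate_exec_session(ssm_client, exec_response):
    """Terminate the Session Manager session opened by ecs.execute_command.

    ssm_client    -- a boto3 SSM client, e.g. boto3.client('ssm')
    exec_response -- the dict returned by ecs_client.execute_command(...)

    Returns the ID of the session that was asked to terminate.
    """
    # Assumption: the session details live under the 'session' key.
    session_id = exec_response["session"]["sessionId"]
    ssm_client.terminate_session(SessionId=session_id)
    return session_id


# Usage (requires AWS credentials, so it is left commented out):
#   import boto3
#   ecs = boto3.client("ecs")
#   ssm = boto3.client("ssm")
#   response = ecs.execute_command(cluster="default", container="nginx",
#                                  interactive=True,
#                                  task="2e7c615feee94568b86049139f579137",
#                                  command="tail -n 20 /var/log/amazon/ssm/amazon-ssm-agent.log")
#   terminate_exec_session(ssm, response)
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at a real account.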

Reproduction Steps

Create ECS Cluster with logging=OVERRIDE and

logConfiguration={ \
       CloudWatchLogGroupName=my_cloudwatch_log_group, \
       CloudWatchStreamingEnabled=true
}

With this configuration, the logged output of each command should be sent to a CloudWatch log stream.

Send commands to containers using the execute_command method from the SDK client. The command should produce a decent number of lines of logged output (more than about 10). You can run the same command with the CLI to see the difference in behavior.
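
For anyone reproducing this from the SDK rather than the CLI, the cluster setup can be sketched with boto3 as well. This is a rough sketch under stated assumptions: the helper names `build_exec_config` and `create_logging_cluster` are hypothetical, the cluster and log group names are placeholders, and only the `logging` and `cloudWatchLogGroupName` keys are set. The `CloudWatchStreamingEnabled` key from the configuration above is left out here, since the report does not confirm how it is passed through the API.

```python
def build_exec_config(log_group_name):
    """Build the cluster configuration that routes ECS Exec output to CloudWatch Logs."""
    return {
        "executeCommandConfiguration": {
            # OVERRIDE tells ECS to use the logConfiguration given here.
            "logging": "OVERRIDE",
            "logConfiguration": {
                "cloudWatchLogGroupName": log_group_name,
            },
        }
    }


def create_logging_cluster(ecs_client, cluster_name, log_group_name):
    """Create a cluster with the configuration above.

    ecs_client -- a boto3 ECS client, e.g. boto3.client('ecs')
    """
    return ecs_client.create_cluster(
        clusterName=cluster_name,
        configuration=build_exec_config(log_group_name),
    )


# Usage (requires AWS credentials, so it is left commented out):
#   import boto3
#   create_logging_cluster(boto3.client("ecs"), "default", "my_cloudwatch_log_group")
```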

Possible Solution

No response

Additional Information/Context

This has been tested with both the Python and Java SDKs, with identical behavior. The CLI always works as intended. It has been tested against multiple container images, on both Mac and Linux.

SDK version used

1.17.106

Environment details (OS name and version, etc.)

macOS Monterey

Issue Analytics

  • State: closed
  • Created: a year ago
  • Reactions: 1
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

tim-finnigan commented, May 2, 2022 (1 reaction)

I don’t have any updates on this yet, but there is a related internal ticket that I will post for our reference. I’ll also link the issue created in the amazon-ssm-agent repository: https://github.com/aws/amazon-ssm-agent/issues/443.

tim-finnigan commented, Nov 16, 2022 (0 reactions)

We recently received an update from the service team that the length threshold was increased. This should be addressed if using the latest SSM agent version. Please let us know if you’re still running into any issues related to this.
