ECS execute_command session hangs when log-output is above certain length

Describe the bug

I am trying to execute commands on ECS containers and have the logged output sent to CloudWatch (or S3). I have successfully configured everything (confirmed with CLI and SDK).
However, when I use the SDK and the logged output exceeds a certain size (e.g. 10 full lines of text), the Session Manager session hangs and has to be terminated manually. Terminating it sends the output to CloudWatch, but the logged output is incomplete: typically only ~80% of it arrives. This is after waiting more than 10 minutes for the session to complete on its own. The CLI does not show this behavior: its output is consistently returned in full, regardless of output size. The SDK also does not show it when the logged output is not configured to go to CloudWatch or S3.

Both the CLI and SDK are using the same configurations and are testing against the same containers in the same cluster with the same API Credentials.

Expected Behavior

When I use the code below and a cluster configured to send to CloudWatch/S3, the full logged-output should send to CloudWatch and the session (from Session Manager) should close automatically, identical to the behavior of aws ecs execute-command when using the CLI:

import boto3

client = boto3.client('ecs')

# Start a session that runs a command producing 20 lines of output
response = client.execute_command(
    cluster='default',
    container='nginx',
    interactive=True,
    task='2e7c615feee94568b86049139f579137',
    command='tail -n 20 /var/log/amazon/ssm/amazon-ssm-agent.log')

Current Behavior

When the logged output is over a certain “size” (e.g. 10 lines), the session does not complete on its own and has to be manually terminated in Session Manager. When the session is terminated manually, the logged output is incomplete.
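For anyone reproducing this, the manual termination step can also be scripted. The following is a minimal sketch, assuming the `execute_command` response carries the session under `response['session']['sessionId']` and that the SSM client's `terminate_session` call accepts that ID. The helper name `terminate_exec_session` is hypothetical and not part of boto3. This only ends the hung session; per the report above, the logged output will still be incomplete.

```python
def terminate_exec_session(execute_command_response, ssm_client):
    """Terminate the Session Manager session opened by ECS execute_command.

    `execute_command_response` is the dict returned by the ECS client's
    execute_command call; `ssm_client` is an SSM client, for example
    boto3.client('ssm'). Hypothetical helper for illustration only.
    """
    # The ECS response is assumed to nest the session ID under 'session'.
    session_id = execute_command_response["session"]["sessionId"]
    # Ask Session Manager to end the session instead of using the console.
    ssm_client.terminate_session(SessionId=session_id)
    return session_id
```

With real clients this would be called as `terminate_exec_session(response, boto3.client('ssm'))`, where `response` is the return value of `client.execute_command(...)`.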

Reproduction Steps

Create an ECS cluster with logging=OVERRIDE and the following logConfiguration:

logConfiguration={ \
       CloudWatchLogGroupName=my_cloudwatch_log_group, \
       CloudWatchStreamingEnabled=true
}

With this configuration, the logged output of each command should be sent to a CloudWatch log stream.

Send commands to containers using the execute_command method of the SDK client. The command should produce a decent number of lines of logged output (more than about 10). You can run the same command with the CLI to see the difference in behavior.
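The cluster setup from the first step can be sketched with the SDK as well. This is a rough illustration, assuming the boto3 ECS `create_cluster` request accepts `configuration.executeCommandConfiguration` with `logging` and `logConfiguration.cloudWatchLogGroupName`. The helper names and the cluster name in the usage note are hypothetical, and the `CloudWatchStreamingEnabled` setting from the report is left out because it is unclear which request field it maps to.

```python
def build_execute_command_configuration(log_group_name):
    """Build the cluster 'configuration' argument for OVERRIDE logging.

    Field names are assumed from the ECS create_cluster request shape.
    """
    return {
        "executeCommandConfiguration": {
            "logging": "OVERRIDE",
            "logConfiguration": {
                "cloudWatchLogGroupName": log_group_name,
            },
        }
    }


def create_logging_cluster(cluster_name, log_group_name, ecs_client=None):
    """Create a cluster whose execute-command output goes to CloudWatch."""
    if ecs_client is None:
        # Imported lazily so the builder above can be used without boto3.
        import boto3
        ecs_client = boto3.client("ecs")
    return ecs_client.create_cluster(
        clusterName=cluster_name,
        configuration=build_execute_command_configuration(log_group_name),
    )
```

For example, `create_logging_cluster('exec-test', 'my_cloudwatch_log_group')` would create a cluster named `exec-test` using the default credentials.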

Possible Solution

No response

Additional Information/Context

This has been tested on both the Python and Java SDKs, with identical behavior. The CLI always works as intended. It has been tested against multiple container images, on both Mac and Linux.

SDK version used

1.17.106

Environment details (OS name and version, etc.)

macOS Monterey

Top GitHub Comments

tim-finnigan commented, May 2, 2022

I don’t have any updates on this yet, but there is a related internal ticket that I will post for our reference. I’ll also link the issue created in the amazon-ssm-agent repository: https://github.com/aws/amazon-ssm-agent/issues/443.

tim-finnigan commented, Nov 16, 2022

We recently received an update from the service team that the length threshold was increased. This should be addressed if using the latest SSM agent version. Please let us know if you’re still running into any issues related to this.