E2E service tracing via SQS not working
I set up the AWS Otel agent (version 1.13.1-aws) on a Quarkus application that uses software.amazon.awssdk.services.sqs.SqsAsyncClient from AWS SDK 2.17.103, running in an ECS container. Trace segments are exported by the AWS Otel Collector and show up in X-Ray like this:
serviceA --send-> SQS
serviceB --receive-> SQS
However, it does not seem possible to get the downstream service included in the same trace, i.e.: serviceA -send-> SQS -receive-> serviceB
This seems closely related to this long-open issue: https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/3684. Even if it is exactly the same problem, a perspective would be appreciated, since this is one of the most common uses of tracing in microservice architectures.
I checked the messages in the SQS queue and noticed that the message attributes are empty, whereas I expected an AWS trace header to be set by the auto-instrumentation. So I wonder:
- if this is a bug (https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/3684) or just not implemented
- if I am simply doing something wrong here
- if I have to take care of writing/reading the AWS trace header myself on sqs.send and sqs.receive (examples would be welcome; I only found an example for AWS SDK v1, which is not compatible with AWS SDK v2: https://stackoverflow.com/questions/51954687/how-to-trace-a-request-through-an-sqs-queue-with-aws-x-ray). A sketch of manual propagation follows right after this list.
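For completeness, here is a rough sketch of what manual propagation on the send side could look like with AWS SDK v2 and the OpenTelemetry API. This is only a sketch, not a verified fix: the class name TracingSqsSender is a placeholder, the message group id is taken from the queue setup shown in the collector log below, and it assumes the application can reach the agent's propagators through GlobalOpenTelemetry (i.e. opentelemetry-api is on the classpath).

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.propagation.TextMapSetter;
import software.amazon.awssdk.services.sqs.SqsAsyncClient;
import software.amazon.awssdk.services.sqs.model.MessageAttributeValue;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

import java.util.HashMap;
import java.util.Map;

// Sketch: copy the current trace context into SQS message attributes on send,
// so the consumer can pick it up and continue the same trace.
public class TracingSqsSender {

    // Writes each propagation header (e.g. traceparent, X-Amzn-Trace-Id)
    // as a String message attribute.
    private static final TextMapSetter<Map<String, MessageAttributeValue>> SETTER =
            (carrier, key, value) -> carrier.put(key,
                    MessageAttributeValue.builder().dataType("String").stringValue(value).build());

    private final SqsAsyncClient sqs;

    public TracingSqsSender(SqsAsyncClient sqs) {
        this.sqs = sqs;
    }

    public void send(String queueUrl, String body) {
        Map<String, MessageAttributeValue> attributes = new HashMap<>();
        // Uses whatever propagators are configured (-Dotel.propagators=tracecontext,baggage,xray)
        GlobalOpenTelemetry.getPropagators().getTextMapPropagator()
                .inject(Context.current(), attributes, SETTER);

        sqs.sendMessage(SendMessageRequest.builder()
                .queueUrl(queueUrl)
                .messageBody(body)
                .messageGroupId("defaultMessageGroup") // FIFO queue as in the log below
                .messageAttributes(attributes)
                .build());
    }
}

Note that the xray propagator writes its context under the X-Amzn-Trace-Id key as a regular message attribute, while SQS's own X-Ray integration uses the AWSTraceHeader message system attribute; the receiving side has to read whichever carrier the sender actually populated.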
Some further technical context:
AWS Otel agent config
# service A
ENV OTEL_RESOURCE_ATTRIBUTES="service.name=serviceA"
# service B
ENV OTEL_RESOURCE_ATTRIBUTES="service.name=serviceB"
# service A+B
ENV JAVA_OPTIONS="
-Dquarkus.http.host=0.0.0.0
-Djava.util.logging.manager=org.jboss.logmanager.LogManager
-Dotel.propagators=tracecontext,baggage,xray
-Dotel.instrumentation.common.default-enabled=true
-Dotel.instrumentation.opentelemetry-annotations.enabled=true
-Dotel.traces.sampler=always_on
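As a quick sanity check for the propagator config, something like the following can be run inside an active span to see which header keys the configured propagators would actually write (a sketch; PropagatorCheck is just an illustrative name, and it assumes GlobalOpenTelemetry is bridged to the agent):

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.context.Context;

import java.util.HashMap;
import java.util.Map;

public class PropagatorCheck {
    // Injects the current context into a plain map instead of a real carrier,
    // so the keys produced by tracecontext/baggage/xray become visible.
    public static void print() {
        Map<String, String> carrier = new HashMap<>();
        GlobalOpenTelemetry.getPropagators().getTextMapPropagator()
                .inject(Context.current(), carrier, (map, key, value) -> map.put(key, value));
        // Inside a recorded span this should list e.g. traceparent and X-Amzn-Trace-Id
        carrier.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}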
AWS Otel collector log
2022-06-10T12:57:40.665Z debug awsxrayexporter@v0.51.0/awsxray.go:66 request: {
TraceSegmentDocuments: [
[n subsegments]....
"{\"name\":\"Sqs\",\"id\":\"29bfd1ac08145c6d\",\"start_time\":1654865858.891311,\"origin\":\"AWS::ECS::Container\",\"trace_id\":\"1-62a33fbc-519ae399b605ac458103fb19\",\"end_time\":1654865860.5327685,\"http\":{\"request\":{\"method\":\"POST\",\"url\":\"https://sqs.eu-central-1.amazonaws.com?Action=SendMessage\\u0026Version=2012-11-05\\u0026QueueUrl=https%3A%2F%2Fsqs.eu-central-1.amazonaws.com%2F123456789123%2Fmyqueue.fifo\\u0026MessageBody=%7B%22fipsId%22%3A3295020%2C%22loggerTypeId%22%3A20%2C%22loggerCount%22%3A0%7D\\u0026MessageGroupId=defaultMessageGroup\",\"user_agent\":\"aws-sdk-java/2.17.103 Linux/4.14.276-211.499.amzn2.x86_64 OpenJDK_64-Bit_Server_VM/17.0.3+7-LTS Java/17.0.3 vendor/Red_Hat__Inc. exec-env/AWS_ECS_FARGATE io/async http/NettyNio cfg/retry-mode/legacy\"},\"response\":{\"status\":200,\"content_length\":0}},\"fault\":false,\"error\":false,\"throttle\":false,\"aws\":{\"ecs\":{\"container\":\"ip-10-3-118-186.eu-central-1.compute.internal\",\"container_id\":\"4c9eb21f0854800be118/c02a425851a64c9eb21f0854800be118-1682837531\"},\"xray\":{\"sdk\":\"opentelemetry for java\",\"sdk_version\":\"1.13.0\",\"auto_instrumentation\":true},\"operation\":\"SendMessage\",\"request_id\":\"bfc2f028-c367-512e-8f3a-ab3581635091\",\"queue_url\":\"https://sqs.eu-central-1.amazonaws.com/123456789123/myqueue.fifo\"},\"metadata\":{\"default\":{\"aws.agent\":\"java-aws-sdk\",\"http.flavor\":\"1.1\",\"net.transport\":\"ip_tcp\",\"rpc.service\":\"Sqs\",\"rpc.system\":\"aws-api\",\"thread.id\":86,\"thread.name\":\"executor-thread-9\"}},\"namespace\":\"aws\",\"parent_id\":\"a2653d250d74fb85\",\"type\":\"subsegment\"}\n",
[n subsegments]...
]
trace_id and parent_id are set and identical for all subsegments
Please let me know if I should provide more technical details.
@eric-spence-code thanks for stepping up; I already implemented it yesterday (well, actually I copied what the AWS SDK v1 instrumentation was doing).
I re-tested this locally with two Quarkus apps (set up as in https://quarkus.io/guides/opentelemetry), including the Java agent, a LocalStack SQS queue, and a local jaeger-aio as tracing backend. However, sendMessage and receiveMessage are still not in the same trace. I will try to create a reproducer soon, but will likely debug the tests of this instrumentation before creating the reproducer.
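In the meantime, a manual workaround on the receive side could look roughly like this. Again only a sketch: TracingSqsReceiver and the span name are placeholders, and it assumes the sender wrote the propagation headers into the message attributes (as in the send-side sketch above) and that GlobalOpenTelemetry is bridged to the agent.

import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;
import io.opentelemetry.context.propagation.TextMapGetter;
import software.amazon.awssdk.services.sqs.SqsAsyncClient;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.MessageAttributeValue;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

import java.util.Map;

// Sketch: read the propagation headers back out of the SQS message attributes
// and start the consumer span as a child of the producer's context.
public class TracingSqsReceiver {

    private static final TextMapGetter<Map<String, MessageAttributeValue>> GETTER =
            new TextMapGetter<Map<String, MessageAttributeValue>>() {
                @Override
                public Iterable<String> keys(Map<String, MessageAttributeValue> carrier) {
                    return carrier.keySet();
                }

                @Override
                public String get(Map<String, MessageAttributeValue> carrier, String key) {
                    MessageAttributeValue value = carrier == null ? null : carrier.get(key);
                    return value == null ? null : value.stringValue();
                }
            };

    private final SqsAsyncClient sqs;
    private final Tracer tracer = GlobalOpenTelemetry.getTracer("sqs-manual-propagation");

    public TracingSqsReceiver(SqsAsyncClient sqs) {
        this.sqs = sqs;
    }

    public void poll(String queueUrl) {
        ReceiveMessageRequest request = ReceiveMessageRequest.builder()
                .queueUrl(queueUrl)
                .messageAttributeNames("All") // without this SQS does not return the attributes
                .build();

        for (Message message : sqs.receiveMessage(request).join().messages()) {
            // Rebuild the upstream context from the attributes written on send
            Context extracted = GlobalOpenTelemetry.getPropagators().getTextMapPropagator()
                    .extract(Context.current(), message.messageAttributes(), GETTER);

            Span span = tracer.spanBuilder("myqueue.fifo process")
                    .setSpanKind(SpanKind.CONSUMER)
                    .setParent(extracted)
                    .startSpan();
            try (Scope ignored = span.makeCurrent()) {
                // handle the message here
            } finally {
                span.end();
            }
        }
    }
}

Using setParent keeps producer and consumer in one trace; a span link would be the alternative if the consumer should stay in its own trace but still reference the producer.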