Skip to content

feat(sqs): add OpenTelemetry instrumentation for SQS client and worker#48

Open
anson-lee-sl wants to merge 6 commits into
masterfrom
feature/otel-sqs
Open

feat(sqs): add OpenTelemetry instrumentation for SQS client and worker#48
anson-lee-sl wants to merge 6 commits into
masterfrom
feature/otel-sqs

Conversation

@anson-lee-sl

@anson-lee-sl anson-lee-sl commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Summary

Add OpenTelemetry instrumentation for AWS SQS client and worker, following
OTel Semantic Conventions 1.41.0 — messaging spans and AWS SQS.

Instrumented operations

SQS Client (plugins/sqs/otel_client.go)

Operation SpanKind messaging.operation.type messaging.operation.name
SendMessage Producer send send
SendMessageBatch Producer send send_batch
ReceiveMessage Consumer receive receive
DeleteMessage Client settle delete
DeleteMessageBatch Client settle delete_batch
ChangeMessageVisibility Client settle change_visibility
ChangeMessageVisibilityBatch Client settle change_visibility_batch

SQS Worker (plugins/sqs_worker/otel_consumer.go)

  • SpanKindConsumer process span per message
  • Parented to upstream producer trace via W3C traceparent extracted from MessageAttributes
  • Subsequent DeleteMessage captured as child settle span

Trace context propagation

W3C traceparent, tracestate, and baggage are injected into outgoing message
MessageAttributes for SendMessage/SendMessageBatch, and extracted on the
consumer side. addReceiveLinks attaches upstream trace contexts as span links
on ReceiveMessage spans.

Files changed

plugins/sqs/
otel_carrier.go — SQSMessageCarrier adapts SQS attrs to TextMapCarrier
otel_client.go — SQS client handler hooks (Validate + Complete)
otel_client_noop.go — no-op when otel build tag is absent
otel_client_test.go — unit tests for client instrumentation
topic.go — auto-instrument SQS client in AddTopic
README.md — documentation
plugins/sqs_worker/
otel_consumer.go — process hook: extract context + start process span
otel_consumer_test.go — unit tests for consumer instrumentation
worker.go — processHook extension point (no-op default)
README.md — documentation

Build tags

-tags "sqs,otel" # SQS client instrumentation
-tags "sqs,sqs_worker,otel" # SQS client + worker instrumentation

Test Cases

Deployed fake-sqs-producer and fake-sqs-consumer to local orbstack cluster.
All 8 span types confirmed emitting in Tempo.

Operation Trace ID Service
send 89550dc2f00b34dc39ae91b771145bd9 cross-service
send_batch 8f2a9c5b99ef38cedb8bdb4b131c26e1 fake-sqs-producer
receive 5039a668ce0bae15fc69005f13150b fake-sqs-consumer
delete + process c45c9188dd5aca12b9690b87d5f4656c cross-service
delete_batch 703bd4d86dea75462db2a82bdef5dd19 (producer) / 8b4d1eb7854d241e2ee538cb77b3f55 (consumer) both
change_visibility 48ddbf96cb267d73ed149b75d8c4388c (producer) / c1238fc61cb0b195a936ddafbb71991b (consumer) both
change_visibility_batch 60e2107e99c510b418e5021c11869353 (producer) / c2ff3e2e9ee19ab5d513fe49d8b57932 (consumer) both
image image image image image image image image image image image image image

Following OpenTelemetry Semantic Conventions 1.41.0:
- Semantic conventions for AWS SQS
- Semantic conventions for messaging spans
@anson-lee-sl anson-lee-sl marked this pull request as ready for review June 3, 2026 06:03
@ThomasKwan-shopline

Copy link
Copy Markdown

You should update makefile to test sqs
pipeline link: https://github.com/shoplineapp/go-app/actions/runs/26862804110/job/79219956996
image

}

func startSpan(r *request.Request) {
class, ok := classifyOperation(r.Operation.Name)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the class variable maybe call operation is better

Comment thread plugins/sqs/otel_client.go Outdated
if queueName != "" {
attrs = append(attrs, attribute.String("messaging.destination.name", queueName))
}
if addr := ServerAddressFromURL(queueURL); addr != "" {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe merge ServerAddressFromURL and ServerPortFromURL into one function, because the implementations of the functions are similar

Comment thread plugins/sqs/otel_client.go
Comment thread plugins/sqs/otel_client.go
func injectTraceContext(r *request.Request) {
propagator := otel.GetTextMapPropagator()
switch p := r.Params.(type) {
case *aws_sqs.SendMessageInput:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

點解淨係得send先要inject trace context, to update the sqs message with trace id?

@anson-lee-sl anson-lee-sl Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. When receiving, we will be using addReceiveLinks to extract the trace id from the message. So

  • Before send, we inject a trace id.
  • Before processing the message in consumer, we extract the trace id from the message

@ThomasKwan-shopline ThomasKwan-shopline Jun 12, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before processing the message, we extract the message

漏左?

@anson-lee-sl anson-lee-sl Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to inject a trace id when consuming a message.

Comment thread plugins/sqs_worker/otel_consumer.go
Comment thread plugins/grpc/presets/grpc_with_error_reporting.go
Comment thread plugins/sqs_worker/worker.go Outdated
Comment thread plugins/sqs_worker/worker.go Outdated
Comment thread plugins/sqs/otel_client.go
The order swap of 'sentry,otel' <-> 'otel,sentry' in the legacy // +build line was a drive-by in 1ab9cbf (ci: test sqs, sqs_worker) with no functional effect: commas are AND-joined and order is commutative, and the modern //go:build line was untouched. Restore the pre-1ab9cbf order so the file matches master.
Both helpers shared the same url.Parse + error-handling skeleton, and the two span-start sites in startSpan (otel_client.go) and startProcessSpan (otel_consumer.go) called them sequentially on the same queueURL. Replace them with HostPortFromURL(queueURL) (host, port int); call sites destructure and only record server.port when server.address is recorded. Test gains a port-bearing URL case so the port branch is exercised.
// +build uses comma-AND with right-associative nesting, so swapping the order of otel and sentry in the legacy line changes the AST even though the predicate is equivalent. go vet (run by go test) compares the two constraint ASTs and rejects the mismatch. 1ab9cbf aligned them when it added otel to the test tags; restoring the alignment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants