Skip to content

feature: handle MSK events#1066

Draft
joeyzhao2018 wants to merge 8 commits intomainfrom
joey/handle_msk
Draft

feature: handle MSK events#1066
joeyzhao2018 wants to merge 8 commits intomainfrom
joey/handle_msk

Conversation

@joeyzhao2018
Copy link
Contributor

@joeyzhao2018 joeyzhao2018 commented Mar 5, 2026

https://datadoghq.atlassian.net/browse/SLES-2739

In Kafka's wire protocol (KIP-82), header values are always byte[]. Every Kafka client library enforces this:

Tracer Injection code Mechanism
dd-trace-java headers.add(key, value.getBytes(UTF_8)) String.getBytes() → byte[]
dd-trace-go Value: []byte(val) Go type conversion → []byte
dd-trace-dotnet _headers.Add(name, Encoding.UTF8.GetBytes(value)) UTF8.GetBytes() → byte[]

All three tracers accept string trace context values from the propagation layer, convert to UTF-8 bytes at the carrier adapter boundary, and hand byte[] to the Kafka client.
This isn't a quirk of Java's getBytes() — it's the only way Kafka headers work.

What MSK Lambda does

When MSK triggers a Lambda, AWS serializes the Kafka record to JSON. Since header values are byte[] on the wire, AWS represents them as JSON arrays of unsigned byte values —
regardless of which language produced the message:

"headers": [{"x-datadog-trace-id": [51, 54, 57, ...]}]

What's the difference between the msk_event.json and the newly added msk_event_with_headers.json here?

msk_event.json represents a standard MSK trigger where the producer didn't attach any Kafka headers — i.e. no Datadog tracer was running on the producer side (or it's a non-instrumented producer like a raw Kafka client, a Kinesis Firehose delivery stream, or a schema-registry message). In those cases Lambda still delivers the event but with "headers": []. It's also the format you get when testing MSK triggers manually in the AWS console, which doesn't inject headers.
The msk_event_with_headers.json case corresponds to a producer instrumented with a Datadog tracer (e.g. a Java/Dotnet/Go service) that injects trace context as Kafka headers — AWS then serializes those header values as byte arrays when delivering the event to Lambda.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant