feat(contexts): Add TraceId by default #5759
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Random TraceContext not added when contexts is None
  `normalize_contexts` now initializes `Annotated<Contexts>` with `Contexts::new` before `processor::apply`, ensuring a random `TraceContext` is added even when contexts were missing.
Or push these changes by commenting:
@cursor push 42352a05a6
Preview (42352a05a6)
diff --git a/relay-event-normalization/src/event.rs b/relay-event-normalization/src/event.rs
--- a/relay-event-normalization/src/event.rs
+++ b/relay-event-normalization/src/event.rs
@@ -1308,6 +1308,7 @@
/// Normalizes incoming contexts for the downstream metric extraction.
fn normalize_contexts(contexts: &mut Annotated<Contexts>) {
+ contexts.get_or_insert_with(Contexts::new);
let _ = processor::apply(contexts, |contexts, _meta| {
// Reprocessing context sent from SDKs must not be accepted, it is a Sentry-internal
// construct.
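The effect of this one-line fix can be sketched with plain `Option`. This is only an illustration under assumptions: `Option<BTreeMap<..>>` stands in for Relay's `Annotated<Contexts>`, and `ensure_trace` is a hypothetical helper, not a function from the PR.

```rust
use std::collections::BTreeMap;

/// Stand-in for `Contexts`; `None` models a payload without any contexts.
type Contexts = BTreeMap<String, String>;

/// Hypothetical helper: inserts a placeholder trace context when none exists.
/// Returns `true` if a context was added.
fn ensure_trace(contexts: &mut Option<Contexts>) -> bool {
    // Without this initialization, a `None` value would never reach the
    // insertion logic below, which is the case the autofix addresses.
    let map = contexts.get_or_insert_with(Contexts::new);
    if map.contains_key("trace") {
        return false;
    }
    map.insert("trace".to_string(), "generated".to_string());
    true
}
```

Calling `ensure_trace` on `None` first creates an empty map and then adds the entry; calling it on a map that already holds a `trace` key leaves the map unchanged.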
impl TraceContext {
    /// Generates a random [`TraceId`] and random [`SpanId`].
    /// Leaves all other fields blank.
    pub fn random() -> Self {
        let mut trace_meta = Meta::default();
        trace_meta.add_remark(Remark::new(RemarkType::Substituted, "trace_id.missing"));

        let mut span_meta = Meta::default();
        span_meta.add_remark(Remark::new(RemarkType::Substituted, "span_id.missing"));
        TraceContext {
            trace_id: Annotated(Some(TraceId::random()), trace_meta),
            span_id: Annotated(Some(SpanId::random()), span_meta),
            ..Default::default()
        }
    }
}
This is one of two non-test changes.
// We need a TraceId to ingest the event into EAP.
// If the event lacks a TraceContext, add a random one.
if !contexts.contains::<TraceContext>() {
    contexts.add(TraceContext::random())
}
This is one of two non-test changes.
trace_meta.add_remark(Remark::new(RemarkType::Substituted, "trace_id.missing"));

let mut span_meta = Meta::default();
span_meta.add_remark(Remark::new(RemarkType::Substituted, "span_id.missing"));
How will these remarks show up in the UI if we merge this right now? Might be worth trying with a local relay hooked up to production sentry.io before merging (see https://github.com/getsentry/relay/?tab=readme-ov-file#building-and-running). I'm worried that UI annotates this as some kind of processing error, and then users see this on almost every error event.
An alternative would be to not set a remark at all, and rather set the origin field of the trace context to something like "relay".
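The two options in this comment can be contrasted in a small sketch. The types below are simplified, hypothetical stand-ins rather than Relay's actual `TraceContext` and `Meta`; only the remark string `trace_id.missing` and the `"relay"` origin value come from the discussion.

```rust
/// Simplified stand-in for a trace context; not Relay's real type.
#[derive(Debug, Default, PartialEq)]
struct TraceContextSketch {
    trace_id: Option<String>,
    /// Remarks attached to `trace_id`; in a real payload these would
    /// surface under `_meta`.
    trace_id_remarks: Vec<String>,
    /// Marker describing who created the context.
    origin: Option<String>,
}

/// Option 1: flag the generated ID with a remark.
fn with_remark(trace_id: &str) -> TraceContextSketch {
    TraceContextSketch {
        trace_id: Some(trace_id.to_owned()),
        trace_id_remarks: vec!["trace_id.missing".to_owned()],
        ..Default::default()
    }
}

/// Option 2: leave the meta empty and record the generator in `origin`.
fn with_origin(trace_id: &str) -> TraceContextSketch {
    TraceContextSketch {
        trace_id: Some(trace_id.to_owned()),
        origin: Some("relay".to_owned()),
        ..Default::default()
    }
}
```

With the second option nothing is written to the meta, so a UI that renders `_meta` entries would have nothing to annotate, while consumers could still tell generated contexts apart by checking `origin`.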
The UI doesn't show meta at all right now from the spans dataset. It's a task for a project upcoming shortly (attribute explorer).
> The UI doesn't show meta at all right now from the spans dataset
But it does for the errors data set, right? I.e. JSON ends up in nodestore, and the views in Issues will render at least some of the _meta.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: SpanId byte order inconsistency between `random` and `from_str`
  `SpanId::random()` now uses `to_be_bytes()` so its internal byte order matches `SpanId::from_str()` and remains consistent across platforms.
Or push these changes by commenting:
@cursor push 73d35bc862
Preview (73d35bc862)
diff --git a/relay-event-schema/src/protocol/contexts/trace.rs b/relay-event-schema/src/protocol/contexts/trace.rs
--- a/relay-event-schema/src/protocol/contexts/trace.rs
+++ b/relay-event-schema/src/protocol/contexts/trace.rs
@@ -172,7 +172,7 @@
impl SpanId {
pub fn random() -> Self {
let value: u64 = rand::random_range(1..=u64::MAX);
- Self(value.to_ne_bytes())
+ Self(value.to_be_bytes())
}
}
pub fn random() -> Self {
    let value: u64 = rand::random_range(1..=u64::MAX);
    Self(value.to_ne_bytes())
}
SpanId byte order inconsistency between random and from_str
Low Severity
SpanId::random() stores the u64 using to_ne_bytes() (native endian), but SpanId::from_str() stores it using to_be_bytes() (big endian). On little-endian platforms (virtually all modern servers), these produce different byte orderings for the same u64 value. While the Display/from_str round-trip happens to preserve bytes correctly, this inconsistency means randomly-generated SpanIds follow a different internal byte convention than parsed ones. Any future code that interprets the inner [u8; 8] as a u64 via from_be_bytes (the convention implied by from_str) would get wrong results for random SpanIds. Using to_be_bytes() here would maintain consistency.
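The distinction drawn in this comment can be checked with the standard library alone. The sketch below assumes an eight-byte array representation and uses hypothetical helpers `to_hex` and `from_hex` in place of the real `Display` and `FromStr` implementations, which are not shown in this thread.

```rust
/// Hex-encodes the bytes in storage order, as a `Display` impl over
/// `[u8; 8]` might do (assumption).
fn to_hex(bytes: [u8; 8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Parses 16 hex characters through a big-endian `u64`, mirroring the
/// `from_str` convention described in the comment (assumption).
fn from_hex(s: &str) -> Option<[u8; 8]> {
    if s.len() != 16 {
        return None;
    }
    u64::from_str_radix(s, 16).ok().map(u64::to_be_bytes)
}

/// Interprets stored bytes as an integer using the big-endian convention.
fn as_integer(bytes: [u8; 8]) -> u64 {
    u64::from_be_bytes(bytes)
}
```

Under these assumptions, `from_hex(&to_hex(bytes))` returns the same bytes for any input, so the string round-trip holds regardless of how the bytes were produced. What differs is the integer view: `as_integer(value.to_le_bytes())` does not equal `value` for an asymmetric value such as `0x0102030405060708`, which is the hazard for later code that reads the array with `from_be_bytes`.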
jjbayer left a comment
Code looks OK to me but I would be more comfortable if we could feature flag the auto-generation (see code linked below). Then we can dogfood it, see what it does to average payload sizes (e.g. in S4S) and check what the existing UI looks like.
relay/relay-dynamic-config/src/feature.rs
Lines 14 to 16 in 8c0735f
Added a feature flag; the PR adding it to flagpole is up at https://github.com/getsentry/sentry-options-automator/pull/7064
impl SpanId {
    pub fn random() -> Self {
        let value: u64 = rand::random_range(1..=u64::MAX);
        Self(value.to_ne_bytes())
Bug: SpanId::random() uses native-endian bytes (to_ne_bytes), but serialization/deserialization (Display/FromStr) expect big-endian. This causes incorrect roundtripping on little-endian systems.
Severity: MEDIUM
Suggested Fix
In SpanId::random(), change the call from rng.gen::<u64>().to_ne_bytes() to rng.gen::<u64>().to_be_bytes(). This will align the byte order during generation with the big-endian order expected by the Display and FromStr implementations, ensuring correct serialization and deserialization roundtrips.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: relay-event-schema/src/protocol/contexts/trace.rs#L174
Potential issue: A byte-ordering inconsistency exists in `SpanId` handling. The
`SpanId::random()` function generates an ID using native-endian byte order
(`to_ne_bytes()`), while the `Display` and `FromStr` implementations, used for
serialization and deserialization, assume big-endian byte order. On common little-endian
architectures like x86_64, this causes a randomly generated `SpanId` to fail a
serialization-deserialization roundtrip. For example, a generated ID will be displayed
with its bytes reversed, and parsing that string back will result in a different ID
value. This can lead to data integrity issues and broken trace continuity if these
randomly generated IDs are ever stored and re-ingested.
I'll go through and revert the test changes that are causing the failures once I get an OK on the new direction / use of Feature.
jjbayer left a comment
Feature usage looks good to me!
relay-cabi/src/processing.rs
Outdated
span_allowed_hosts: &[], // only supported in relay
span_op_defaults: Default::default(), // only supported in relay
performance_issues_spans: Default::default(),
should_add_trace_id_by_default: Default::default(),
nit: I would give this an imperative name, e.g. derive_trace_id. See remove_other, enrich_spans, emit_event_errors.
relay-dynamic-config/src/feature.rs
Outdated
#[serde(rename = "projects:relay-playstation-uploads")]
PlaystationUploads,
/// Add a random trace ID to events that lack one.
#[serde(rename = "organizations:add-default-trace-id-relay")]
nit: relay- is usually a prefix (see feature above this one).
We are beginning to store events in EAP, which, as a trace-centric datastore, requires every TraceItem to be associated with a TraceId. This is a problem because we do not currently require events to have a TraceId, so events without one are silently dropped before they can be ingested into EAP. This PR adds a random TraceId and SpanId to events that do not have a TraceContext.
pub fn random() -> Self {
    let value: u64 = rand::random_range(1..=u64::MAX);
    Self(value.to_ne_bytes())
}
Bug: SpanId::random() uses native-endian bytes (to_ne_bytes), while parsing assumes big-endian, causing incorrect serialization and parsing on little-endian systems.
Severity: HIGH
Suggested Fix
Ensure consistent endianness across SpanId creation, serialization, and parsing. Modify SpanId::random() to use to_be_bytes() instead of to_ne_bytes(). This will align the byte order of randomly generated IDs with the expectations of the FromStr and Display implementations, guaranteeing correct behavior on all architectures.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: relay-event-schema/src/protocol/contexts/trace.rs#L172-L175
Potential issue: The `SpanId::random()` function generates a `u64` and converts it to
bytes using `to_ne_bytes()`. On little-endian systems, which include most modern server
architectures, this produces a little-endian byte array. However, the `Display`
implementation serializes these bytes in their storage order, creating a byte-reversed
hex string. When this string is later parsed by `FromStr`, it is interpreted as a
big-endian value, resulting in a different `SpanId` value than the one originally
generated. This will cause incorrect span IDs to be sent to downstream systems that
expect standard big-endian hex encoding.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Method named `random` but `trace_id` is deterministic
  Renamed `TraceContext::random(event_id)` to `TraceContext::from_event_id(event_id)` and updated call sites/tests/comments to reflect deterministic trace ID derivation.
Or push these changes by commenting:
@cursor push eb60b26a0d
Preview (eb60b26a0d)
diff --git a/relay-event-normalization/src/event.rs b/relay-event-normalization/src/event.rs
--- a/relay-event-normalization/src/event.rs
+++ b/relay-event-normalization/src/event.rs
@@ -1331,10 +1331,10 @@
contexts.0.remove("reprocessing");
// We need a TraceId to ingest the event into EAP.
- // If the event lacks a TraceContext, add a random one.
+ // If the event lacks a TraceContext, derive one from the event id.
if config.derive_trace_id && !contexts.contains::<TraceContext>() {
- contexts.add(TraceContext::random(event_id))
+ contexts.add(TraceContext::from_event_id(event_id))
}
for annotated in &mut contexts.0.values_mut() {
diff --git a/relay-event-schema/src/protocol/contexts/trace.rs b/relay-event-schema/src/protocol/contexts/trace.rs
--- a/relay-event-schema/src/protocol/contexts/trace.rs
+++ b/relay-event-schema/src/protocol/contexts/trace.rs
@@ -332,9 +332,9 @@
}
impl TraceContext {
- /// Generates a random [`SpanId`] and takes `[TraceId]` from the event's UUID.
+ /// Generates a random [`SpanId`] and derives [`TraceId`] from the event's UUID.
/// Leaves all other fields blank.
- pub fn random(event_id: Uuid) -> Self {
+ pub fn from_event_id(event_id: Uuid) -> Self {
let mut trace_meta = Meta::default();
trace_meta.add_remark(Remark::new(RemarkType::Substituted, "trace_id.missing"));
@@ -641,8 +641,8 @@
}
#[test]
- fn test_random_trace_context() {
- let rand_context = TraceContext::random(Uuid::new_v4());
+ fn test_trace_context_from_event_id() {
+ let rand_context = TraceContext::from_event_id(Uuid::new_v4());
assert!(rand_context.trace_id.value().is_some());
assert_eq!(
rand_context
Reviewed by Cursor Bugbot for commit 9927ecc. Configure here.
impl TraceContext {
    /// Generates a random [`SpanId`] and takes `[TraceId]` from the event's UUID.
    /// Leaves all other fields blank.
    pub fn random(event_id: Uuid) -> Self {
Method named random but trace_id is deterministic
Low Severity
TraceContext::random(event_id) derives trace_id deterministically from the event UUID, not randomly. Only the span_id is actually random. The method was originally fully random (first iteration of the PR) but was changed to derive from event_id without updating the name. This is misleading — callers seeing TraceContext::random(event_id) would reasonably assume the output is fully random, not that it embeds the event UUID as the trace ID.
Reviewed by Cursor Bugbot for commit 9927ecc. Configure here.
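The naming concern can be illustrated with a simplified sketch. It assumes the event ID is available as 16 raw bytes and that the trace ID reuses those bytes directly, as the comment describes; `TraceContextSketch` is a hypothetical stand-in, and the span ID is passed in by the caller so the example needs no random number crate.

```rust
/// Simplified stand-in for a trace context; not Relay's real type.
#[derive(Debug, PartialEq)]
struct TraceContextSketch {
    trace_id: [u8; 16],
    span_id: [u8; 8],
}

impl TraceContextSketch {
    /// Copies the event ID bytes into the trace ID, so the same event always
    /// maps to the same trace. The span ID is supplied by the caller here;
    /// in the PR it is generated randomly.
    fn from_event_id(event_id: [u8; 16], span_id: [u8; 8]) -> Self {
        Self { trace_id: event_id, span_id }
    }

    /// Renders the trace ID as 32 lowercase hex characters.
    fn trace_id_hex(&self) -> String {
        self.trace_id.iter().map(|b| format!("{:02x}", b)).collect()
    }
}
```

Two contexts built from the same event ID share a trace ID even when their span IDs differ, which is why a name like `from_event_id` describes the behavior more accurately than `random`.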
fn test_normalize_adds_trace_id() {
    let json = r#"
{
  "type": "transaction",
Do we actually want this normalization to also apply to transactions? If not, we should probably filter on the event type in normalize_contexts.
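If the answer is no, the guard could look roughly like the sketch below. It is an assumption about shape only: `EventType` here is a local stand-in enum, `should_derive_trace_id` is a hypothetical helper, and the `derive_trace_id` flag name is taken from the diff earlier in this thread.

```rust
/// Local stand-in for the event type; not Relay's real enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EventType {
    Error,
    Default,
    Transaction,
}

/// Hypothetical guard: derive a trace ID only when the flag is on, the event
/// has no trace context yet, and the event is not a transaction.
fn should_derive_trace_id(
    derive_trace_id: bool,
    event_type: EventType,
    has_trace_context: bool,
) -> bool {
    derive_trace_id && !has_trace_context && event_type != EventType::Transaction
}
```

Keeping the decision in one pure function would make the transaction exclusion easy to cover in a unit test without building a full event payload.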


