feat(openfeature): subscribe to FFE_FLAGS during tracer RC setup#4495
feat(openfeature): subscribe to FFE_FLAGS during tracer RC setup#4495leoromanovsky wants to merge 15 commits intomainfrom
Conversation
|
✅ Tests 🎉 All green!❄️ No new flaky tests detected 🎯 Code Coverage (details) 🔗 Commit SHA: b95c6a4 | Docs | Datadog PR Page | Was this helpful? React with 👍/👎 or give us feedback! |
Codecov Report❌ Patch coverage is Additional details and impacted files
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR reduces OpenFeature flag readiness latency by ensuring the FFE_FLAGS Remote Config product is subscribed during tracer Remote Config initialization (so it’s included in the first poll), while allowing the later-created DatadogProvider to attach to that early subscription and replay the latest buffered snapshot.
Changes:
- Add a tracer-time RC subscription for
FFE_FLAGS, with an internal forwarding callback that buffers the latest config snapshot until the provider attaches. - Add provider-side “fast path” attach/replay logic; keep “slow path” behavior for cases where no tracer RC subscription exists.
- Promote the previously hardcoded FFE capability (
46) intointernal/remoteconfigasFFEFlagEvaluationand update module sums/mods accordingly.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| openfeature/remoteconfig.go | Adds attach fast-path; switches to remoteconfig.FFEFlagEvaluation; updates unregister logic. |
| openfeature/rc_subscription.go | Adds a thin wrapper to attach provider callback to the internal subscription bridge. |
| openfeature/rc_subscription_test.go | Adds tests for attach/replay behavior and fast-path initialization. |
| internal/remoteconfig/remoteconfig.go | Adds FFEFlagEvaluation capability constant to the capability iota block. |
| internal/openfeature/rc_subscription.go | Implements tracer-time subscription + forwarding/buffering + attach/replay bridge. |
| internal/openfeature/rc_subscription_test.go | Adds unit tests for buffering/forwarding and attach behavior. |
| internal/openfeature/testing.go | Adds test helpers to manipulate internal subscription state. |
| ddtrace/tracer/remote_config.go | Subscribes to FFE_FLAGS during tracer RC setup when env var is enabled. |
| ddtrace/tracer/tracer_test.go | Adds a goleak ignore entry for an OpenFeature SDK goroutine. |
| instrumentation/internal/namingschematest/go.sum | Dependency checksum updates. |
| go.work.sum | Workspace checksum updates. |
| contrib/confluentinc/confluent-kafka-go/kafka.v2/go.mod | Adds/updates indirect deps (e.g., pflag, x/crypto, x/oauth2). |
| contrib/confluentinc/confluent-kafka-go/kafka.v2/go.sum | Dependency checksum updates. |
| contrib/bradfitz/gomemcache/go.mod | Updates indirect golang.org/x/crypto. |
| contrib/bradfitz/gomemcache/go.sum | Dependency checksum updates. |
| contrib/aws/aws-sdk-go/go.mod | Updates indirect golang.org/x/crypto. |
| contrib/aws/aws-sdk-go/go.sum | Dependency checksum updates. |
| contrib/aws/aws-sdk-go-v2/go.mod | Updates indirect golang.org/x/crypto. |
| contrib/aws/aws-sdk-go-v2/go.sum | Dependency checksum updates. |
| contrib/99designs/gqlgen/go.mod | Updates indirect golang.org/x/crypto. |
| contrib/99designs/gqlgen/go.sum | Dependency checksum updates. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func forwardingCallback(update remoteconfig.ProductUpdate) map[string]rc.ApplyStatus { | ||
| rcState.Lock() | ||
| defer rcState.Unlock() | ||
|
|
||
| if rcState.callback != nil { | ||
| return rcState.callback(update) | ||
| } |
There was a problem hiding this comment.
In forwardingCallback, rcState is held while invoking the provider callback. Calling external code under a mutex can block RC processing and risks deadlocks if the callback ever calls back into this package. Consider copying rcState.callback to a local variable under lock, releasing the lock, then invoking it (and similarly for the buffering path).
There was a problem hiding this comment.
Disagree with the deadlock risk. The callback chain is: forwardingCallback (holds rcState.Lock) → rcCallback → updateConfiguration (acquires p.mu.Lock)
For a deadlock, we'd need another code path that acquires p.mu first, then tries to acquire rcState.Lock. This doesn't exist - rcCallback and updateConfiguration never call back into internal/openfeature.
The suggestion to release the lock early introduces risks. If we release rcState.Lock before invoking the callback, a second RC update could arrive and be processed concurrently or out-of-order. The current implementation ensures serial processing of updates, which is important since each RC update is a full config snapshot.
| defer rcState.Unlock() | ||
|
|
||
| if !rcState.subscribed { | ||
| return false | ||
| } | ||
|
|
||
| rcState.callback = cb | ||
|
|
||
| // Replay the buffered update if there is one. | ||
| if rcState.buffered != nil { | ||
| log.Debug("openfeature: replaying buffered RC config to provider") | ||
| cb(*rcState.buffered) | ||
| rcState.buffered = nil |
There was a problem hiding this comment.
AttachCallback replays the buffered update by calling cb while still holding rcState.Lock(). As with forwardingCallback, invoking callbacks under a mutex can lead to lock contention or future deadlocks. Suggest capturing the buffered update (and clearing it) under lock, then unlocking before calling cb.
| defer rcState.Unlock() | |
| if !rcState.subscribed { | |
| return false | |
| } | |
| rcState.callback = cb | |
| // Replay the buffered update if there is one. | |
| if rcState.buffered != nil { | |
| log.Debug("openfeature: replaying buffered RC config to provider") | |
| cb(*rcState.buffered) | |
| rcState.buffered = nil | |
| if !rcState.subscribed { | |
| rcState.Unlock() | |
| return false | |
| } | |
| rcState.callback = cb | |
| // Capture and clear the buffered update under lock, then replay it after | |
| // releasing the mutex to avoid invoking callbacks while holding rcState.Lock(). | |
| buffered := rcState.buffered | |
| rcState.buffered = nil | |
| rcState.Unlock() | |
| // Replay the buffered update if there is one. | |
| if buffered != nil { | |
| log.Debug("openfeature: replaying buffered RC config to provider") | |
| cb(*buffered) |
There was a problem hiding this comment.
Same as #4495 (comment) - no deadlock risk exists here.
Also, releasing the lock early in AttachCallback is problematic: if we unlock after setting rcState.callback = cb but before calling cb(*rcState.buffered), a new RC update could arrive and forwardingCallback would forward it to cb concurrently with the replay. This could cause race conditions in the provider.
Holding the lock ensures the replay completes atomically before any new updates are processed.
| // ResetForTest resets the global rcState for test isolation. | ||
| func ResetForTest() { | ||
| rcState.Lock() | ||
| defer rcState.Unlock() | ||
| rcState.subscribed = false | ||
| rcState.callback = nil | ||
| rcState.buffered = nil | ||
| } | ||
|
|
||
| // SetSubscribedForTest sets the subscribed flag without actually calling Subscribe. | ||
| func SetSubscribedForTest(v bool) { | ||
| rcState.Lock() | ||
| defer rcState.Unlock() | ||
| rcState.subscribed = v | ||
| } | ||
|
|
||
| // SetBufferedForTest sets a buffered update for testing. | ||
| func SetBufferedForTest(u *remoteconfig.ProductUpdate) { | ||
| rcState.Lock() | ||
| defer rcState.Unlock() | ||
| rcState.buffered = u | ||
| } | ||
|
|
||
| // GetBufferedForTest returns the current buffered update for testing. | ||
| func GetBufferedForTest() *remoteconfig.ProductUpdate { | ||
| rcState.Lock() | ||
| defer rcState.Unlock() | ||
| return rcState.buffered | ||
| } |
There was a problem hiding this comment.
These exported *ForTest helpers live in a non-_test.go file, so they become part of the shipped internal/openfeature package API and allow mutating global state from any in-repo code. If possible, prefer keeping this test-only surface out of production builds (e.g., move tests that need state mutation into the internal/openfeature package, or put test utilities in a dedicated test-only package/build-tagged file).
There was a problem hiding this comment.
Re: the suggestions
"Move tests that need state mutation into the internal/openfeature package" - Not feasible. The tests in openfeature/rc_subscription_test.go need access to unexported types/functions from the openfeature package (newDatadogProvider, attachProvider, universalFlagsConfiguration, etc.). Moving them to internal/openfeature would break this access.
"Put test utilities in a build-tagged file" - This would work (//go:build testutils) but requires all test runs to use go test -tags testutils ./..., including CI updates. These changes are well beyond the scope of what we're trying to achieve here.
The primary risk seems to be that "in-repo code could misuse these" - which is a code review concern and is already communicated with the *ForTest suffix for each of these methods.
| // | ||
| // Note: This function is currently not fully implemented as Remote Config | ||
| // doesn't provide an Unsubscribe method yet. The provider will continue | ||
| // to receive updates until the Remote Config client is stopped. | ||
| func stopRemoteConfig() error { | ||
| log.Debug("openfeature: unregistered from Remote Config") | ||
| // For now, we can unregister the capability to stop receiving updates | ||
| _ = remoteconfig.UnregisterCapability(ffeCapability) | ||
| _ = remoteconfig.UnregisterCapability(remoteconfig.FFEFlagEvaluation) |
There was a problem hiding this comment.
The comment above stopRemoteConfig says Remote Config “doesn't provide an Unsubscribe method yet”, but internal/remoteconfig does expose Unsubscribe() (and Subscribe returns a token). Either update this comment to reflect the real limitation (e.g., this package currently discards the subscription token), or store the token from Subscribe and call Unsubscribe here instead of unregistering the capability globally.
There was a problem hiding this comment.
The comment is incorrect. The Unsubscribe() method is right here!
However, this package discards the subscription token.
dd-trace-go/openfeature/remoteconfig.go
Line 37 in 9789643
remoteconfig.Subscribe returns a token you can use later to unsubscribe: token, err := remoteconfig.Subscribe(product, callback, capability) but we ignore it with the _, so we can't do remoteconfig.Unsubscribe(token) afterwards.
IMO it would be nice to fix this, but it feels out of scope for this PR, so I'm just updating the comment for now. I see the subscription token getting discarded elsewhere in the SDK as well, so this seems to be a common pattern
BenchmarksBenchmark execution time: 2026-03-13 12:41:30 Comparing candidate commit b95c6a4 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 155 metrics, 9 unstable metrics.
|
Move the FFE_FLAGS Remote Config subscription into the tracer's startRemoteConfig() so it's included in the very first RC poll, eliminating one poll interval of latency before flag configs arrive.
Move the forwarding callback and buffering logic into internal/openfeature so the tracer doesn't import the full openfeature package (which pulls in OTel metrics/gRPC). Also fix lint issues: maps.Copy, err.Error() in log, gofmt. Run fixmodules to update contrib go.mod/go.sum.
2de74cb to
41ae016
Compare
sameerank
left a comment
There was a problem hiding this comment.
I can approve for myself, but it would be good to get another review
Replace startWithRemoteConfig() call with direct SubscribeProvider() + attachProvider() calls to avoid touching global remoteconfig state, which raced with lingering shutdown goroutines from integration tests. Also fix redundant err.Error() in log.Warn format string.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f00a862da4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
SubscribeRC() cached subscribed=true permanently, but remoteconfig.Stop() destroys all subscriptions. On a Start() -> Stop() -> Start() cycle, SubscribeRC() would no-op, leaving the provider permanently stuck on stale config. Now verify the subscription is still live via HasProduct() before short-circuiting. If lost, reset state and re-subscribe.
…acerSubscribed to tracerOwnsSubscription
What does this PR do?
Subscribes to FFE_FLAGS during
tracer.startRemoteConfig()so feature flag configurations are included in the very first RC poll, eliminating the extra poll interval latency whenNewDatadogProvider()is called aftertracer.Start().Also fixes several correctness issues discovered during review: deep copying buffered configs, preventing race conditions between tracer and provider subscription, and explicitly rejecting multiple providers (which were never supported but failed silently).
Motivation
When
NewDatadogProvider()is called aftertracer.Start(), the RC client has already started and the first poll may have already fired. This means FFE_FLAGS misses the first poll, adding one extra poll interval of latency before flag configs arrive and the provider becomes ready.Changes
Early subscription (original goal):
tracer.startRemoteConfig()(alongside APM_TRACING and LIVE_DEBUGGING)internal/openfeaturebuffers the latest config until the provider attachesNewDatadogProvider()is called, it replays the buffered config (fast path) or falls back to its own RC subscription (slow path)ffeCapability = 46into the remoteconfig capability iota block asFFEFlagEvaluationCorrectness fixes (from review):
bytes.Clonewhen buffering RC updates to prevent corruption if RC reuses buffersSubscribeProvider()to serialize tracer/provider subscription and prevent race conditionsAttachCallback) and slow path (SubscribeProvider) - multiple providers were never supported but previously failed silently with the last callback overwriting earlier ones inproductCallbacks()Verification
I wrote a side-by-side comparison in ffe-dogfooding for testing:
github.com/DataDog/dd-trace-go/v2 v2.5.0)go.modreplace directive)Both apps were modified to:
tracer.Start()andNewDatadogProvider()to simulate the real-world scenario where an app starts the tracer, performs other initialization, and only later creates the OpenFeature providerHow
The following instrumentation was added to both apps:
Results
tracer.Start()completeNewDatadogProvider()calledNewDatadogProvider()Conclusion
The fix eliminates one full RC poll interval of latency (~5-8 seconds) when
NewDatadogProvider()is called after the first RC poll has already completed.Try it out
The test setup is in
test/pr-4495-rc-timing-fixin ffe-dogfooding. Find/replace/path/to/local/dd-trace-gowith your local path. Then rundocker-compose up --build app-go app-go-localand observe and compare the[TIMING]logs.Reviewer's Checklist
make lintlocally.make testlocally.make generatelocally.make fix-moduleslocally.