fix(openfeature): block initialize() until RC config arrives #16650
leoromanovsky wants to merge 9 commits into main
DataDogProvider.initialize() now blocks until Remote Config delivers the first FFE configuration or a configurable timeout expires (default 30s). This matches the behavior of the Java (CountDownLatch), Go (sync.Cond), and Node.js (Promise) providers. Previously, initialize() returned immediately without config, causing the OpenFeature SDK to emit PROVIDER_READY prematurely; flag evaluations in that window silently returned defaults.

Fixes: FFL-1843
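The blocking behavior described above can be sketched with a threading.Event. This is a simplified, hypothetical stand-in, not the dd-trace-py DataDogProvider: BlockingFlagProvider, the string statuses, the `events` list and the local ProviderNotReadyError class are illustrative assumptions. The names initialization_timeout and _config_received follow the review follow-up commits, and the environment variable and its 30000 ms default come from this PR's description.

```python
import os
import threading


class ProviderNotReadyError(Exception):
    """Local stand-in for the OpenFeature SDK's ProviderNotReadyError."""


class BlockingFlagProvider:
    """Hypothetical, simplified provider; not the dd-trace-py DataDogProvider."""

    # Environment variable named in the PR description; the value is in milliseconds.
    TIMEOUT_ENV = "DD_EXPERIMENTAL_FLAGGING_PROVIDER_INITIALIZATION_TIMEOUT_MS"

    def __init__(self, initialization_timeout=None):
        if initialization_timeout is None:
            # Constructor argument wins; otherwise use the env var, then 30000 ms.
            initialization_timeout = float(os.environ.get(self.TIMEOUT_ENV, "30000")) / 1000.0
        self._timeout = initialization_timeout
        self._config = None
        self._config_received = threading.Event()
        self._lock = threading.Lock()
        self.status = "NOT_READY"
        self.events = []  # events the provider emits itself (stand-in for SDK dispatch)

    def initialize(self):
        # Fast path: a config was delivered before initialize() ran.
        if self._config_received.is_set():
            return
        if self._config_received.wait(self._timeout):
            return  # normal return: the SDK would now emit PROVIDER_READY
        with self._lock:
            # Re-check under the lock in case the config landed as the wait expired.
            if self._config_received.is_set():
                return
            self.status = "ERROR"
            raise ProviderNotReadyError(f"no configuration after {self._timeout}s")

    def on_configuration_received(self, config):
        with self._lock:
            timed_out = self.status == "ERROR"
            self._config = config
            # Update status before signaling, so a woken initialize() sees READY.
            self.status = "READY"
            self._config_received.set()
        if timed_out:
            # Late recovery: initialize() already failed, so emit READY ourselves.
            self.events.append("PROVIDER_READY")

    def shutdown(self):
        with self._lock:
            self._config = None
            self._config_received.clear()  # allow a clean re-initialization
            self.status = "NOT_READY"

    def resolve(self, flag_key, default):
        config = self._config
        if config is None:
            return default
        return config.get(flag_key, default)
```

The lock makes the timeout check and the late-recovery check atomic: a config that arrives exactly as the wait expires either lets initialize() return normally or triggers late recovery, never neither.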
Performance SLOs
Comparing candidate fix/ffl-1843-openfeature-init-blocking (f29acd1) with baseline main (b43e1e7)

📈 Performance Regressions (2 suites)

📈 iastaspects - 118/118

| Benchmark | Time | Time SLO (margin) | Time vs baseline | Memory | Memory SLO (margin) | Memory vs baseline |
|---|---|---|---|---|---|---|
| add_aspect | 104.540µs | <130.000µs (-19.6%) | +4.8% | 42.939MB | <46.000MB (-6.7%) | +4.9% |
| add_inplace_aspect | 100.217µs | <130.000µs (-22.9%) | -1.2% | 43.057MB | <46.000MB (-6.4%) | +5.1% |
| add_inplace_noaspect | 28.250µs | <40.000µs (-29.4%) | -0.2% | 43.037MB | <46.000MB (-6.4%) | +5.1% |
| add_noaspect | 48.749µs | <70.000µs (-30.4%) | +0.3% | 42.939MB | <46.000MB (-6.7%) | +4.9% |
| bytearray_aspect | 251.609µs | <400.000µs (-37.1%) | +0.3% | 42.979MB | <46.000MB (-6.6%) | +5.0% |
| bytearray_extend_aspect | 637.752µs | <800.000µs (-20.3%) | +1.4% | 42.998MB | <46.000MB (-6.5%) | +5.0% |
| bytearray_extend_noaspect | 266.713µs | <400.000µs (-33.3%) | +0.8% | 42.920MB | <46.000MB (-6.7%) | +4.6% |
| bytearray_noaspect | 136.556µs | <300.000µs (-54.5%) | +1.0% | 43.018MB | <46.000MB (-6.5%) | +5.0% |
| bytes_aspect | 219.506µs | <300.000µs (-26.8%) | +1.5% | 42.900MB | <46.000MB (-6.7%) | +4.9% |
| bytes_noaspect | 133.234µs | <200.000µs (-33.4%) | ~same | 42.900MB | <46.000MB (-6.7%) | +4.8% |
| bytesio_aspect | 3.774ms | <5.000ms (-24.5%) | ~same | 42.920MB | <46.000MB (-6.7%) | +4.8% |
| bytesio_noaspect | 317.012µs | <420.000µs (-24.5%) | +0.2% | 42.920MB | <46.000MB (-6.7%) | +4.8% |
| capitalize_aspect | 89.011µs | <300.000µs (-70.3%) | ~same | 42.959MB | <46.000MB (-6.6%) | +5.0% |
| capitalize_noaspect | 252.914µs | <300.000µs (-15.7%) | +0.7% | 42.959MB | <46.000MB (-6.6%) | +4.8% |
| casefold_aspect | 89.319µs | <500.000µs (-82.1%) | +0.8% | 42.939MB | <46.000MB (-6.7%) | +4.9% |
| casefold_noaspect | 305.814µs | <500.000µs (-38.8%) | +0.5% | 42.900MB | <46.000MB (-6.7%) | +4.8% |
| decode_aspect | 87.258µs | <100.000µs (-12.7%) | +0.6% | 42.880MB | <46.000MB (-6.8%) | +4.8% |
| decode_noaspect | 151.800µs | <210.000µs (-27.7%) | -0.6% | 42.900MB | <46.000MB (-6.7%) | +4.7% |
| encode_aspect | 84.012µs | <200.000µs (-58.0%) | -0.4% | 42.939MB | <46.000MB (-6.7%) | +4.8% |
| encode_noaspect | 139.236µs | <200.000µs (-30.4%) | -0.8% | 42.920MB | <46.000MB (-6.7%) | +4.8% |
| format_aspect | 14.708ms | <19.200ms (-23.4%) | +0.5% | 43.096MB | <46.000MB (-6.3%) | +4.9% |
| format_map_aspect | 16.397ms | <21.500ms (-23.7%) | -0.3% | 43.037MB | <46.000MB (-6.4%) | +4.8% |
| format_map_noaspect | 373.893µs | <500.000µs (-25.2%) | +0.6% | 42.880MB | <46.000MB (-6.8%) | +4.8% |
| format_noaspect | 303.810µs | <500.000µs (-39.2%) | +0.6% | 42.920MB | <46.000MB (-6.7%) | +4.8% |
| index_aspect | 128.683µs | <300.000µs (-57.1%) | +7.7% | 42.920MB | <46.000MB (-6.7%) | +4.9% |
| index_noaspect | 40.444µs | <300.000µs (-86.5%) | +0.8% | 42.998MB | <46.000MB (-6.5%) | +5.1% |
| join_aspect | 210.014µs | <300.000µs (-30.0%) | -0.8% | 42.880MB | <46.000MB (-6.8%) | +4.6% |
| join_noaspect | 141.731µs | <300.000µs (-52.8%) | +0.9% | 42.900MB | <46.000MB (-6.7%) | +4.7% |
| ljust_aspect | 582.159µs | <700.000µs (-16.8%) | 📈 +17.2% | 42.959MB | <46.000MB (-6.6%) | +5.0% |
| ljust_noaspect | 261.165µs | <300.000µs (-12.9%) | +0.4% | 42.939MB | <46.000MB (-6.7%) | +5.0% |
| lower_aspect | 295.861µs | <500.000µs (-40.8%) | +0.8% | 42.979MB | <46.000MB (-6.6%) | +5.0% |
| lower_noaspect | 236.227µs | <300.000µs (-21.3%) | +1.5% | 42.979MB | <46.000MB (-6.6%) | +5.0% |
| lstrip_aspect | 0.268ms | <3.000ms (-91.1%) | -1.8% | 42.900MB | <46.000MB (-6.7%) | +4.8% |
| lstrip_noaspect | 0.177ms | <3.000ms (-94.1%) | ~same | 42.979MB | <46.000MB (-6.6%) | +5.0% |
| modulo_aspect | 14.355ms | <18.750ms (-23.4%) | +0.5% | 43.057MB | <46.000MB (-6.4%) | +4.8% |
| modulo_aspect_for_bytearray_bytearray | 14.917ms | <19.350ms (-22.9%) | +0.5% | 43.096MB | <46.000MB (-6.3%) | +5.1% |
| modulo_aspect_for_bytes | 14.368ms | <18.900ms (-24.0%) | ~same | 43.096MB | <46.000MB (-6.3%) | +4.9% |
| modulo_aspect_for_bytes_bytearray | 14.588ms | <19.150ms (-23.8%) | +0.2% | 43.175MB | <46.000MB (-6.1%) | +4.9% |
| modulo_noaspect | 0.363ms | <3.000ms (-87.9%) | -0.4% | 42.979MB | <46.000MB (-6.6%) | +5.0% |
| replace_aspect | 18.367ms | <24.000ms (-23.5%) | -0.5% | 43.057MB | <46.000MB (-6.4%) | +4.8% |
| replace_noaspect | 279.653µs | <300.000µs (-6.8%) | -0.2% | 42.900MB | <46.000MB (-6.7%) | +4.9% |
| repr_aspect | 312.528µs | <420.000µs (-25.6%) | +1.0% | 42.900MB | <46.000MB (-6.7%) | +4.8% |
| repr_noaspect | 47.163µs | <90.000µs (-47.6%) | +0.9% | 42.939MB | <46.000MB (-6.7%) | +4.9% |
| rstrip_aspect | 381.972µs | <500.000µs (-23.6%) | +0.5% | 42.939MB | <46.000MB (-6.7%) | +5.0% |
| rstrip_noaspect | 185.229µs | <300.000µs (-38.3%) | +0.9% | 42.900MB | <46.000MB (-6.7%) | +4.9% |
| slice_aspect | 186.172µs | <300.000µs (-37.9%) | +0.3% | 42.920MB | <46.000MB (-6.7%) | +4.9% |
| slice_noaspect | 54.005µs | <90.000µs (-40.0%) | ~same | 42.998MB | <46.000MB (-6.5%) | +4.9% |
| stringio_aspect | 3.826ms | <5.000ms (-23.5%) | ~same | 42.939MB | <46.000MB (-6.7%) | +5.0% |
| stringio_noaspect | 376.855µs | <500.000µs (-24.6%) | +9.1% | 42.880MB | <46.000MB (-6.8%) | +4.9% |
| strip_aspect | 269.878µs | <350.000µs (-22.9%) | -0.1% | 42.998MB | <46.000MB (-6.5%) | +5.1% |
| strip_noaspect | 176.808µs | <240.000µs (-26.3%) | +0.2% | 42.939MB | <46.000MB (-6.7%) | +4.9% |
| swapcase_aspect | 335.605µs | <500.000µs (-32.9%) | +0.7% | 42.920MB | <46.000MB (-6.7%) | +5.0% |
| swapcase_noaspect | 270.974µs | <400.000µs (-32.3%) | +0.4% | 42.920MB | <46.000MB (-6.7%) | +4.9% |
| title_aspect | 321.120µs | <500.000µs (-35.8%) | ~same | 42.900MB | <46.000MB (-6.7%) | +4.7% |
| title_noaspect | 257.723µs | <400.000µs (-35.6%) | ~same | 42.939MB | <46.000MB (-6.7%) | +5.0% |
| translate_aspect | 490.708µs | <700.000µs (-29.9%) | ~same | 42.920MB | <46.000MB (-6.7%) | +4.9% |
| translate_noaspect | 423.046µs | <500.000µs (-15.4%) | -1.2% | 42.998MB | <46.000MB (-6.5%) | +5.0% |
| upper_aspect | 293.901µs | <500.000µs (-41.2%) | -0.5% | 42.920MB | <46.000MB (-6.7%) | +4.9% |
| upper_noaspect | 235.192µs | <400.000µs (-41.2%) | +0.5% | 42.920MB | <46.000MB (-6.7%) | +4.8% |

📈 iastaspectsospath - 24/24

| Benchmark | Time | Time SLO (margin) | Time vs baseline | Memory | Memory SLO (margin) | Memory vs baseline |
|---|---|---|---|---|---|---|
| ospathbasename_aspect | 513.965µs | <700.000µs (-26.6%) | 📈 +21.5% | 42.979MB | <46.000MB (-6.6%) | +5.0% |
| ospathbasename_noaspect | 434.107µs | <700.000µs (-38.0%) | +1.7% | 42.939MB | <46.000MB (-6.7%) | +4.8% |
| ospathjoin_aspect | 622.533µs | <700.000µs (-11.1%) | ~same | 42.880MB | <46.000MB (-6.8%) | +4.4% |
| ospathjoin_noaspect | 638.573µs | <700.000µs (-8.8%) | ~same | 42.821MB | <46.000MB (-6.9%) | +4.7% |
| ospathnormcase_aspect | 347.580µs | <700.000µs (-50.3%) | -0.8% | 42.880MB | <46.000MB (-6.8%) | +4.5% |
| ospathnormcase_noaspect | 359.552µs | <700.000µs (-48.6%) | ~same | 42.900MB | <46.000MB (-6.7%) | +4.8% |
| ospathsplit_aspect | 491.361µs | <700.000µs (-29.8%) | +0.4% | 42.939MB | <46.000MB (-6.7%) | +4.9% |
| ospathsplit_noaspect | 499.241µs | <700.000µs (-28.7%) | -0.1% | 42.959MB | <46.000MB (-6.6%) | +5.0% |
| ospathsplitdrive_aspect | 373.398µs | <700.000µs (-46.7%) | -0.4% | 42.900MB | <46.000MB (-6.7%) | +4.8% |
| ospathsplitdrive_noaspect | 73.190µs | <700.000µs (-89.5%) | -0.7% | 42.920MB | <46.000MB (-6.7%) | +5.0% |
| ospathsplitext_aspect | 454.970µs | <700.000µs (-35.0%) | -0.3% | 42.959MB | <46.000MB (-6.6%) | +5.0% |
| ospathsplitext_noaspect | 465.483µs | <700.000µs (-33.5%) | +0.4% | 42.900MB | <46.000MB (-6.7%) | |

🟡 Near SLO Breach (1 suite)

🟡 tracer - 6/6

| Benchmark | Time | Time SLO (margin) | Time vs baseline | Memory | Memory SLO (margin) | Memory vs baseline |
|---|---|---|---|---|---|---|
| large | 31.402ms | <32.950ms (-4.7%) | -0.8% | 36.667MB | <39.250MB (-6.6%) | +4.7% |
| medium | 3.110ms | <3.200ms (-2.8%) | ~same | 35.586MB | <38.750MB (-8.2%) | +4.6% |
| small | 366.389µs | <370.000µs (🟡 -1.0%) | +4.7% | 35.507MB | <38.750MB (-8.4%) | +5.0% |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 60001256d6
Add 5 new tests validating that the OpenFeature provider's initialize() blocks until Remote Config delivers flag configuration, matching the behavior of the Java, Go, and Node.js providers.

Tests:
- test_ffe_init_blocks_until_config_received: config before start
- test_ffe_init_returns_real_values_not_defaults: config after start
- test_ffe_evaluation_immediately_after_start_without_config: no config
- test_ffe_init_blocks_and_resolves_when_config_arrives: mid-block delivery
- test_ffe_init_timeout_returns_error: custom timeout + late recovery

Also removes the module-level auto-initialization of the OpenFeature provider in the Python parametric weblog. The provider should only be initialized when the test calls /ffe/start, not at server startup. With the blocking initialize() fix (DataDog/dd-trace-py#16650), the module-level init would block server startup for 30s.

Requires: DataDog/dd-trace-py#16650
Fixes: FFL-1843
typotter left a comment
Looks good to me from the logic side. I'm no python expert, so I'll leave that aspect to others.
Looks like you have added a parametric test for this in another PR; it might be worth adding a unit test if that's feasible.
- Rename init_timeout to initialization_timeout (Oleksii #1)
- Eliminate _config_received bool, use threading.Event directly (Oleksii #2)
- Remove Java/Go/Node.js references from docstring (Oleksii #3, Tyler #1)
- Add debug log on fast-path when config already exists (Oleksii #4)
- Remove redundant state updates after wait (Oleksii #5)
- Reorder on_configuration_received: set status before signaling event (Oleksii #6)
Addressed all review feedback in e7306ba:
- Oleksii 1-7: All implemented (rename, eliminate bool, trim docstring, debug log, remove redundant state, reorder event signal, registration noted as follow-up).
- Oleksii #8 (releasenotes): Keeping as fix-only. The blocking behavior is what customers already expected -- they assumed
- Tyler #2 (EXPERIMENTAL): Keeping
- Tyler #3 (unit tests): Adding now.
Add 4 unit tests for the initialize() blocking behavior:
- test_initialize_blocks_until_config_arrives: config mid-wait unblocks
- test_initialize_fast_path_when_config_exists: pre-loaded config
- test_initialize_timeout_raises: short timeout -> ERROR state
- test_late_recovery_after_timeout: config after timeout -> READY

Also update existing tests to use _config_received.is_set() since _config_received is now a threading.Event instead of a bool.
Move _register_provider() from __init__() to initialize() so that a provider re-registers for RC config callbacks after shutdown + re-initialization. Previously, shutdown() called _unregister_provider() but __init__() only runs once, so re-initialization would leave the provider unable to receive config updates. Addresses review feedback from dd-oleksii and typotter.
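A minimal sketch of why registration belongs in initialize(), using a hypothetical RemoteConfigRegistry and Provider in place of dd-trace-py's Remote Config plumbing and its _register_provider()/_unregister_provider() helpers. Because __init__() runs once per instance, only registration inside initialize() survives a shutdown() followed by a second initialize().

```python
class RemoteConfigRegistry:
    """Hypothetical stand-in for the Remote Config callback registry."""

    def __init__(self):
        self.subscribers = set()

    def deliver(self, config):
        # Fan the payload out to every registered callback.
        for callback in list(self.subscribers):
            callback(config)


class Provider:
    def __init__(self, registry):
        self._registry = registry
        self.config = None
        # Registering here would happen only once; after shutdown() unregisters,
        # a re-initialized provider would never receive config again.

    def initialize(self):
        # Register on every initialize() so shutdown + re-initialize keeps working.
        # Bound methods of the same instance compare equal, so the set holds one entry.
        self._registry.subscribers.add(self.on_configuration_received)

    def shutdown(self):
        self._registry.subscribers.discard(self.on_configuration_received)
        self.config = None

    def on_configuration_received(self, config):
        self.config = config
```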
releasenotes/notes/fix-openfeature-init-blocking-70c8d5a99287cc49.yaml
This pull request has been automatically closed after a period of inactivity.
128b928 to 8c382da
Motivation
DataDogProvider.initialize() returns immediately without waiting for Remote Config data. The OpenFeature SDK then emits PROVIDER_READY (per spec: "READY when initialize() terminates normally"), so consumers believe the provider is ready. Flag evaluations in this window silently return default values with reason: DEFAULT; there is no error and no indication that config hasn't loaded yet.

This was reported by a customer running a Python script (not a long-running server). On servers the bug is masked because RC config typically arrives during startup, before any evaluations happen. In scripts and short-lived processes, set_provider() returns in 0.00s and the very next evaluation gets defaults.

Every other Datadog OpenFeature provider blocks inside initialize() until config arrives:
- Java: CountDownLatch.await(timeout, unit), default 30s
- Go: sync.Cond.Wait() inside a loop, default 30s
- Node.js: await initController.wait() (deferred Promise), default 30s

Fixes: FFL-1843
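To make the failure mode concrete, the following sketch models the old, non-blocking behavior in a short-lived script. NonBlockingProvider, the flag name new-checkout and the TARGETING_MATCH reason on the success path are illustrative assumptions, and a threading.Timer stands in for Remote Config delivering the payload shortly after startup.

```python
import threading


class NonBlockingProvider:
    """Hypothetical model of the old behavior: initialize() returns before config exists."""

    def __init__(self, delivery_delay):
        self._config = None
        self._delivery_delay = delivery_delay

    def initialize(self):
        # A timer thread stands in for Remote Config delivering the payload later.
        threading.Timer(self._delivery_delay, self._deliver).start()
        # Returning immediately is what lets the SDK report READY too early.

    def _deliver(self):
        self._config = {"new-checkout": True}

    def resolve_boolean(self, flag_key, default):
        config = self._config
        if config is None:
            # Silent fallback: no exception, just the default value.
            return default, "DEFAULT"
        return config.get(flag_key, default), "TARGETING_MATCH"


# A short-lived script: initialize, then evaluate right away.
provider = NonBlockingProvider(delivery_delay=0.2)
provider.initialize()
value, reason = provider.resolve_boolean("new-checkout", False)
print(value, reason)  # the config has not arrived yet, so this is the default
```

A long-running server would normally evaluate flags well after the timer fires, which is why the same race goes unnoticed there.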
Changes
- DataDogProvider.__init__() now creates a threading.Event (_config_event) used to block initialize() until config arrives.
- initialize() checks whether config already exists (fast path); otherwise it calls _config_event.wait(timeout). If the timeout expires without config, it raises ProviderNotReadyError (the SDK then dispatches PROVIDER_ERROR).
- on_configuration_received() calls _config_event.set() to unblock initialize() when the first RC payload arrives. If init already timed out, it emits PROVIDER_READY for late recovery.
- shutdown() clears the event for clean re-initialization.
- DD_EXPERIMENTAL_FLAGGING_PROVIDER_INITIALIZATION_TIMEOUT_MS (default 30000) controls the timeout. It is also configurable via the constructor: DataDogProvider(init_timeout=10.0) (the parameter was renamed to initialization_timeout during review).

Decisions
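Because set_provider() runs initialize() on the caller's thread, a caller that cannot block can move that call onto a worker thread and react when it finishes, which is the approach the last decision in this list proposes for async customers. A minimal sketch: set_provider_blocking() is a hypothetical stand-in for set_provider() with a blocking provider, TimeoutError stands in for whatever error a timed-out initialization surfaces, and the on_ready/on_error callbacks stand in for PROVIDER_READY and PROVIDER_ERROR event handlers.

```python
import threading
import time


def set_provider_blocking(init_seconds, fail=False):
    """Hypothetical stand-in for set_provider() with a provider that blocks in initialize()."""
    time.sleep(init_seconds)  # stands in for waiting on the first RC payload
    if fail:
        raise TimeoutError("no configuration before the initialization timeout")


def set_provider_in_background(init_seconds, on_ready, on_error, fail=False):
    """Run the blocking call on a worker thread and report the outcome via callbacks."""

    def run():
        try:
            set_provider_blocking(init_seconds, fail)
        except Exception as exc:
            on_error(exc)  # plays the role of a PROVIDER_ERROR handler
        else:
            on_ready()  # plays the role of a PROVIDER_READY handler

    worker = threading.Thread(target=run, name="provider-init", daemon=True)
    worker.start()
    return worker  # the caller continues immediately
```

Keeping the provider itself blocking and pushing concurrency to the caller avoids a special non-blocking mode in the provider.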
- set_provider() (no set_provider_and_wait() yet), and it calls initialize() synchronously on the caller's thread. So blocking here means set_provider() itself blocks, which is the correct default behavior for most users.
- ProviderNotReadyError rather than returning silently. This puts the provider in ERROR state (not a premature READY), which is the same pattern Java and Node.js use on timeout.
- on_configuration_received() emits PROVIDER_READY and the provider transitions from ERROR to READY.
- init_timeout=0 async mode: rather than adding a special non-blocking mode to the provider, async customers can wrap set_provider() in a background thread and listen for PROVIDER_READY events. A proper set_provider_and_wait() is being contributed upstream to the OpenFeature Python SDK (open-feature/python-sdk#567).

Testing
Verified locally using system-tests parametric tests against a patched build:
- test_ffe_evaluation_immediately_after_start_without_config: ffe_start() returned in 0.00s, eval returned defaults
- test_ffe_init_blocks_until_config_received
- test_ffe_init_returns_real_values_not_defaults

Server log confirms blocking:
Waiting up to 30.0s for initial FFE configuration from Remote Config

Existing FFE parametric tests: 13 passed, 0 failed (remaining errors were container resource exhaustion, not code-related).