[coordinator] Fix status reporting of out-of-order component updates #13119
Merged
VihasMakwana merged 8 commits into elastic:main on Mar 13, 2026
760eaab to d8ca1fb
Contributor
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
c118dfe to 68abcc5
68abcc5 to faa0fdf
swiatekm reviewed Mar 11, 2026
internal/pkg/agent/application/coordinator/coordinator_state.go (outdated)
Contributor
Author
Unfortunately, I haven't been able to reproduce this. I'm thinking of adding
VihasMakwana
commented
Mar 11, 2026
Contributor
Author
Member
Thanks, agree this is a nice solution. I'll let Mikolaj do the approving.
swiatekm reviewed Mar 12, 2026
internal/pkg/agent/application/coordinator/coordinator_state.go (outdated)
swiatekm reviewed Mar 12, 2026
swiatekm approved these changes Mar 12, 2026
Contributor
Author
/test
Contributor
💛 Build succeeded, but was flaky
VihasMakwana added a commit that referenced this pull request Mar 13, 2026
…13119) (#13151)
* fix: fix out-of-order component states
* add StartTIme in one more place
* comments
* bug fix related to otel manager
* fix the test case for windows
* simplify
* todo
(cherry picked from commit 659774d)
Co-authored-by: Vihas Makwana <121151420+VihasMakwana@users.noreply.github.com>

VihasMakwana added a commit that referenced this pull request Mar 13, 2026
…13119) (#13152)
* fix: fix out-of-order component states
* add StartTIme in one more place
* comments
* bug fix related to otel manager
* fix the test case for windows
* simplify
* todo
(cherry picked from commit 659774d)
Co-authored-by: Vihas Makwana <121151420+VihasMakwana@users.noreply.github.com>
What does this PR do?
When transitioning from the otel runtime to the process runtime, if an otel component takes too long to stop, it emits its `Stopped` state only after the timeout expires. By this time, the process runtime will already have reported a `Starting` state.

Upon receiving a `Stopped` state from the old runtime, we erroneously remove the new `Starting` state.

This PR fixes the flow by introducing a new `LastCreatedAt` variable for a component. We only process a state update when it comes either from the same instance of the component or from a newer instance.

Why is it important?

Buggy scenario:
1. Component `c` is created at `time=0s` and reports a `Starting` state. We report this state, as it is the first state for this component.
2. The component is re-created with `process` mode and `time=1s`. It also reports a `Starting` state for the given component.
3. The old otel instance finally reports its `Stopped` state.
4. The coordinator receives the `Stopped` event and erroneously removes the new component from the status map.

After the PR:
1. Component `c` is created at `startTime=0s` and reports a `Starting` state. We report this state, as it is the first state for this component.
2. The component is re-created with `process` mode and `startTime=1s`. It also reports a `Starting` state for the given component.
3. The old otel instance finally reports its `Stopped` state.
4. The coordinator receives the `Stopped` event but ignores it, since the stored `startTime` of the current component is later than that of the received event.

Checklist
- `./changelog/fragments` using the changelog tool

How to test this PR locally
Related issues