OCPBUGS-81340: openshift-tests: allow duplicate pacemaker status collector CronJob events#30953
Conversation
…ector CronJob events cluster-etcd-operator runs a short-interval CronJob (pacemaker-status-collector) in openshift-etcd-operator. kube-controller-manager's cronjob-controller emits SuccessfulCreate and SawCompletedJob on the CronJob involvedObject; long runs can exceed the duplicated-event threshold. Allow those repeats via PacemakerStatusCollectorCronJobEvents. SuccessfulDelete churn is addressed on the CEO side via Job TTL/history (OCPBUGS-81340). Made-with: Cursor
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
@jaypoulz: This pull request references Jira Issue OCPBUGS-81340, which is valid. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughAdded a new Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_events_test.go (1)
214-225: Add a companion test forSawCompletedJobto fully cover the matcher contract.This case covers
SuccessfulCreate, but the matcher also allowsSawCompletedJob. Adding that second reason path prevents silent regressions.Suggested test addition
{ name: "pacemaker status collector cronjob successful create (KCM cronjob-controller)", locator: monitorapi.Locator{ Keys: map[monitorapi.LocatorKey]string{ monitorapi.LocatorNamespaceKey: "openshift-etcd-operator", monitorapi.LocatorKey("cronjob"): "pacemaker-status-collector", }, }, msg: monitorapi.NewMessage().HumanMessage("Created job pacemaker-status-collector-29035215"). Reason("SuccessfulCreate").Build(), expectedAllowName: "PacemakerStatusCollectorCronJobEvents", }, + { + name: "pacemaker status collector cronjob saw completed job (KCM cronjob-controller)", + locator: monitorapi.Locator{ + Keys: map[monitorapi.LocatorKey]string{ + monitorapi.LocatorNamespaceKey: "openshift-etcd-operator", + monitorapi.LocatorKey("cronjob"): "pacemaker-status-collector", + }, + }, + msg: monitorapi.NewMessage().HumanMessage("Saw completed job: pacemaker-status-collector-29035215, condition: Complete"). + Reason("SawCompletedJob").Build(), + expectedAllowName: "PacemakerStatusCollectorCronJobEvents", + },As per coding guidelines, "-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_events_test.go` around lines 214 - 225, Add a companion test case for the SawCompletedJob path to mirror the existing SuccessfulCreate case so the matcher contract is fully covered: in duplicated_events_test.go add a new test entry (similar to the existing "pacemaker status collector cronjob successful create ...") that uses monitorapi.NewMessage().HumanMessage(...) with Reason("SawCompletedJob") (and an appropriate human message like "Saw completed job pacemaker-status-collector-29035215") and the same Locator keys, and set expectedAllowName to "PacemakerStatusCollectorCronJobEvents"; this ensures both SuccessfulCreate and SawCompletedJob branches of the matcher are exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_events_test.go`:
- Around line 214-225: Add a companion test case for the SawCompletedJob path to
mirror the existing SuccessfulCreate case so the matcher contract is fully
covered: in duplicated_events_test.go add a new test entry (similar to the
existing "pacemaker status collector cronjob successful create ...") that uses
monitorapi.NewMessage().HumanMessage(...) with Reason("SawCompletedJob") (and an
appropriate human message like "Saw completed job
pacemaker-status-collector-29035215") and the same Locator keys, and set
expectedAllowName to "PacemakerStatusCollectorCronJobEvents"; this ensures both
SuccessfulCreate and SawCompletedJob branches of the matcher are exercised.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: fa467f8f-8a94-40d8-a140-e50d4d63e4dd
📒 Files selected for processing (2)
pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.gopkg/monitortestlibrary/pathologicaleventlibrary/duplicated_events_test.go
|
Scheduling required tests: |
|
@jaypoulz: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
On the assumption the TNF folks and the API folks are confident the cronjob event load is safe for the APIServer /lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dgoodwin, jaypoulz The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ovn-two-node-fencing-techpreview |
|
@eggfoobar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ab4e5940-31bb-11f1-95de-32af1b721947-0 |
|
Still trying to pull input from the API-server team in this thread. IMHO, we should be fine between this and the CEO tweak to reduce the job-deletion events. If the api-server team feels otherwise, we can open a follow-up issue to address based on their recommendations. My observation is, we'd still need this patch in if they say a 1 minute minimum was required (instead of 30s) because the tests will generate enough events in a 2 hour run to trigger the pathological events tracker. |
|
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-metal-ovn-two-node-fencing-techpreview |
|
@jaypoulz: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d3e19270-32ae-11f1-8bd9-751d3677bd9e-0 |
cluster-etcd-operator runs a short-interval CronJob (pacemaker-status-collector) in openshift-etcd-operator. kube-controller-manager's cronjob-controller emits SuccessfulCreate and SawCompletedJob on the CronJob involvedObject; long runs can exceed the duplicated-event threshold.
Allow those repeats via PacemakerStatusCollectorCronJobEvents. SuccessfulDelete churn is addressed on the CEO side via Job TTL/history (OCPBUGS-81340).
Made-with: Cursor