Adding monitoring Scripts #101
Signed-off-by: mpatilgit-hub9 <Mahesh.Patil9@ibm.com>
build:
  strategy:
    matrix:
      runner: ["ubuntu-24.04-ppc64le", "ubuntu-24.04-ppc64le-p10"]
I think we should monitor all types of workers (default, large, p/z) rather than limiting monitoring to the two above.
"ubuntu-24.04-ppc64le",
"ubuntu-24.04-ppc64le-p10",
"ubuntu-24.04-ppc64le-2xlarge",
"ubuntu-24.04-ppc64le-2xlarge-p10",
"ubuntu-24.04-ppc64le-4xlarge",
"ubuntu-24.04-ppc64le-4xlarge-p10",
"ubuntu-24.04-s390x"
We have tested the runners listed above and are ready to push the changes to this PR. I hope this fulfills our need.
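For reference, the expanded matrix could be sketched as below. The runner labels are the ones listed in this comment; the workflow name, the triggers, `fail-fast: false`, and the `runs-on` line are assumptions for illustration and may not match what the PR actually contains.

```yaml
# Sketch only: name and triggers are hypothetical, not taken from the PR.
name: runner-monitoring
on:
  workflow_dispatch:          # manual trigger (assumed)
  schedule:
    - cron: "*/30 * * * *"    # assumed monitoring interval

jobs:
  build:
    strategy:
      # Assumption: keep the other runners going when one of them fails,
      # so a single unhealthy runner type does not hide the rest.
      fail-fast: false
      matrix:
        runner:
          - ubuntu-24.04-ppc64le
          - ubuntu-24.04-ppc64le-p10
          - ubuntu-24.04-ppc64le-2xlarge
          - ubuntu-24.04-ppc64le-2xlarge-p10
          - ubuntu-24.04-ppc64le-4xlarge
          - ubuntu-24.04-ppc64le-4xlarge-p10
          - ubuntu-24.04-s390x
    # Assumption: each matrix entry is used directly as the runner label.
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      - name: Run a one-line script
        run: echo "Hello, world! GitHub app is running successfully on ${{ matrix.runner }}"
```

With `fail-fast: false`, each runner type reports its own pass or fail status, which seems to fit a monitoring use case better than cancelling the whole matrix on the first failure.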
Hi Anup,
We have now updated the workflow to include all runner types (default, p/z, 2xlarge, 4xlarge, etc.) as suggested and validated the execution across them.
One observation from testing:
Runners like 2xlarge and 4xlarge tend to have higher queue and execution times compared to standard runners. If we monitor all of them using the same thresholds, it may lead to frequent false alerts from the watchdog.
As a follow-up improvement, we can consider:
- Splitting heavy runners (2xlarge/4xlarge) into a separate workflow, or
- Applying relaxed thresholds / lower monitoring frequency for them
This will help reduce alert noise and improve signal quality.
For now, the current implementation covers all runner types as expected. Please let me know if you’d like us to proceed with the split-monitoring approach as well.
I like the idea of splitting up the 2x and 4xlarge runners into a separate workflow to reduce false alarms and to prioritize the larger workflow runs actually using this service.
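A separate workflow for the larger runners might look roughly like the sketch below. The runner labels come from this thread; the workflow name, the two-hour schedule, and the 30-minute `timeout-minutes` value are placeholders chosen for illustration. Note that `timeout-minutes` only bounds execution time, so any queue-time threshold would still have to be relaxed in the watchdog itself, which is not shown in this PR.

```yaml
# Sketch only: name, schedule, and timeout are hypothetical values.
name: runner-monitoring-large
on:
  workflow_dispatch:
  schedule:
    - cron: "0 */2 * * *"     # assumed: lower frequency than the standard runners

jobs:
  build:
    strategy:
      fail-fast: false
      matrix:
        runner:
          - ubuntu-24.04-ppc64le-2xlarge
          - ubuntu-24.04-ppc64le-2xlarge-p10
          - ubuntu-24.04-ppc64le-4xlarge
          - ubuntu-24.04-ppc64le-4xlarge-p10
    runs-on: ${{ matrix.runner }}
    # Assumed relaxed execution limit for the heavier runners.
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
      - name: Run a one-line script
        run: echo "Hello, world! GitHub app is running successfully on ${{ matrix.runner }}"
```

Keeping the large runners in their own file would let the schedule and limits be tuned without touching the workflow for the standard runners.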
steps:
  - uses: actions/checkout@v4
  - name: Run a one-line script
    run: echo "Hello, world! GitHub app is running successfully on ${{ matrix.runner }}"
Let's think about adding a basic test (I/O, network) rather than just an echo test (but I'm open to adding that in future PRs).
@anup-kodlekere what did you have in mind? I think this could fulfill the monitoring requirement, but it would be nice to do something that mimics a real workflow. I like the idea of an I/O test.
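One possible shape for a basic I/O and network check is sketched below. Nothing here is from the PR: the step names, the 64 MiB file size, and the `https://github.com` target are assumptions, and the sketch assumes `dd`, `sha256sum`, and `curl` are installed on the runner images.

```yaml
# Sketch only: a hypothetical smoke test, shown on one runner label from this thread.
name: runner-smoke-test
on:
  workflow_dispatch:

jobs:
  smoke:
    runs-on: ubuntu-24.04-ppc64le
    steps:
      - name: Basic disk I/O test
        run: |
          set -euo pipefail
          # RUNNER_TEMP is the runner's temporary directory for the job.
          tmpfile="$RUNNER_TEMP/io-test.bin"
          # Write 64 MiB and flush it to disk before dd exits.
          dd if=/dev/urandom of="$tmpfile" bs=1M count=64 conv=fsync
          # Read the file back in full; a read error fails the step.
          sha256sum "$tmpfile"
          rm -f "$tmpfile"
      - name: Basic network test
        run: |
          set -euo pipefail
          # Fail on HTTP errors or if the request takes longer than 30 seconds.
          curl --fail --silent --show-error --max-time 30 --output /dev/null https://github.com
```

These steps could be appended after the existing echo step so that the same matrix exercises disk and outbound network access on every runner type.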
So can this PR be closed?
For workflow monitoring, we are adding monitoring scripts.