[ci] test: rl-insight monitor add ci#86
Conversation
There was a problem hiding this comment.
Code Review
This pull request adds the "requests" dependency and introduces a comprehensive test suite for the "rl_insight" monitoring API, configuration loader, server commands, and end-to-end server lifecycle. The review feedback highlights critical improvements in "tests/monitor/test_server_lifecycle_e2e.py" to ensure robust parsing of "/proc/{pid}/stat" when process names contain spaces or parentheses, and to implement safer process termination logic that avoids calling "os.killpg" on non-group leaders or accidentally terminating the test runner's own process group.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| proc_stat = Path(f"/proc/{pid}/stat") | ||
| if proc_stat.exists(): | ||
| try: | ||
| fields = proc_stat.read_text(encoding="utf-8").split() | ||
| except OSError: | ||
| return True | ||
| if len(fields) > 2 and fields[2] == "Z": | ||
| return False |
There was a problem hiding this comment.
Parsing /proc/{pid}/stat by simply splitting on whitespace can fail if the process name (enclosed in parentheses) contains spaces or parentheses. A more robust approach is to find the last closing parenthesis ) first, and then split the remaining string to extract the process state.
| proc_stat = Path(f"/proc/{pid}/stat") | |
| if proc_stat.exists(): | |
| try: | |
| fields = proc_stat.read_text(encoding="utf-8").split() | |
| except OSError: | |
| return True | |
| if len(fields) > 2 and fields[2] == "Z": | |
| return False | |
| proc_stat = Path(f"/proc/{pid}/stat") | |
| if proc_stat.exists(): | |
| try: | |
| stat_content = proc_stat.read_text(encoding="utf-8") | |
| except OSError: | |
| return True | |
| rpar = stat_content.rfind(")") | |
| if rpar != -1: | |
| fields = stat_content[rpar + 1 :].split() | |
| if fields and fields[0] == "Z": | |
| return False |
| try: | ||
| os.killpg(pid, signal.SIGTERM) | ||
| except OSError: | ||
| try: | ||
| os.kill(pid, signal.SIGTERM) | ||
| except OSError: | ||
| continue |
There was a problem hiding this comment.
Calling os.killpg(pid, ...) directly with a process ID that is not a process group leader is conceptually incorrect and will fail with ProcessLookupError. Furthermore, if we attempt to kill the process group, we must ensure we do not accidentally kill the test runner's own process group (which would abort the entire test suite). It is safer to kill the process itself first, and then only kill its process group if it is different from the current process group.
| try: | |
| os.killpg(pid, signal.SIGTERM) | |
| except OSError: | |
| try: | |
| os.kill(pid, signal.SIGTERM) | |
| except OSError: | |
| continue | |
| try: | |
| os.kill(pid, signal.SIGTERM) | |
| try: | |
| pgid = os.getpgid(pid) | |
| if pgid != os.getpgrp(): | |
| os.killpg(pgid, signal.SIGTERM) | |
| except OSError: | |
| pass | |
| except OSError: | |
| continue |
| try: | ||
| os.killpg(pid, signal.SIGKILL) | ||
| except OSError: | ||
| try: | ||
| os.kill(pid, signal.SIGKILL) | ||
| except OSError: | ||
| pass |
There was a problem hiding this comment.
Apply the same safe termination logic for SIGKILL to avoid conceptually incorrect os.killpg calls and prevent accidentally killing the test runner's own process group.
| try: | |
| os.killpg(pid, signal.SIGKILL) | |
| except OSError: | |
| try: | |
| os.kill(pid, signal.SIGKILL) | |
| except OSError: | |
| pass | |
| try: | |
| os.kill(pid, signal.SIGKILL) | |
| try: | |
| pgid = os.getpgid(pid) | |
| if pgid != os.getpgrp(): | |
| os.killpg(pgid, signal.SIGKILL) | |
| except OSError: | |
| pass | |
| except OSError: | |
| pass |
What does this PR do?
monitor add ci
Checklist Before Starting
[{modules}] {type}: {description}(This will be checked by the CI){modules}include:monitor-api,monitor-cli,monitor-collector,monitor-server,monitor-configfor online monitor changesrecipefor offline analysis changes, including pipeline, parser, visualizer, data checker, recipe config, examples, and sample datadoc,ci,perf,deployment,miscfor cross-cutting changes,like[recipe, ci]or[monitor-server, monitor-config]{type}is infeat,fix,refactor,chore,test[BREAKING]to the beginning of the title.[BREAKING][mstx, torch_profile] feat: support timeline parsingTest
API and Usage Example
# Add code snippet or script demonstrating how to use thisDesign & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always