feat(adb-vision): restore live piloting and audit backend helpers#57
feat(adb-vision): restore live piloting and audit backend helpers#57
Conversation
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
Superseded by #55, which is the cleaned-up canonical adb_vision PR for this stack. |
| from emulator import memuc_cli | ||
| from screenshot import take_screenshot | ||
| from tool_audit import audit_cli_call, configure, record_command | ||
|
|
There was a problem hiding this comment.
🔥 The Roast: r"D:\Program Files\Microvirt\MEmu\adb.exe" hardcoded into a diagnostic tool that exists to catch environment problems. There's a beautiful irony here: the first thing diagnose.py does is assume it's running on exactly the machine it was written on. On any other machine this file fails before it can tell you anything is wrong.
server.py already has a _find_adb() function that searches environment variables and known paths. diagnose.py duplicates the ADB path hardcode instead of reusing — which means if the ADB path ever changes, the diagnostic tool becomes the first thing to lie to you.
🩹 The Fix: Import _find_adb from server (or extract it to a shared adb_utils.py), then:
| ADB = os.environ.get("ADB_EXECUTABLE") or r"D:\Program Files\Microvirt\MEmu\adb.exe" |
At minimum add the env-var escape hatch so this isn't carved in stone.
📏 Severity: warning
| include_metadata: bool = False, | ||
| ) -> str | tuple[str, dict[str, str | int]]: | ||
| backends = _resolve_backends(method) | ||
| last_error: Exception | None = None |
There was a problem hiding this comment.
🔥 The Roast: -> str | tuple[str, dict[str, str | int]] — the classic "flag argument that changes the return type" anti-pattern, wearing a Python type hint as a disguise. You've created a function that either returns a str OR a tuple depending on a boolean you pass in at call time. Every caller now needs to do type narrowing gymnastics, and mypy is going to throw a party in the error log.
🩹 The Fix: Use @overload to give each call-site a typed contract:
@overload
async def take_screenshot(..., include_metadata: Literal[False] = ...) -> str: ...
@overload
async def take_screenshot(..., include_metadata: Literal[True]) -> tuple[str, dict[str, str | int]]: ...Or simpler: always return the tuple and let callers unpack what they need.
📏 Severity: suggestion
|
|
||
| # Restore previous window | ||
| [WinAPI]::SetForegroundWindow($previousWindow) | Out-Null | ||
|
|
There was a problem hiding this comment.
🔥 The Roast: Start-Job fires a background PowerShell job inside a script that's invoked as a one-shot subprocess from Python. When this script exits, the background job dies with it — it's not a persistent daemon, it's a fire-and-forget that gets immediately forgotten by the OS. Your 5-minute auto-delete should be called the "optimistic cleanup feature" because it will never run unless the caller holds the PS session open.
🩹 The Fix: Either use a scheduled task (Register-ScheduledJob/Register-ScheduledTask) for deferred cleanup, or just let the Python caller handle deletion after it's done consuming the file. PS Start-Job only works reliably in interactive sessions, not in -Command one-shots.
📏 Severity: warning
Code Review Roast 🔥Verdict: 4 Issues Found | Recommendation: Address before merge Overview
Issue Details (click to expand)
🏆 Best part: The 💀 Worst part: 📊 Overall: Like a well-engineered Swiss watch stuffed into a box labeled "definitely not a watch." The audit/tracing architecture is solid, the backend fallback ladder is correct, but the operational tooling ( Files Reviewed (10 files)
|
There was a problem hiding this comment.
Pull request overview
Restores the adb_vision live piloting workflow by strengthening screenshot capture fallbacks and adding structured audit logging across MCP tool calls, CLI diagnostics, and nested backend helper activity.
Changes:
- Add structured JSONL audit logging (
tool_audit.py) with FastMCP middleware + child event support; integrate it into the MCP server and CLIs. - Extend the screenshot “auto” ladder with
screenrecord(ffmpeg first-frame extraction) and improve scrcpy discovery/handling; update server contract to include screenshot metadata. - Add
pilot.pyand a more comprehensivediagnose.py(including a desktop PowerShell capture fallback) plus documentation and targeted tests.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/dev/logging.md | Documents the audit log and nested helper coverage. |
| docs/ROADMAP.md | Updates roadmap items to reflect the new fallback ladder + audit logging. |
| docs/ARCHITECTURE.md | Updates architecture diagram/notes for logging + surface guardrails. |
| adb_vision/tool_audit.py | Implements structured audit logging + FastMCP middleware. |
| adb_vision/test_tool_audit.py | Adds unit tests validating audit record writing and parent/child linkage. |
| adb_vision/test_server.py | Updates tests for new fallback order, scrcpy discovery hook, and screenshot metadata. |
| adb_vision/server.py | Integrates audit middleware, logs ADB subprocesses, and returns screenshot metadata-derived summaries. |
| adb_vision/screenshot.py | Adds screenrecord backend, adds audit events for HTTP/backend activity, improves scrcpy capture path. |
| adb_vision/pyproject.toml | Adds dependencies for new capture/diagnostic functionality (av, uiautomator2, pymemuc). |
| adb_vision/pilot.py | Adds a minimal audited CLI for manual piloting of core tools. |
| adb_vision/diagnose.py | Adds an audited health gate CLI with backend probing + desktop screenshot fallback. |
| adb_vision/desktop_capture.ps1 | Implements desktop screenshot capture of the MEmu window for fallback diagnostics. |
| adb_vision/README.md | Updates documented screenshot backend list and auto behavior. |
| TDD_IMPLEMENTATION_PLAN.md | Updates TDD plan status notes to reflect audit + fallback changes. |
| parser.add_argument( | ||
| "--screenshot-method", | ||
| default="auto", | ||
| choices=["auto", "droidcast", "scrcpy", "u2", "screencap"], | ||
| choices=["auto", "droidcast", "scrcpy", "u2", "screenrecord", "screencap"], | ||
| help="Default screenshot method", |
There was a problem hiding this comment.
--screenshot-method is exposed as a CLI option, but its value is never used (the adb_screenshot tool always defaults to method="auto"). This makes the flag misleading; either wire it into a module-level default used by adb_screenshot when no method is provided, or remove the argument to avoid a no-op configuration option.
| ADB = r"D:\Program Files\Microvirt\MEmu\adb.exe" | ||
| SERIAL = "127.0.0.1:21513" | ||
| GAME_PACKAGE = "com.YoStarEN.AzurLane" | ||
| SCREEN_DIR = Path(__file__).parent / "pilot_screens" | ||
| BLACK_STDDEV_THRESHOLD = 5.0 |
There was a problem hiding this comment.
diagnose.py hardcodes the ADB executable path and serial (ADB = r"D:\...\adb.exe", SERIAL = "127.0.0.1:21513"). That will break on machines with different install locations/ports and makes the diagnostic gate hard to reuse in CI or on other dev setups. Prefer using the same discovery/env-var approach as server.py (e.g., $ADB_SERIAL, $ADB_EXECUTABLE or argparse flags) and defaulting to the current hardcoded values only if nothing else is configured.
| stdout_text = stdout.decode(errors="replace").strip() | ||
| stderr_text = stderr.decode(errors="replace").strip() | ||
| record_command( | ||
| command_name="adb.exec", | ||
| argv=argv, | ||
| duration_ms=(time.perf_counter() - started) * 1000, | ||
| status="success", | ||
| stdout=stdout_text, |
There was a problem hiding this comment.
Similar to server._adb_run, this _adb_run decodes and logs stdout as text before calling record_command. When diagnosing the screencap backend (binary PNG), this can create extremely large audit records and waste CPU/memory. Passing the raw bytes into record_command (or truncating/printability-checking before decoding) would keep audits lightweight while still capturing the byte count.
| stdout_text = stdout.decode(errors="replace").strip() | |
| stderr_text = stderr.decode(errors="replace").strip() | |
| record_command( | |
| command_name="adb.exec", | |
| argv=argv, | |
| duration_ms=(time.perf_counter() - started) * 1000, | |
| status="success", | |
| stdout=stdout_text, | |
| # Avoid decoding large or binary stdout payloads (e.g., screencap PNG) into audit logs. | |
| # For small, printable outputs we decode; otherwise we log a compact summary. | |
| if stdout: | |
| # Threshold chosen to keep audits lightweight while still logging typical text output. | |
| if len(stdout) <= 4096: | |
| decoded = stdout.decode(errors="replace").strip() | |
| if decoded.isprintable(): | |
| stdout_preview = decoded | |
| else: | |
| stdout_preview = f"<{len(stdout)} bytes binary data>" | |
| else: | |
| stdout_preview = f"<{len(stdout)} bytes (truncated for logging)>" | |
| else: | |
| stdout_preview = "" | |
| stderr_text = stderr.decode(errors="replace").strip() | |
| record_command( | |
| command_name="adb.exec", | |
| argv=argv, | |
| duration_ms=(time.perf_counter() - started) * 1000, | |
| status="success", | |
| stdout=stdout_preview, |
| async def _capture_screenrecord(*, adb_run: AdbRunFn, serial: str, adb_exe: str) -> str: | ||
| ffmpeg_exe = shutil.which("ffmpeg") | ||
| if not ffmpeg_exe: | ||
| raise RuntimeError("ffmpeg not found in PATH; required for screenrecord fallback") | ||
|
|
There was a problem hiding this comment.
The new screenrecord backend is a significant behavior addition (adb shell screenrecord + pull + ffmpeg decode), but it currently only has coverage via the fallback-order test that mocks _capture_screenrecord. Consider adding a focused unit test for _capture_screenrecord that mocks adb_run, _run_logged_process, and Path.read_bytes() to verify the happy path and the key failure modes (ffmpeg missing, non-PNG output, cleanup of the remote file).
| SCREEN_DIR = Path(__file__).parent / "pilot_screens" | ||
|
|
||
|
|
||
| def _save_image(raw: bytes, stem: str) -> str: | ||
| SCREEN_DIR.mkdir(exist_ok=True) | ||
| path = SCREEN_DIR / stem | ||
| path.write_bytes(raw) |
There was a problem hiding this comment.
pilot.py writes screenshots into adb_vision/pilot_screens/ under the repo. That directory isn’t currently ignored (unlike mcp_screenshots/), so it’s easy to accidentally commit runtime artifacts. Consider either reusing the existing mcp_screenshots/ directory or adding pilot_screens/ to adb_vision/.gitignore (or the root .gitignore).
| stdout_text = stdout.decode(errors="replace").strip() | ||
| stderr_text = stderr.decode(errors="replace").strip() | ||
| record_command( | ||
| command_name="adb.exec", | ||
| argv=argv, |
There was a problem hiding this comment.
_adb_run decodes and logs stdout/stderr as text. For commands that return binary (notably adb exec-out screencap -p used by the screenshot backend), this can produce a huge mostly-unreadable string and bloat/slow the audit log path. Consider passing the raw bytes into record_command (it already summarizes binary safely) and only decoding for known-text commands or after a size/printability check.
Summary
adb_visionpiloting path with reliable screenshot fallbacks and a realdiagnose.pyhealth gatescrcpy,ffmpeg, and the PowerShell desktop fallbackWhy
Raw
screencapwas an unreliable source of truth on the live MEmu setup, which caused false "VM not running" diagnoses and blind piloting. The logging story was also incomplete because helper processes and backend calls were not being recorded.Validation
cd adb_vision && uv run pytest test_server.py test_tool_audit.py -qcd adb_vision && uv run python diagnose.py --jsoncd adb_vision && uv run python pilot.py screenshotLive checks
diagnose.pyreportedadb_connected: true,focus_matches_game: true, anddiagnosis: OK