Skip to content

feat(adb-vision): restore live piloting and audit backend helpers#57

Closed
Coldaine wants to merge 1 commit intomasterfrom
stack/adb-vision-audit-hardening
Closed

feat(adb-vision): restore live piloting and audit backend helpers#57
Coldaine wants to merge 1 commit intomasterfrom
stack/adb-vision-audit-hardening

Conversation

@Coldaine
Copy link
Owner

@Coldaine Coldaine commented Mar 5, 2026

Summary

  • restore the canonical adb_vision piloting path with reliable screenshot fallbacks and a real diagnose.py health gate
  • add structured audit logging for nested backend activity, including ADB subprocesses, backend HTTP calls, scrcpy, ffmpeg, and the PowerShell desktop fallback
  • document the canonical audited surface and add focused audit tests

Why

Raw screencap was an unreliable source of truth on the live MEmu setup, which caused false "VM not running" diagnoses and blind piloting. The logging story was also incomplete because helper processes and backend calls were not being recorded.

Validation

  • cd adb_vision && uv run pytest test_server.py test_tool_audit.py -q
  • cd adb_vision && uv run python diagnose.py --json
  • cd adb_vision && uv run python pilot.py screenshot

Live checks

  • diagnose.py reported adb_connected: true, focus_matches_game: true, and diagnosis: OK
  • live screenshot capture succeeded from the new branch after the audit changes

Copilot AI review requested due to automatic review settings March 5, 2026 21:50
@chatgpt-codex-connector
Copy link

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@Coldaine
Copy link
Owner Author

Coldaine commented Mar 5, 2026

Superseded by #55, which is the cleaned-up canonical adb_vision PR for this stack.

@Coldaine Coldaine closed this Mar 5, 2026
from emulator import memuc_cli
from screenshot import take_screenshot
from tool_audit import audit_cli_call, configure, record_command

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 The Roast: r"D:\Program Files\Microvirt\MEmu\adb.exe" hardcoded into a diagnostic tool that exists to catch environment problems. There's a beautiful irony here: the first thing diagnose.py does is assume it's running on exactly the machine it was written on. On any other machine this file fails before it can tell you anything is wrong.

server.py already has a _find_adb() function that searches environment variables and known paths. diagnose.py duplicates the ADB path hardcode instead of reusing — which means if the ADB path ever changes, the diagnostic tool becomes the first thing to lie to you.

🩹 The Fix: Import _find_adb from server (or extract it to a shared adb_utils.py), then:

Suggested change
ADB = os.environ.get("ADB_EXECUTABLE") or r"D:\Program Files\Microvirt\MEmu\adb.exe"

At minimum add the env-var escape hatch so this isn't carved in stone.

📏 Severity: warning

include_metadata: bool = False,
) -> str | tuple[str, dict[str, str | int]]:
backends = _resolve_backends(method)
last_error: Exception | None = None
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 The Roast: -> str | tuple[str, dict[str, str | int]] — the classic "flag argument that changes the return type" anti-pattern, wearing a Python type hint as a disguise. You've created a function that either returns a str OR a tuple depending on a boolean you pass in at call time. Every caller now needs to do type narrowing gymnastics, and mypy is going to throw a party in the error log.

🩹 The Fix: Use @overload to give each call-site a typed contract:

@overload
async def take_screenshot(..., include_metadata: Literal[False] = ...) -> str: ...
@overload  
async def take_screenshot(..., include_metadata: Literal[True]) -> tuple[str, dict[str, str | int]]: ...

Or simpler: always return the tuple and let callers unpack what they need.

📏 Severity: suggestion


# Restore previous window
[WinAPI]::SetForegroundWindow($previousWindow) | Out-Null

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 The Roast: Start-Job fires a background PowerShell job inside a script that's invoked as a one-shot subprocess from Python. When this script exits, the background job dies with it — it's not a persistent daemon, it's a fire-and-forget that gets immediately forgotten by the OS. Your 5-minute auto-delete should be called the "optimistic cleanup feature" because it will never run unless the caller holds the PS session open.

🩹 The Fix: Either use a scheduled task (Register-ScheduledJob/Register-ScheduledTask) for deferred cleanup, or just let the Python caller handle deletion after it's done consuming the file. PS Start-Job only works reliably in interactive sessions, not in -Command one-shots.

📏 Severity: warning

@kilo-code-bot
Copy link

kilo-code-bot bot commented Mar 5, 2026

Code Review Roast 🔥

Verdict: 4 Issues Found | Recommendation: Address before merge

Overview

Severity Count
🚨 critical 0
⚠️ warning 2
💡 suggestion 1
🤏 nitpick 1
Issue Details (click to expand)
File Line Roast
adb_vision/diagnose.py 29 Hardcoded Windows ADB path D:\...\adb.exe in the diagnostic tool — the tool that exists to catch environment problems fails on any non-dev machine before it can report anything useful. server.py already has _find_adb(). Use it.
adb_vision/desktop_capture.ps1 181 Start-Job auto-delete fires a background PS job from a one-shot subprocess — it dies when the script exits. The 5-minute cleanup is aspirational fiction.
adb_vision/screenshot.py 320 -> str | tuple[str, dict] return type controlled by a bool flag is the "mystery box" anti-pattern. Use @overload or always return the tuple.
docs/ROADMAP.md "⏳ Piloting mode (vision + manual recovery) not started" appears directly below "✅ Live piloting path proven on March 5, 2026." These two lines are having a disagreement that only one of them knows about.

🏆 Best part: The tool_audit.py middleware design is actually clean — the contextvars-based parent event tracking for nested ADB/HTTP/backend calls is the right architecture for tracing a multi-layer tool call, and wrapping it in FastMCP middleware so it's transparent to tool implementations is exactly how you'd want this done. I need to sit down.

💀 Worst part: diagnose.py hardcoding the ADB path is the original sin here. A diagnostic tool that assumes the environment instead of probing it is not a diagnostic tool — it's a prayer.

📊 Overall: Like a well-engineered Swiss watch stuffed into a box labeled "definitely not a watch." The audit/tracing architecture is solid, the backend fallback ladder is correct, but the operational tooling (diagnose.py, desktop_capture.ps1) has fresh-off-a-dev-machine energy that'll bite on first deployment to any other box.

Files Reviewed (10 files)
  • adb_vision/diagnose.py — 1 issue (hardcoded ADB path)
  • adb_vision/desktop_capture.ps1 — 1 issue (Start-Job lifetime)
  • adb_vision/screenshot.py — 1 issue (ambiguous return type), plus screenrecord adds ≥2s per attempt in the hot fallback path (worth a comment in the docstring)
  • adb_vision/tool_audit.py — no issues
  • adb_vision/pilot.py — no issues
  • adb_vision/server.py — no issues
  • adb_vision/test_server.py — no issues
  • adb_vision/test_tool_audit.py — no issues
  • docs/ROADMAP.md — 1 doc inconsistency (piloting status)
  • adb_vision/pyproject.toml — no issues

Fix these issues in Kilo Cloud

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Restores the adb_vision live piloting workflow by strengthening screenshot capture fallbacks and adding structured audit logging across MCP tool calls, CLI diagnostics, and nested backend helper activity.

Changes:

  • Add structured JSONL audit logging (tool_audit.py) with FastMCP middleware + child event support; integrate it into the MCP server and CLIs.
  • Extend the screenshot “auto” ladder with screenrecord (ffmpeg first-frame extraction) and improve scrcpy discovery/handling; update server contract to include screenshot metadata.
  • Add pilot.py and a more comprehensive diagnose.py (including a desktop PowerShell capture fallback) plus documentation and targeted tests.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
docs/dev/logging.md Documents the audit log and nested helper coverage.
docs/ROADMAP.md Updates roadmap items to reflect the new fallback ladder + audit logging.
docs/ARCHITECTURE.md Updates architecture diagram/notes for logging + surface guardrails.
adb_vision/tool_audit.py Implements structured audit logging + FastMCP middleware.
adb_vision/test_tool_audit.py Adds unit tests validating audit record writing and parent/child linkage.
adb_vision/test_server.py Updates tests for new fallback order, scrcpy discovery hook, and screenshot metadata.
adb_vision/server.py Integrates audit middleware, logs ADB subprocesses, and returns screenshot metadata-derived summaries.
adb_vision/screenshot.py Adds screenrecord backend, adds audit events for HTTP/backend activity, improves scrcpy capture path.
adb_vision/pyproject.toml Adds dependencies for new capture/diagnostic functionality (av, uiautomator2, pymemuc).
adb_vision/pilot.py Adds a minimal audited CLI for manual piloting of core tools.
adb_vision/diagnose.py Adds an audited health gate CLI with backend probing + desktop screenshot fallback.
adb_vision/desktop_capture.ps1 Implements desktop screenshot capture of the MEmu window for fallback diagnostics.
adb_vision/README.md Updates documented screenshot backend list and auto behavior.
TDD_IMPLEMENTATION_PLAN.md Updates TDD plan status notes to reflect audit + fallback changes.

Comment on lines 326 to 330
parser.add_argument(
"--screenshot-method",
default="auto",
choices=["auto", "droidcast", "scrcpy", "u2", "screencap"],
choices=["auto", "droidcast", "scrcpy", "u2", "screenrecord", "screencap"],
help="Default screenshot method",
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--screenshot-method is exposed as a CLI option, but its value is never used (the adb_screenshot tool always defaults to method="auto"). This makes the flag misleading; either wire it into a module-level default used by adb_screenshot when no method is provided, or remove the argument to avoid a no-op configuration option.

Copilot uses AI. Check for mistakes.
Comment on lines +30 to +34
ADB = r"D:\Program Files\Microvirt\MEmu\adb.exe"
SERIAL = "127.0.0.1:21513"
GAME_PACKAGE = "com.YoStarEN.AzurLane"
SCREEN_DIR = Path(__file__).parent / "pilot_screens"
BLACK_STDDEV_THRESHOLD = 5.0
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diagnose.py hardcodes the ADB executable path and serial (ADB = r"D:\...\adb.exe", SERIAL = "127.0.0.1:21513"). That will break on machines with different install locations/ports and makes the diagnostic gate hard to reuse in CI or on other dev setups. Prefer using the same discovery/env-var approach as server.py (e.g., $ADB_SERIAL, $ADB_EXECUTABLE or argparse flags) and defaulting to the current hardcoded values only if nothing else is configured.

Copilot uses AI. Check for mistakes.
Comment on lines +77 to +84
stdout_text = stdout.decode(errors="replace").strip()
stderr_text = stderr.decode(errors="replace").strip()
record_command(
command_name="adb.exec",
argv=argv,
duration_ms=(time.perf_counter() - started) * 1000,
status="success",
stdout=stdout_text,
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to server._adb_run, this _adb_run decodes and logs stdout as text before calling record_command. When diagnosing the screencap backend (binary PNG), this can create extremely large audit records and waste CPU/memory. Passing the raw bytes into record_command (or truncating/printability-checking before decoding) would keep audits lightweight while still capturing the byte count.

Suggested change
stdout_text = stdout.decode(errors="replace").strip()
stderr_text = stderr.decode(errors="replace").strip()
record_command(
command_name="adb.exec",
argv=argv,
duration_ms=(time.perf_counter() - started) * 1000,
status="success",
stdout=stdout_text,
# Avoid decoding large or binary stdout payloads (e.g., screencap PNG) into audit logs.
# For small, printable outputs we decode; otherwise we log a compact summary.
if stdout:
# Threshold chosen to keep audits lightweight while still logging typical text output.
if len(stdout) <= 4096:
decoded = stdout.decode(errors="replace").strip()
if decoded.isprintable():
stdout_preview = decoded
else:
stdout_preview = f"<{len(stdout)} bytes binary data>"
else:
stdout_preview = f"<{len(stdout)} bytes (truncated for logging)>"
else:
stdout_preview = ""
stderr_text = stderr.decode(errors="replace").strip()
record_command(
command_name="adb.exec",
argv=argv,
duration_ms=(time.perf_counter() - started) * 1000,
status="success",
stdout=stdout_preview,

Copilot uses AI. Check for mistakes.
Comment on lines +386 to +390
async def _capture_screenrecord(*, adb_run: AdbRunFn, serial: str, adb_exe: str) -> str:
ffmpeg_exe = shutil.which("ffmpeg")
if not ffmpeg_exe:
raise RuntimeError("ffmpeg not found in PATH; required for screenrecord fallback")

Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new screenrecord backend is a significant behavior addition (adb shell screenrecord + pull + ffmpeg decode), but it currently only has coverage via the fallback-order test that mocks _capture_screenrecord. Consider adding a focused unit test for _capture_screenrecord that mocks adb_run, _run_logged_process, and Path.read_bytes() to verify the happy path and the key failure modes (ffmpeg missing, non-PNG output, cleanup of the remote file).

Copilot uses AI. Check for mistakes.
Comment on lines +25 to +31
SCREEN_DIR = Path(__file__).parent / "pilot_screens"


def _save_image(raw: bytes, stem: str) -> str:
SCREEN_DIR.mkdir(exist_ok=True)
path = SCREEN_DIR / stem
path.write_bytes(raw)
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pilot.py writes screenshots into adb_vision/pilot_screens/ under the repo. That directory isn’t currently ignored (unlike mcp_screenshots/), so it’s easy to accidentally commit runtime artifacts. Consider either reusing the existing mcp_screenshots/ directory or adding pilot_screens/ to adb_vision/.gitignore (or the root .gitignore).

Copilot uses AI. Check for mistakes.
Comment on lines +151 to +155
stdout_text = stdout.decode(errors="replace").strip()
stderr_text = stderr.decode(errors="replace").strip()
record_command(
command_name="adb.exec",
argv=argv,
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_adb_run decodes and logs stdout/stderr as text. For commands that return binary (notably adb exec-out screencap -p used by the screenshot backend), this can produce a huge mostly-unreadable string and bloat/slow the audit log path. Consider passing the raw bytes into record_command (it already summarizes binary safely) and only decoding for known-text commands or after a size/printability check.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants