Split the publish-verification worker boundary #1029

Merged: ElonSG merged 2 commits into auto-refact-dev from refactor/iter1028-issue-1028 on Jun 20, 2026

ElonSG commented Jun 20, 2026
Contributor

Changed files

  • Add a helper-private publish-verification-worker CLI entrypoint; publish_verification.py executes a single publish-verification job directly.
  • The parent tick of publish_implementation_output now only starts the dedicated worker and no longer goes through wakeup-runner --run-one-publish-ratchet.
  • Update the wakeup-runner, SKILL, and runtime-exception docs to state that the worker only runs host-owned BUILD_CMD/TEST_CMD and writes verification evidence/logs.
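
The parent/worker split in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `build_worker_argv` and `start_worker` are hypothetical names, and only the command shape `consensus-rnd-cli publish-verification-worker <job_dir>` is taken from this thread.

```python
import subprocess
from pathlib import Path


def build_worker_argv(job_dir: Path) -> list[str]:
    """Return the argv for one dedicated publish-verification worker.

    Hypothetical helper: only the command shape comes from the PR thread.
    """
    return ["consensus-rnd-cli", "publish-verification-worker", str(job_dir)]


def start_worker(job_dir: Path) -> subprocess.Popen:
    """Start the worker without waiting, so the parent tick returns quickly."""
    return subprocess.Popen(
        build_worker_argv(job_dir),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # POSIX-only: detach from the parent's session
    )
```

The parent only launches the child and returns; whatever the child produces would be read later from the job directory rather than from the `Popen` handle.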

Test results

  • check-degradation --static passed.
  • Targeted unittest files passed.
  • Full python3 -m unittest discover -s skills/consensus-loop/scripts -p "test_*.py" passed: 2561 tests, 1 skipped.

Deviations

  • No scope extension.
  • refactor self-doc: not applicable (HOST_REFACTOR_COMMENT_POLICY=none)

Closes #1028

⟦AI:AUTO-LOOP⟧

ElonSG added the labels crnd:lifecycle:managed (loop-managed item), crnd:phase:reviewing (review-gate reviewers in flight), and crnd:human:auto (auto-advancing, no human needed) on Jun 20, 2026

ElonSG commented Jun 20, 2026
Contributor Author

🤖 Quality review: comment

TL;DR

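  • What this is: An independent code-quality review of PR #1029, "Split the publish-verification worker boundary". Scope is limited to readability, naming, simplicity, dead code, and unrelated changes.
  • Current state or conclusion: The verdict is comment. The implementation path is reachable and the scope is focused, but the old name run_one_publish_ratchet still lingers inside the new worker boundary.
  • What the maintainer should do OR what the controller does next: Rename that function and its test call sites to an intent-revealing name such as run_one_publish_verification_job. This is advisory, not a merge blocker.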

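Review evidence

publish-verification-worker is the new command split out in this PR: cli.py:90-94 registers it and publish_verification.py:447-459 launches it. The docs at SKILL.md:385 and authorizations/runtime-exceptions.md:233 likewise describe the boundary in publish verification worker/job terms. The problem is that the function doing the work is still called run_one_publish_ratchet, defined at publish_verification.py:216, and the new CLI entrypoint at publish_verification.py:629 still forwards to that old name. "ratchet" is no longer a public command, nor the boundary term used in the docs, so a reader of the code may assume that an older ratchet concept layer still exists.

test_publish_verification.py:116, :140, :150, :211, :236, :256, and :281 continue to call or patch run_one_publish_ratchet directly, so the tests now cement the old name. By contrast, test_wakeup_runner.py:6608-6616 correctly verifies that the old wakeup-runner --run-one-publish-ratchet mode is rejected, so I did not classify this as dead code or boundary leakage, only as a naming/readability issue.

The suggested change is small: rename run_one_publish_ratchet to run_one_publish_verification_job or run_publish_verification_worker, update main() and test_publish_verification.py accordingly, and keep the existing rejection test for the old wakeup-runner flag as a legacy guard.

Reasoning discipline note: the reference frame is intent-revealing naming plus a single boundary vocabulary. Verified: the new boundary name is publish-verification-worker, and the old function name survives only in publish_verification.py and the tests. Ugly defect: approve would cement an old metaphor that the new public boundary has already replaced. Beautiful form: the public command, worker function, and tests all use publish verification worker/job vocabulary. Ugly defect: reject would inflate a reachable, focused naming issue that adds no lifecycle authority into a merge blocker. Beautiful form: an advisory comment with a concrete rename.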
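
A rough sketch of what the suggested rename could look like at the entrypoint. The function body here is a stand-in that only records its argument, and the argument-count check is an assumption; the real runner at publish_verification.py:216 is not reproduced in this thread.

```python
from pathlib import Path

calls: list[Path] = []  # records job dirs so the sketch is observable


def run_one_publish_verification_job(job_dir: Path) -> int:
    """Stand-in for the renamed runner (formerly run_one_publish_ratchet)."""
    calls.append(job_dir)
    return 0


def main(argv: list[str]) -> int:
    """Worker entrypoint: forward exactly one job directory to the job runner."""
    if len(argv) != 1:  # assumed validation, not confirmed by the PR
        raise SystemExit("usage: publish-verification-worker <job_dir>")
    return run_one_publish_verification_job(Path(argv[0]))
```

With the rename, the command name, the function name, and the tests would all share the "publish verification job" vocabulary, and no alias for the old name is kept.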

📎 Full raw codex output (archival)
---
pr: 1029
role: quality
head_sha: ae0c410c19474326068af1dcd7cf0fc8c2f90045
verdict: comment
---

## Verdict
Comment: the split is focused and reachable, but the worker implementation still exposes the old `run_one_publish_ratchet` name, which weakens the newly documented `publish-verification-worker` boundary.

## Evidence
- `skills/consensus-loop/scripts/codex_refactor_loop/publish_verification.py:216` defines `run_one_publish_ratchet`, and `publish_verification.py:629` wires the new `consensus-rnd-cli publish-verification-worker` entrypoint back through that old name. This conflicts with the PR's readable public vocabulary: `cli.py:90-94` registers `publish-verification-worker`, `publish_verification.py:447-459` starts that worker, and `SKILL.md:385` / `authorizations/runtime-exceptions.md:233` describe the helper-private publish verification child without ratchet terminology.
- `skills/consensus-loop/scripts/test_publish_verification.py:116`, `:140`, `:150`, `:211`, `:236`, `:256`, and `:281` continue to call or patch `run_one_publish_ratchet`, so the stale name is now reinforced by tests instead of being isolated legacy compatibility. The old CLI flag is correctly rejected at `test_wakeup_runner.py:6608-6616`, so this is not dead code or lifecycle leakage; it is a naming/readability defect.
- Reasoning-discipline note: reference frame is intent-revealing naming plus single-source boundary language. Known-good shape: once the boundary is named `publish-verification-worker`, internal exported entrypoints and tests should use the same domain name unless an explicit legacy adapter remains. Candidate approve: Ugly defect: it would bless a lying internal name after the public/API name was fixed. Beautiful form: public command, helper function, and tests all say "publish verification worker/job" consistently. Candidate reject: Ugly defect: it would overstate a naming inconsistency as a functional blocker despite verified reachability and focused scope. Beautiful form: advisory comment with a concrete rename. Verified premises: `gh pr view 1029` reports `HOST_REFACTOR_COMMENT_POLICY=none`, the diff is scoped to consensus-loop publish verification/wakeup-runner docs and tests, and `git grep` on the PR head finds `run_one_publish_ratchet` only in `publish_verification.py` and tests plus explicit negative legacy assertions.

## What would change your verdict (only if comment or reject)
Rename `run_one_publish_ratchet` to an intent-revealing worker name such as `run_one_publish_verification_job` or `run_publish_verification_worker`, update `main()` and `test_publish_verification.py`, and keep the existing `wakeup-runner --run-one-publish-ratchet` rejection test as the legacy guard.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:1029:quality:comment

review_round: 1
head_sha: ae0c410
REVIEW_DONE:1029:quality:comment
⟦AI:AUTO-LOOP⟧

ElonSG commented Jun 20, 2026
Contributor Author

🤖 Architecture re-check passed: the publish-verification worker boundary adds no new lifecycle authority

TL;DR

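  • What this is: An independent re-check of PR #1029, "Split the publish-verification worker boundary", from the architect's perspective.
  • Current state or conclusion: The verdict is approve. No architectural degradation against CLAUDE.md / AGENTS.md was found.
  • What the maintainer should do OR what the controller does next: The controller can record this architect verdict in this round's review truth table.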

Architecture evidence

publish-verification-worker is the helper-private subprocess entrypoint for a single publish-verification job. wakeup-runner is the #396 runner that consumes the closed action projection of the wakeup-plan. The VERIFIED receipt is the result file that a later finalizer revalidates once publish verification completes. The key judgment: the PR moves the slow BUILD_CMD / TEST_CMD out of the runner tick without handing lifecycle authority such as push, PR, label, or release to the new worker.

  • skills/consensus-loop/scripts/codex_refactor_loop/cli.py:90 registers publish-verification-worker with authority limited to read-state/read-git/write-state/write-log and no PR, issue, label, tag, or release authority. This matches the narrow allowlist / no lifecycle authority default in CLAUDE.md:48.
  • skills/consensus-loop/scripts/codex_refactor_loop/publish_verification.py:216 still runs only one existing job; publish_verification.py:249 only executes the host-owned BUILD_CMD / TEST_CMD and writes receipt evidence/logs. It does not open a PR directly or advance lifecycle state.
  • skills/consensus-loop/SKILL.md:385 and skills/consensus-loop/authorizations/runtime-exceptions.md:233 state that publish_verification.py is the sole helper-private job/receipt owner and spell out no GitHub/git lifecycle, no issue/PR/label/tag/release authority, and no host production SSOT authority.
  • skills/consensus-loop/scripts/codex_refactor_loop/wakeup_runner.py:2856 removes the old hidden --run-one-publish-ratchet mode; skills/consensus-loop/scripts/test_wakeup_runner.py:6608 locks in that the old mode must be rejected. This follows deletion-first, with no compatibility shell left behind.
  • skills/consensus-loop/scripts/test_cli_command_router.py:188 locks in the worker command's helper-private, no-lifecycle authority; skills/consensus-loop/scripts/test_publish_verification.py:106 covers the new worker CLI entrypoint. The architectural boundary is backed by behavior tests and source-regression anchors.

Reasoning discipline note: the reference frame is capability-based least privilege plus single-writer durable receipt. The verified facts come from the actual diff, PR/issue metadata, CLAUDE.md/AGENTS.md, and changed file contents; no ASSUMED-UNVERIFIED premise was used. Candidate "keep it hidden in a wakeup-runner sub-mode": Ugly defect: hidden intent and bad coupling to scheduler tick; Beautiful form: separate slow verification with explicit worker boundary. Candidate "make it a generic executor/public lifecycle command": Ugly defect: leaked abstraction and reusable authority surface; Beautiful form: named helper-private command with closed authority tokens. The chosen approach still has a CLI-shaped entrypoint, which could be mistaken for public authority, but the docs, authority tokens, removal of the old mode, and tests narrow the boundary to helper-private.
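
To make the "closed authority tokens" idea concrete, here is a minimal sketch of a command spec and a lifecycle-token check. The four authority tokens are the ones quoted in this review; `CommandSpec`, `lifecycle_tokens_in`, and the lifecycle token names in `LIFECYCLE_TOKENS` are hypothetical and do not describe the real registry in cli.py.

```python
from dataclasses import dataclass

# Assumed token names for lifecycle actions. The thread only says the worker
# has no PR, issue, label, tag, or release authority.
LIFECYCLE_TOKENS = frozenset(
    {"write-pr", "write-issue", "write-label", "write-tag", "write-release"}
)


@dataclass(frozen=True)
class CommandSpec:
    description: str
    authority: tuple[str, ...]


WORKER = CommandSpec(
    description="helper-private publish verification worker",
    authority=("read-state", "read-git", "write-state", "write-log"),
)


def lifecycle_tokens_in(spec: CommandSpec) -> set[str]:
    """Return any lifecycle tokens the command declares (empty means none)."""
    return set(spec.authority) & LIFECYCLE_TOKENS
```

A source-regression test in this style would assert that `lifecycle_tokens_in(WORKER)` is empty, so adding a lifecycle token to the worker later would fail the check.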

📎 Full raw codex output (archival)
---
pr: 1029
role: architect
head_sha: ae0c410c19474326068af1dcd7cf0fc8c2f90045
verdict: approve
---

## Verdict
approve - no architecture compliance concerns; the PR narrows publish verification execution out of the `wakeup-runner` tick without adding lifecycle authority.

## Evidence
- No reject findings.
- Scope matches issue #1028 and PR #1029: the diff only changes `skills/consensus-loop` publish-verification worker wiring, its #396 documentation mirror, and matching tests; no SCOPE_EXTEND was needed.
- `CLAUDE.md:48` requires controller-runtime exceptions to stay narrow with no lifecycle authority by default; `skills/consensus-loop/scripts/codex_refactor_loop/cli.py:90` registers `publish-verification-worker` with only `("read-state", "read-git", "write-state", "write-log")`, not PR/issue/label/tag/release tokens.
- `CLAUDE.md:78` says daemons/helpers are narrow allowlist executors and implement/fix workers must not open PRs, merge, or close issues; `skills/consensus-loop/scripts/codex_refactor_loop/publish_verification.py:216` runs one existing publish-verification job, and `skills/consensus-loop/scripts/codex_refactor_loop/publish_verification.py:249` only runs the fixed `BUILD_CMD`/`TEST_CMD` sequence before writing receipt state.
- `CLAUDE.md:219` says `.refactor-loop/` must not become host production configuration or ledger SSOT; `skills/consensus-loop/SKILL.md:385` and `skills/consensus-loop/authorizations/runtime-exceptions.md:233` explicitly keep `publish_verification.py` as the helper-private receipt owner, forbid host production SSOT authority, and keep final push/PR effects behind revalidated `VERIFIED` receipt finalization.
- `CLAUDE.md:54` and `CLAUDE.md:136` require deletion-first instead of compat shells; `skills/consensus-loop/scripts/codex_refactor_loop/wakeup_runner.py:2856` removes the hidden `--run-one-publish-ratchet` mode, and `skills/consensus-loop/scripts/test_wakeup_runner.py:6608` locks rejection of the legacy mode.
- `CLAUDE.md:88`, `CLAUDE.md:138`, and `CLAUDE.md:140` require behavior/source-regression coverage for runtime boundaries; `skills/consensus-loop/scripts/test_cli_command_router.py:188` locks no-lifecycle command authority, `skills/consensus-loop/scripts/test_publish_verification.py:106` covers the new worker CLI entrypoint, and the SKILL/runtime-exception anchor tests were updated to the new helper-private child contract.
- Old/New pattern comment policy: prompt normalized to `none`; the diff adds no new `Old pattern` / `New principle` source comments.
- Reference-frame note: mature frame is capability-based least privilege plus single-writer durable receipt design. Known-good shape is a small helper-private worker that reads immutable job state, runs the host-owned gate, writes one receipt, and leaves lifecycle mutations to the existing finalizer. Candidate "keep hidden wakeup-runner ratchet": Ugly defect: hidden intent and bad coupling to the scheduler tick; Beautiful form: separate slow verification from the tick with an explicit worker boundary. Candidate "generic executor/public lifecycle command": Ugly defect: leaked abstraction and reusable authority surface; Beautiful form: no command bus, only a named helper-private command with closed authority tokens. Chosen candidate: Ugly defect: it still exposes a CLI-shaped entrypoint, so the boundary could be mistaken for public authority; Beautiful form: documented helper-private status, no lifecycle tokens, legacy mode deletion, and behavior/source-regression tests. Verified premises: actual diff, PR/issue metadata, `CLAUDE.md`/`AGENTS.md`, and changed file contents were inspected; no `ASSUMED-UNVERIFIED` premise is needed for this verdict.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:1029:architect:approve

review_round: r1
head_sha: ae0c410
REVIEW_DONE:1029:architect:approve
⟦AI:AUTO-LOOP⟧

ElonSG commented Jun 20, 2026
Contributor Author

🤖 Test coverage review: reject

TL;DR

  • What this is: An independent review of PR 1029 in the tests role.
  • Current state or conclusion: The verdict is reject, because the new publish-verification-worker command lacks router-level behavior coverage.
  • What the maintainer should do OR what the controller does next: Please add a minimal router dispatch test proving that the command forwards job_dir to the correct handler.

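Evidence

From the test-quality angle I looked at three things only: the new production entrypoint, its corresponding tests, and whether any skip, fixed wait, or loosened assertion was introduced. I found no new skip, manual marker, or fixed-wait pacing, and the newly added tests are not pure line-count tests.

The blocker is that the new command boundary is never actually exercised: skills/consensus-loop/scripts/codex_refactor_loop/cli.py:90 adds publish-verification-worker, and the parent flow at skills/consensus-loop/scripts/codex_refactor_loop/publish_verification.py:452 actually launches consensus-rnd-cli publish-verification-worker <job_dir>. Currently skills/consensus-loop/scripts/test_cli_command_router.py:188 only checks that the command exists along with its description and authority tokens; it does not prove that RuntimeCommandRouter.run("publish-verification-worker", [job_dir]) reaches the publish verification worker handler.

skills/consensus-loop/scripts/test_publish_verification.py:136 already covers that the module entrypoint publish_verification.main([job_dir]) passes the exact job_dir to run_one_publish_ratchet, which is good, but it does not cover the CLI router layer. The risk is that if the handler is bound incorrectly, argv is not forwarded, or the command name is registered to the wrong handler, the existing tests may still pass while the production subprocess path is broken.

Suggested minimal test: in test_cli_command_router.py, patch the publish-verification-worker command handler with a mock, run RuntimeCommandRouter(...).run("publish-verification-worker", [str(job_dir)]), and assert the return code and handler.assert_called_once_with([str(job_dir)]). This matches the style of the existing router forwarding tests for peek / wakeup-plan / labels in that file.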
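
The suggested test could look something like the sketch below. The `Command` dataclass, the toy `RuntimeCommandRouter`, and `worker_main` are simplified stand-ins written for this example; the real router's constructor and registry shape are not shown in this thread and may differ.

```python
from dataclasses import dataclass
from typing import Callable
from unittest import mock


@dataclass
class Command:
    handler: Callable[[list[str]], int]


def worker_main(argv: list[str]) -> int:
    """Placeholder for the real worker handler."""
    return 0


COMMANDS = {"publish-verification-worker": Command(handler=worker_main)}


class RuntimeCommandRouter:
    """Toy router: look up a command by name and forward argv to its handler."""

    def run(self, name: str, argv: list[str]) -> int:
        return COMMANDS[name].handler(argv)


def check_worker_dispatch(job_dir: str) -> None:
    """Patch the registered handler, dispatch once, and check the forwarded argv."""
    handler = mock.Mock(return_value=0)
    entry = COMMANDS["publish-verification-worker"]
    with mock.patch.object(entry, "handler", handler):
        rc = RuntimeCommandRouter().run("publish-verification-worker", [job_dir])
    assert rc == 0
    handler.assert_called_once_with([job_dir])
```

Unlike a metadata-only assertion, this check fails if the router drops argv or never calls the registered handler, which is the gap the review points at.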

Reasoning discipline note: the reference frame is command-router adapter regression testing. The known-good shape is that a new command surface has both registration/authority assertions and one argv dispatch behavior test. Verified premise: the diff adds COMMANDS["publish-verification-worker"], changes the scheduler to spawn that CLI command, and adds direct module-main coverage, but has no router dispatch test for the command. The ugly defect of candidate approve is treating metadata assertions as executable dispatch coverage; the beautiful form is one small test at the adapter boundary. The ugly defect of candidate comment is that a broken handler binding would make the production child path fail; the beautiful form is to reject until the net-new command path is covered.


📎 Full raw codex output (archival)
---
pr: 1029
role: tests
head_sha: ae0c410c19474326068af1dcd7cf0fc8c2f90045
verdict: reject
---

## Verdict
Reject: the new helper-private CLI command has metadata and module-entrypoint coverage, but no test proves the command router dispatches `publish-verification-worker <job_dir>` to the publish verification worker.

## Evidence
- `skills/consensus-loop/scripts/codex_refactor_loop/cli.py:90` adds the net-new `publish-verification-worker` command that the parent process launches at `skills/consensus-loop/scripts/codex_refactor_loop/publish_verification.py:452`, but `skills/consensus-loop/scripts/test_cli_command_router.py:188` only asserts command metadata and authority tokens. It does not assert `RuntimeCommandRouter.run("publish-verification-worker", [job_dir])` calls the registered publish verification handler with the exact job directory, so a wrong handler binding would still pass while the production child launch path is broken.

## What would change your verdict (only if comment or reject)
Add a router-level behavior test in `skills/consensus-loop/scripts/test_cli_command_router.py` that patches the `publish-verification-worker` command handler with a mock, runs `RuntimeCommandRouter(...).run("publish-verification-worker", [str(job_dir)])`, and asserts the mock is called exactly once with `[str(job_dir)]`. The existing `test_worker_cli_entrypoint_runs_exact_job_dir` can remain as the module-entrypoint test.

## Reasoning discipline note
Reference frame: command-router adapter regression testing. Known-good shape: a new command surface needs registration/authority assertions plus one behavior test proving argv is routed to the intended handler. Verified premise: the diff adds `COMMANDS["publish-verification-worker"]`, changes the scheduler to spawn that CLI command, and adds direct module-main coverage, but no router dispatch test for this command exists. Candidate approve is ugly because it treats metadata assertions as executable dispatch coverage; beautiful form is one small router test at the adapter boundary. Candidate comment is ugly because a broken handler binding would make the production child path fail; beautiful form is reject until the missing net-new command path is covered.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:1029:tests:reject

review_round: 1
head_sha: ae0c410
REVIEW_DONE:1029:tests:reject
⟦AI:AUTO-LOOP⟧

ElonSG commented Jun 20, 2026
Contributor Author

🤖 Fix for the round 1 blocking review on PR 1029

TL;DR

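  • What this is: The fix-codex report on the fix for the round 1 reject evidence on PR 1029.
  • Current state or conclusion: The only blocker came from the tests reviewer; a behavior test has been added showing that the publish-verification-worker command routes to the registered handler.
  • What the maintainer should do OR what the controller does next: The controller pushes and re-dispatches the next review gate round; the quality reviewer's naming suggestion remains an advisory comment.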

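Fix details

The reference frame for this round is command-router adapter regression testing: a new command surface needs not only registration and authority-token assertions but also one behavior test proving that argv reaches the intended handler through RuntimeCommandRouter.run.

The blocking evidence was that skills/consensus-loop/scripts/codex_refactor_loop/cli.py:90 adds publish-verification-worker while the original tests covered only metadata; if the handler binding were wrong, the production child launch path would break while the tests could still pass. The fix adds a test at skills/consensus-loop/scripts/test_cli_command_router.py:207 that temporarily replaces COMMANDS["publish-verification-worker"].handler with a mock, calls RuntimeCommandRouter.run("publish-verification-worker", [str(job_dir)]), and asserts that the mock received exactly [str(job_dir)] once.

Verification results:

| Item | Result | Notes |
| --- | --- | --- |
| static degradation | pass | check-degradation --static printed skill-degradation: ok |
| touched test file | pass | test_cli_command_router.py ran 40 tests OK |
| script unittest discovery | pass | discover -s skills/consensus-loop/scripts -p "test_*.py" ran 2562 tests OK, skipped=1 |

The quality reviewer noted that the internal name run_one_publish_ratchet is still stale. That verdict was comment, so under this role's rules it is recorded as context only and was not treated as a mandatory fix demand that would widen this round's changes.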

📎 Full raw codex output (archival)
# Fix report for PR 1029 round 1

## Applied
- (A) `skills/consensus-loop/scripts/test_cli_command_router.py:207`: added a router-level behavior test that patches the registered `publish-verification-worker` handler, runs `RuntimeCommandRouter.run("publish-verification-worker", [str(job_dir)])`, and asserts the handler receives exactly `[str(job_dir)]` once. This addresses reviewer:tests's evidence #1.

## Rejected as false positive
- None.

## Reviewer comments read
- reviewer:architect approved; no blocking demands. Their evidence said the worker split keeps publish verification outside the `wakeup-runner` tick without adding lifecycle authority.
- reviewer:quality commented that `run_one_publish_ratchet` remains stale internal vocabulary and suggested renaming it. This was not treated as mandatory because the verdict was `comment`, not `reject`.

## Blocked (cannot fix this round)
- None.

## Build status
- build: pass (`PATH="/opt/homebrew/bin:$PATH" /opt/homebrew/bin/python3 skills/consensus-loop/scripts/consensus-rnd-cli check-degradation --static` -> `skill-degradation: ok`)
- tests: pass (`PATH="/opt/homebrew/bin:$PATH" /opt/homebrew/bin/python3 -m unittest skills/consensus-loop/scripts/test_cli_command_router.py` -> 40 tests OK; `PATH="/opt/homebrew/bin:$PATH" /opt/homebrew/bin/python3 -m unittest discover -s skills/consensus-loop/scripts -p "test_*.py"` -> 2562 tests OK, skipped=1)
- review-thread-completion: not applicable; `.refactor-loop/state/review-thread-completion/pr1029.json` was absent.
- reference-frame note: command-router adapter regression testing; a new command surface needs registration/authority checks plus a behavior test proving argv reaches the intended handler.
- cluster source: prompt audit path was empty and implement summary path was empty; derived cluster from PR branch `refactor/iter1028-issue-1028`, `.refactor-loop/runs/phase9-issue1028-r1-judge.md`, and `.refactor-loop/runs/implement-issue-1028.md`.
- refactor self-doc: not applicable (HOST_REFACTOR_COMMENT_POLICY=none).

## Recommendation for next round
- expect unanimous. The only reject demand was concrete, in-scope, applied, and verified; the remaining quality note is advisory.

⟦AI:AUTO-LOOP⟧
FIX_DONE:1029:round-1:applied-1:rejected-0:blocked-0

⟦AI:AUTO-LOOP⟧

ElonSG commented Jun 20, 2026
Contributor Author

🤖 Test review verdict: approve

TL;DR

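  • What this is: An independent review of PR 1029 in the tests role, focused on test coverage and test quality.
  • Current state or conclusion: The publish verification worker boundary split has matching behavior tests and source-regression tests for the authority boundary; no test gap requiring a fix was found.
  • What the maintainer should do OR what the controller does next: The controller can treat this role as an approve signal and continue through the review gate.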

Test evidence

This round focused on whether, with wakeup-runner --run-one-publish-ratchet removed, the new publish-verification-worker has enough behavior coverage rather than just added documentation or string assertions.

  • skills/consensus-loop/scripts/test_publish_verification.py:55 test_parent_writes_immutable_request_pins_private_ref_and_starts_hidden_child verifies that the parent still only writes the immutable request and pins the private ref, and that the hidden child now launches publish-verification-worker <job_dir>. This covers the command change in the production _start_hidden_child().
  • skills/consensus-loop/scripts/test_publish_verification.py:136 test_worker_cli_entrypoint_runs_exact_job_dir verifies that the new publish_verification.main() hands the exact job dir to run_one_publish_ratchet. This covers the core behavior of the new worker entrypoint.
  • skills/consensus-loop/scripts/test_cli_command_router.py:188 test_publish_verification_worker_is_helper_private_command_with_no_lifecycle_authority verifies that the new command exists, is described as helper-private, has only read-state/read-git/write-state/write-log authority, and contains no lifecycle token. This matches the PR's boundary goal.
  • skills/consensus-loop/scripts/test_cli_command_router.py:207 test_publish_verification_worker_command_forwards_argv_to_registered_handler goes through real router dispatch and confirms that the job dir argument reaches the registered handler. This is not a bare mock call count; it covers command router behavior.
  • skills/consensus-loop/scripts/test_wakeup_runner.py:6609 test_wakeup_runner_rejects_publish_ratchet_mode verifies that the legacy --run-one-publish-ratchet argument is rejected by argparse and never enters a runner tick. This covers the regression boundary for removing the old entrypoint.
  • skills/consensus-loop/scripts/test_skill_reference_anchors.py:1928 and skills/consensus-loop/scripts/test_runtime_exception_authorization_sources.py:626 lock in the no-lifecycle/no-public-ratchet docs and authorization mirror. These are boundary-consistency tests and do not replace the behavior tests above.
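
For readers unfamiliar with how argparse rejects a removed flag, here is a small self-contained sketch. The parser and its `--once` flag are hypothetical and are not the real wakeup-runner parser; the point is only that an argument the parser no longer defines makes `parse_args` exit before any runner code would run.

```python
import argparse
import contextlib
import io
from typing import Optional


def build_runner_parser() -> argparse.ArgumentParser:
    """Hypothetical runner parser that no longer defines the legacy flag."""
    parser = argparse.ArgumentParser(prog="wakeup-runner")
    parser.add_argument("--once", action="store_true")  # illustrative flag only
    return parser


def parse_exit_code(argv: list[str]) -> Optional[int]:
    """Return argparse's exit code if parsing fails, else None."""
    with contextlib.redirect_stderr(io.StringIO()):  # silence the usage message
        try:
            build_runner_parser().parse_args(argv)
        except SystemExit as exc:
            return exc.code
    return None
```

Calling `parse_exit_code(["--run-one-publish-ratchet"])` returns argparse's error status 2, while `parse_exit_code(["--once"])` returns `None` because parsing succeeds.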

I also checked that the diff adds no fixed-duration wait pacing or skip/disable/manual-test marker, and does not weaken any existing assertion into a smoke assertion.

The verification commands passed on the PR head worktree d12c2e2bd7d1ed6d48391b4d8732dc578fe1002c:

python3 -m unittest skills/consensus-loop/scripts/test_cli_command_router.py
# Ran 40 tests OK
python3 -m unittest skills/consensus-loop/scripts/test_publish_verification.py
# Ran 17 tests OK
python3 -m unittest skills/consensus-loop/scripts/test_wakeup_runner.py
# Ran 210 tests OK
python3 -m unittest skills/consensus-loop/scripts/test_runtime_exception_authorization_sources.py skills/consensus-loop/scripts/test_skill_reference_anchors.py
# Ran 125 tests OK

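Reasoning discipline note: the reference frame is behavior-first boundary regression testing for the command router and the helper-private worker surface. The known-good shape is one behavior test per new executable path, with authority/fact-source constraints locked by source-regression tests; this PR conforms. Ugly defect: the reject path would misread the presence of source-regression tests as a lack of behavior tests, ignoring that parent spawn, router dispatch, worker entrypoint, and legacy flag rejection all have observable assertions; Beautiful form: reject only when a real new execution path has no behavior assertion. Ugly defect: the comment path would manufacture a nice-to-have without locating a gap; Beautiful form: advisory feedback is tied only to a specific uncovered branch or weakened assertion. Ugly defect: the approve path without running the tests would rest on an unverifiable premise; Beautiful form: approve after checking the exact head files, diff, and unittest results item by item. This verdict relies on no ASSUMED-UNVERIFIED premise.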

📎 Full raw codex output (archival)
---
pr: 1029
role: tests
head_sha: d12c2e2bd7d1ed6d48391b4d8732dc578fe1002c
verdict: approve
---

## Verdict
Approve: test coverage and test quality are adequate for the publish verification worker boundary split.

## Evidence
- [skills/consensus-loop/scripts/test_publish_verification.py](/Users/auric/consensus-rnd/.worktrees/iter1028-issue-1028/skills/consensus-loop/scripts/test_publish_verification.py:55) `test_parent_writes_immutable_request_pins_private_ref_and_starts_hidden_child` asserts the parent still writes the immutable request, pins the private ref, does not run host commands inline, and now spawns `publish-verification-worker <job_dir>`.
- [skills/consensus-loop/scripts/test_publish_verification.py](/Users/auric/consensus-rnd/.worktrees/iter1028-issue-1028/skills/consensus-loop/scripts/test_publish_verification.py:136) `test_worker_cli_entrypoint_runs_exact_job_dir` covers the new public Python entrypoint by asserting it forwards the exact job directory to `run_one_publish_ratchet`.
- [skills/consensus-loop/scripts/test_cli_command_router.py](/Users/auric/consensus-rnd/.worktrees/iter1028-issue-1028/skills/consensus-loop/scripts/test_cli_command_router.py:188) `test_publish_verification_worker_is_helper_private_command_with_no_lifecycle_authority` covers the new router registration and authority boundary, including absence of lifecycle tokens and absence of legacy publish-ratchet command names.
- [skills/consensus-loop/scripts/test_cli_command_router.py](/Users/auric/consensus-rnd/.worktrees/iter1028-issue-1028/skills/consensus-loop/scripts/test_cli_command_router.py:207) `test_publish_verification_worker_command_forwards_argv_to_registered_handler` exercises real router dispatch to the registered command handler with the job directory argument.
- [skills/consensus-loop/scripts/test_wakeup_runner.py](/Users/auric/consensus-rnd/.worktrees/iter1028-issue-1028/skills/consensus-loop/scripts/test_wakeup_runner.py:6609) `test_wakeup_runner_rejects_publish_ratchet_mode` covers the removed `wakeup-runner --run-one-publish-ratchet` path by asserting argparse rejects the legacy flag before entering a runner tick.
- [skills/consensus-loop/scripts/test_skill_reference_anchors.py](/Users/auric/consensus-rnd/.worktrees/iter1028-issue-1028/skills/consensus-loop/scripts/test_skill_reference_anchors.py:1928) and [skills/consensus-loop/scripts/test_runtime_exception_authorization_sources.py](/Users/auric/consensus-rnd/.worktrees/iter1028-issue-1028/skills/consensus-loop/scripts/test_runtime_exception_authorization_sources.py:626) provide source-regression coverage for the newly documented no-lifecycle/no-public-ratchet boundary; these supplement, rather than replace, the behavior tests above.
- No new fixed-duration wait pacing, host-framework skip/disable/manual-test marker, or weakened assertion was introduced in the PR diff. `IMPLEMENT_SUMMARY_PATH` and `HOST_SCHEMA_POLICY` were empty, so I inferred policy from `CLAUDE.md`/`AGENTS.md` and the diff.
- Verification on the exact PR head worktree `d12c2e2bd7d1ed6d48391b4d8732dc578fe1002c`: `python3 -m unittest skills/consensus-loop/scripts/test_cli_command_router.py` ran 40 tests OK; `python3 -m unittest skills/consensus-loop/scripts/test_publish_verification.py` ran 17 tests OK; `python3 -m unittest skills/consensus-loop/scripts/test_wakeup_runner.py` ran 210 tests OK; `python3 -m unittest skills/consensus-loop/scripts/test_runtime_exception_authorization_sources.py skills/consensus-loop/scripts/test_skill_reference_anchors.py` ran 125 tests OK.
- Reasoning-discipline note: reference frame is behavior-first boundary regression testing for command routers and helper-private worker surfaces. Known-good shape is one behavior test for each observable new path plus source-regression anchors only for authorization/source-of-truth constraints; this PR aligns. Candidate reject was ugly because it would treat source-regression additions as insufficient despite verified behavior tests at the parent spawn, router dispatch, worker entrypoint, and legacy-flag rejection surfaces; beautiful form is to reject only if a net-new executable path lacks a behavior assertion. Candidate comment was ugly because it would invent nice-to-have coverage without a locatable gap; beautiful form is to keep advisory feedback tied to an uncovered branch or weakened assertion. Candidate approve is acceptable because all factual premises above are verified against the PR head files, diff, and test commands; no `ASSUMED-UNVERIFIED` premise is used for the verdict.

## What would change your verdict (only if comment or reject)
N/A

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:1029:tests:approve

review_round: 2
head_sha: d12c2e2
REVIEW_DONE:1029:tests:approve
⟦AI:AUTO-LOOP⟧

ElonSG commented Jun 20, 2026
Contributor Author

🤖 Architecture compliance review passed

TL;DR

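  • What this is: An independent review of PR 1029 from the architect's perspective.
  • Current state or conclusion: The verdict is approve. No new or worsened CLAUDE/AGENTS architecture violation was found.
  • What the maintainer should do OR what the controller does next: The controller can count this role's result toward this round's review gate.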

Architecture compliance evidence

I checked this change against CLAUDE.md / AGENTS.md, the three-dot diff, the PR description, and issue #1028. The core conclusion: the new publish-verification-worker is a helper-private local verification worker that moves the slow BUILD_CMD / TEST_CMD out of the wakeup-runner tick. It declares no GitHub/git lifecycle authority and introduces no second publish-verification fact source.

Key evidence:

  • skills/consensus-loop/scripts/codex_refactor_loop/cli.py:90-93 registers publish-verification-worker, describes it as helper-private, and grants only read-state, read-git, write-state, and write-log authority. This meets the narrow-allowlist requirement for controller-runtime exceptions in CLAUDE.md:48.
  • The parent tick at skills/consensus-loop/scripts/codex_refactor_loop/publish_verification.py:447-459 only launches consensus-rnd-cli publish-verification-worker <job_dir> and no longer reuses wakeup-runner --run-one-publish-ratchet. This splits the slow verification worker out of the lifecycle runner CLI.
  • The new entrypoint at skills/consensus-loop/scripts/codex_refactor_loop/publish_verification.py:625-629 only hands a single job_dir to the existing run_one_publish_ratchet; it adds no generic executor or arbitrary-command entrypoint.
  • skills/consensus-loop/SKILL.md:385 and skills/consensus-loop/authorizations/runtime-exceptions.md:233 state that publish_verification.py remains the sole helper-private job/receipt owner; the worker only runs host-owned BUILD_CMD/TEST_CMD and writes verification evidence/logs; and public lifecycle CLI, generic command executor, GitHub/git lifecycle, issue/PR/label/tag/release authority, host production SSOT authority, and a second verification fact source are all forbidden.
  • skills/consensus-loop/scripts/test_cli_command_router.py:188-205, skills/consensus-loop/scripts/test_publish_verification.py:93-96, skills/consensus-loop/scripts/test_publish_verification.py:136-144, and skills/consensus-loop/scripts/test_wakeup_runner.py:6608-6616 cover router authority, the parent's child command, the worker entrypoint, and the rejection path for the old hidden runner flag.

Reasoning discipline record: the reference frame is narrow authority surfaces / single-responsibility helper boundary. The known-good shape is one owner-local helper-private command that reads the durable job request, runs only the verification commands, writes a receipt, and leaves lifecycle finalization with the existing owner. Keeping wakeup-runner --run-one-publish-ratchet: Ugly defect: the slow verification worker leaks into the lifecycle runner CLI, making the runner both dispatcher and verification worker. Beautiful form: delete the hidden mode and replace it with a named helper-private command. Adding publish-verification-worker: Ugly defect: it is still an invocable local command, and without a contract and tests it could easily become a public lifecycle CLI. Beautiful form: declare non-lifecycle authority explicitly, document the fact source / forbidden surface, and lock it in with router and behavior tests. All facts were verified against the diff and file contents; no ASSUMED-UNVERIFIED premise was relied on.
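
The known-good shape described above (run only the fixed verification commands, write one receipt, do nothing else) might be sketched as follows. Everything here is an assumption made for illustration: `run_verification_job`, the log file names, the `FAILED` status, and the receipt fields are hypothetical; only the `result.json` file name, the `VERIFIED` status, and the build-then-test order come from this thread.

```python
import json
import subprocess
from pathlib import Path


def run_verification_job(job_dir: Path, build_cmd: list[str], test_cmd: list[str]) -> str:
    """Run the fixed build/test sequence and write one receipt file.

    No push, PR, label, tag, or release action happens here; a separate
    finalizer would revalidate the receipt before any lifecycle step.
    """
    status = "VERIFIED"
    for name, cmd in (("build", build_cmd), ("test", test_cmd)):
        with open(job_dir / f"{name}.log", "wb") as log:
            rc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT).returncode
        if rc != 0:
            status = "FAILED"  # assumed status name
            break  # skip later steps once one fails
    (job_dir / "result.json").write_text(json.dumps({"status": status}))
    return status
```

The function takes the commands as arguments rather than accepting arbitrary input at runtime, which mirrors the "no generic command executor" constraint in spirit.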


📎 Full raw codex output (archival)
---
pr: 1029
role: architect
head_sha: d12c2e2bd7d1ed6d48391b4d8732dc578fe1002c
verdict: approve
---

## Verdict
approve - no architectural compliance concern; the PR narrows publish verification into a helper-private worker without adding GitHub/git lifecycle authority or a second verification fact source.

## Evidence
- `CLAUDE.md:48` requires controller-runtime exceptions to stay narrow, owner-gated, and not become a generic lifecycle actor; the PR's new command at `skills/consensus-loop/scripts/codex_refactor_loop/cli.py:90` is named `publish-verification-worker`, described as helper-private, and declares only `read-state`, `read-git`, `write-state`, and `write-log` authority at `skills/consensus-loop/scripts/codex_refactor_loop/cli.py:92-93`.
- `CLAUDE.md:78` says daemon/controller helper authority remains an explicit allowlist and implement/fix workers must not commit, push, open PR, merge, or close issue/PR. The new child launch at `skills/consensus-loop/scripts/codex_refactor_loop/publish_verification.py:447-459` starts only `consensus-rnd-cli publish-verification-worker <job_dir>`, while the worker entrypoint at `skills/consensus-loop/scripts/codex_refactor_loop/publish_verification.py:625-629` forwards to the existing publish-verification job runner rather than adding a generic executor.
- `CLAUDE.md:89` requires any new runtime surface to state allowed actions, forbidden actions, fact source, and verification. The updated wakeup-runner contract at `skills/consensus-loop/SKILL.md:385` names `publish_verification.py` as the unique helper-private job/receipt owner, limits the child to host-owned `BUILD_CMD`/`TEST_CMD`, says it writes verification evidence/logs, and forbids public lifecycle CLI, generic command executor, GitHub/git lifecycle, issue/PR/label/tag/release authority, host production SSOT authority, and a second verification fact source.
- `CLAUDE.md:138` requires controller-runtime exceptions to have mechanical verification. The diff adds router and behavior/source checks: `skills/consensus-loop/scripts/test_cli_command_router.py:188-205` asserts helper-private command authority and removal of the legacy hidden wakeup-runner flag; `skills/consensus-loop/scripts/test_publish_verification.py:93-96` asserts the parent launches `publish-verification-worker`; `skills/consensus-loop/scripts/test_publish_verification.py:136-144` asserts the worker CLI forwards exactly one job dir; `skills/consensus-loop/scripts/test_wakeup_runner.py:6608-6616` asserts legacy `--run-one-publish-ratchet` is rejected.
- Scope evidence: issue `#1028` requests decoupling slow publish verification from the wakeup-runner tick by running it out-of-band with durable result artifacts. The PR changes only `skills/consensus-loop` docs, command routing, publish verification, wakeup-runner legacy mode, and tests; no external repo references, host config, branch topology, `.refactor-loop/host.env` production facts, or compatibility shim were introduced.
- Reasoning-discipline note: reference frame is narrow authority surfaces / single-responsibility helper boundary. Known-good shape is one owner-local helper-private command that reads the durable job request, runs only the verification commands, writes a receipt, and leaves lifecycle finalization in the existing owner. Candidate A, keep hidden `wakeup-runner --run-one-publish-ratchet`: Ugly defect: leaked a slow verification worker into the lifecycle runner CLI, making the runner both dispatcher and verification worker. Beautiful form: remove the hidden mode and put verification execution behind a named helper-private command. Candidate B, add `publish-verification-worker`: Ugly defect: it is still an invocable local command, so it needs an explicit authority contract and tests to avoid becoming a public lifecycle CLI. Beautiful form: declare narrow non-lifecycle authority, document fact source/forbidden surface, and lock it with router plus behavior tests. Verified premises: diff evidence confirms the new command has no declared lifecycle tokens, the child runs `BUILD_CMD`/`TEST_CMD` through `run_fixed_host_command`, writes `result.json`/logs, and wakeup-runner no longer accepts the legacy publish-ratchet flag; no ASSUMED-UNVERIFIED premise is needed for the verdict.

## What would change your verdict (only if comment or reject)
N/A.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:1029:architect:approve

review_round: r2
head_sha: d12c2e2
REVIEW_DONE:1029:architect:approve
⟦AI:AUTO-LOOP⟧

@ElonSG

ElonSG commented Jun 20, 2026

Contributor Author

🤖 Quality review passed: the publish verification worker boundary split is mergeable

TL;DR

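  • What this is: an independent quality review of PR 1029, focused on naming, readability, simplicity, dead code, and unrelated changes.
  • Current state or conclusion: the verdict is approve; the new entrypoint is narrow enough, the legacy hidden wakeup-runner mode has been removed, and no quality blockers were found.
  • What the maintainer should do OR what the controller does next: the controller can continue collecting reviews from the other roles and decide on merge or fix according to the review gate rules.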

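Review evidence

I checked PR head d12c2e2bd7d1ed6d48391b4d8732dc578fe1002c using the three-dot diff origin/auto-refact-dev...origin/refactor/iter1028-issue-1028 and opened the PR-head contents of the touched files. publish-verification-worker is the dedicated command entrypoint for the publish verification worker: the parent tick hands it a single job_dir, and the actual verification is still performed by the existing job/receipt logic in publish_verification.py.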

  • skills/consensus-loop/scripts/codex_refactor_loop/cli.py:24 imports publish_verification.main, and skills/consensus-loop/scripts/codex_refactor_loop/cli.py:90 registers publish-verification-worker; the description explicitly says helper-private, and the authority is only read-state/read-git/write-state/write-log, with no lifecycle token.
  • skills/consensus-loop/scripts/codex_refactor_loop/publish_verification.py:447 now launches publish-verification-worker; the CLI adapter at skills/consensus-loop/scripts/codex_refactor_loop/publish_verification.py:625 only parses one job_dir and forwards it to run_one_publish_ratchet, adding no new abstraction layer.
  • skills/consensus-loop/scripts/codex_refactor_loop/wakeup_runner.py:2856 keeps only the normal runner arguments; the legacy --run-one-publish-ratchet entrypoint and its import have been removed, leaving no dead compatibility path.
  • skills/consensus-loop/scripts/test_cli_command_router.py:188 locks in the helper-private command and its lack of lifecycle authority, skills/consensus-loop/scripts/test_publish_verification.py:136 locks in the new entrypoint's forwarding, and skills/consensus-loop/scripts/test_wakeup_runner.py:6608 locks in rejection of the legacy mode.
  • skills/consensus-loop/SKILL.md:385 and skills/consensus-loop/authorizations/runtime-exceptions.md:1119 consistently use the same command name and keep the no public lifecycle CLI / no second verification fact source boundary; the doc changes match the PR goal.
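
The small adapter described in the second bullet above (parse one `job_dir`, forward it, add no new abstraction) can be sketched as follows. This is an illustration under assumptions rather than the PR's actual `main`: `worker_main` and the injectable `run_job` callable are hypothetical, with `run_job` standing in for the existing job runner.

```python
import argparse
from pathlib import Path
from typing import Callable, Optional, Sequence


def worker_main(
    argv: Optional[Sequence[str]] = None,
    run_job: Callable[[Path], int] = lambda job_dir: 0,  # hypothetical stand-in for the job runner
) -> int:
    """Parse exactly one job directory and forward it to the job runner."""
    parser = argparse.ArgumentParser(prog="publish-verification-worker")
    # A single positional argument: no flags, no command strings, no second job.
    parser.add_argument("job_dir", type=Path, help="directory holding one verification job")
    args = parser.parse_args(argv)
    return run_job(args.job_dir)
```

Because the parser defines one positional argument and nothing else, passing two directories or an extra flag makes `argparse` exit with status 2, which is what keeps the adapter from growing into a generic executor.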

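Reasoning-discipline note: the reference frame is separation of concerns plus authority minimization for controller helpers. The known-good shape is a single-purpose worker command, one router entry, explicit narrow authority, and removal of the hidden verification branch from the legacy lifecycle runner. Candidate A is keeping wakeup-runner --run-one-publish-ratchet; Ugly defect: a special execution mode hidden inside the lifecycle runner. Beautiful form: the current dedicated helper-private command. Candidate B is adding a generic worker executor; Ugly defect: it would become a broad command bus with unverifiable authority. Beautiful form: one named command that accepts exactly one job directory. Candidate C is this PR; Ugly defect: the router list gains one more command name, but the mitigation lies in the helper-private naming, the narrow authority, and test_cli_command_router.py:188. Beautiful form: the small adapter implemented here plus removal of the old flag. Verified premises: the PR head, the three-dot diff, and HOST_REFACTOR_COMMENT_POLICY=none in the PR body; no ASSUMED-UNVERIFIED premise supports the verdict.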
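
The mitigation named for Candidate C, a router entry whose authority is mechanically checked, could be tested along the lines of the sketch below. The four authority tokens are the ones quoted in the review; everything else is an assumption for illustration, including the `COMMANDS` layout, the `lifecycle_leaks` helper, and the lifecycle token names, which are not taken from `cli.py`.

```python
# Hypothetical lifecycle token names, chosen only to illustrate the check.
LIFECYCLE_TOKENS = frozenset({"commit", "push", "open-pr", "merge-pr", "close-issue"})

# Hypothetical registry layout; the authority tokens are those cited in the review.
COMMANDS = {
    "publish-verification-worker": {
        "helper_private": True,
        "authority": frozenset({"read-state", "read-git", "write-state", "write-log"}),
    },
}


def lifecycle_leaks(commands: dict) -> dict:
    """Map each command name to the lifecycle tokens it declares, omitting clean commands."""
    leaks = {}
    for name, spec in commands.items():
        overlap = spec["authority"] & LIFECYCLE_TOKENS
        if overlap:
            leaks[name] = sorted(overlap)
    return leaks
```

A router test in this style fails as soon as any registered command declares a lifecycle token, so the boundary is enforced by the test suite rather than by naming alone.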

📎 Full raw codex output (archival)
---
pr: 1029
role: quality
head_sha: d12c2e2bd7d1ed6d48391b4d8732dc578fe1002c
verdict: approve
---

## Verdict
Approve: the diff is narrowly scoped, readable, and does not introduce dead code, needless abstraction, confusing naming, or refactor self-doc violations under policy `none`.

## Evidence
- `skills/consensus-loop/scripts/codex_refactor_loop/cli.py:24` imports the new `publish_verification.main` entrypoint, and `skills/consensus-loop/scripts/codex_refactor_loop/cli.py:90` registers `publish-verification-worker` with explicit helper-private wording and no lifecycle authority tokens.
- `skills/consensus-loop/scripts/codex_refactor_loop/publish_verification.py:447` starts the child through the named `publish-verification-worker` command, while `skills/consensus-loop/scripts/codex_refactor_loop/publish_verification.py:625` keeps the command adapter to a small `argparse` wrapper that forwards exactly one `job_dir`.
- `skills/consensus-loop/scripts/codex_refactor_loop/wakeup_runner.py:2856` now exposes only the normal runner modes; the legacy `--run-one-publish-ratchet` path and import are absent, so the old hidden boundary is not retained as dead compatibility code.
- `skills/consensus-loop/scripts/test_cli_command_router.py:188` verifies the helper-private command authority and absence of lifecycle tokens, `skills/consensus-loop/scripts/test_publish_verification.py:136` verifies the new CLI entrypoint reaches `run_one_publish_ratchet`, and `skills/consensus-loop/scripts/test_wakeup_runner.py:6608` verifies the legacy wakeup-runner ratchet mode is rejected.
- `skills/consensus-loop/SKILL.md:385` and `skills/consensus-loop/authorizations/runtime-exceptions.md:1119` update the owner/source language to the same single command name and retain the no-public-lifecycle/no-second-fact-source boundary; the doc edits are focused on the PR goal.
- Reasoning-discipline note: Reference frame is separation of concerns plus command-query/authority minimization for controller helpers: the known-good shape is a single-purpose worker command with explicit narrow authority, reachable through one router, with the old generic runner escape removed. Candidate A, keep `wakeup-runner --run-one-publish-ratchet`: Ugly defect: hidden special-case flag inside a lifecycle runner. Beautiful form: the dedicated helper-private command used here. Candidate B, add a generic worker executor: Ugly defect: broad command bus and unverifiable authority. Beautiful form: one named command with one positional job directory. Candidate C, this PR's narrow CLI adapter: Ugly defect: it adds a public command name to the router list, but the locatable mitigation is helper-private naming plus read/write-state/log-only authority and tests at `test_cli_command_router.py:188`. Beautiful form: exactly the implemented small adapter with old flag removed. Verified premises: PR head is `d12c2e2bd7d1ed6d48391b4d8732dc578fe1002c`; PR body states `HOST_REFACTOR_COMMENT_POLICY=none`; the three-dot diff touches only the publish verification boundary, docs, and tests. No `ASSUMED-UNVERIFIED` premise is needed for this verdict.

## What would change your verdict (only if comment or reject)
Not applicable.

⟦AI:AUTO-LOOP⟧
REVIEW_DONE:1029:quality:approve

review_round: 2
head_sha: d12c2e2
REVIEW_DONE:1029:quality:approve
⟦AI:AUTO-LOOP⟧

@ElonSG ElonSG marked this pull request as ready for review June 20, 2026 11:15
@ElonSG ElonSG merged commit 7813d99 into auto-refact-dev Jun 20, 2026
4 checks passed
@ElonSG ElonSG deleted the refactor/iter1028-issue-1028 branch June 20, 2026 11:15
@ElonSG ElonSG added crnd:phase:merged merged and removed crnd:phase:reviewing review-gate reviewers in flight labels Jun 20, 2026
ElonSG added a commit that referenced this pull request Jun 20, 2026
* Implement issue #1028

* Implement issue #1028

---------

Co-authored-by: Loning <aloning@gmail.com>
