
feat(0.9.1): Wire 3 — embodiment-state → tool filter#255

Merged
dennys246 merged 2 commits into main from feat/0-9-1-wire-3-embodiment-tool-filter
May 17, 2026

@dennys246
Owner

Summary

Stage 1 of release_0_9_1.md (lifted from bio_emergent_persona_foundations.md § Wire 3). The cleanest "emergent trait" demonstration in the foundations plan: an agent with a damaged body component visibly stops attempting affordances routed through it. Physical history shapes behavior without prompt-injection scaffolding.

What ships

| Component | Where | LOC |
| --- | --- | --- |
| `Embodiment.get_disabled_affordances` (filtered from prompt) | `src/maxim/embodiment/body.py` | +51 |
| `Embodiment.get_degraded_affordances` (annotated in prompt) | same | +24 |
| `Embodiment.integrity_to_felt_phrase` (felt-sensation mapper) | same | +20 |
| `agent_loop` filter + annotate hook (with `WIRE_3_FILTER` JSONL emission) | `src/maxim/runtime/agent_loop.py` | +118 |
| Tests (41 tests, 9 layers) | `tests/unit/test_wire_3_embodiment_filter.py` | +725 (new file) |

Total: ~440 LOC source + 725 LOC test = 1,165. 41 unit tests + 163 module regression passing. Full fast suite (pre-fold): 6682 passed.

Band semantics

| Integrity | State | LLM-visible |
| --- | --- | --- |
| integrity < 0.3 | disabled (filtered from `available_tools`) | tool absent from prompt |
| 0.3 ≤ integrity < 0.45 | degraded, low | (feels weakened, prone to failing) |
| 0.45 ≤ integrity < 0.6 | degraded, high | (feels strained) |
| integrity ≥ 0.6 | healthy | unchanged |

The bands partition [0, 1] cleanly — no overlap, no gap. Pinned by TestBandPartitioning (8 parametric tests).
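
As a rough illustration of the band table, the sketch below classifies an integrity value and maps the degraded band to its felt phrase. The thresholds and phrases come from the table; the constant names and the functions `classify_integrity` and `felt_phrase` are hypothetical stand-ins, not the actual `Embodiment` methods.

```python
from typing import Optional

# Thresholds taken from the band table; the constant names are assumptions.
DISABLE_THRESHOLD = 0.3
DEGRADED_LOW_CEILING = 0.45
DEGRADE_THRESHOLD = 0.6


def classify_integrity(integrity: float) -> str:
    """Return exactly one of 'disabled', 'degraded', 'healthy'."""
    if integrity < DISABLE_THRESHOLD:  # strict: 0.3 itself is degraded
        return "disabled"
    if integrity < DEGRADE_THRESHOLD:  # [0.3, 0.6)
        return "degraded"
    return "healthy"


def felt_phrase(integrity: float) -> Optional[str]:
    """Map a degraded-band integrity to its felt-sensation phrase, else None."""
    if DISABLE_THRESHOLD <= integrity < DEGRADED_LOW_CEILING:
        return "(feels weakened, prone to failing)"
    if DEGRADED_LOW_CEILING <= integrity < DEGRADE_THRESHOLD:
        return "(feels strained)"
    return None  # disabled tools are filtered; healthy tools are unchanged
```

Because each branch tests a half-open interval, every value in [0, 1] lands in exactly one state, which is the property TestBandPartitioning is described as pinning.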

Two-lens pre-merge review

Per feedback_review_before_ship.md. Both reviews ran in parallel before opening this PR; 0 Critical from either lens. Folded 8 Important findings into commit 8352106:

Bio-fidelity findings (3 Important):

| Finding | Fix |
| --- | --- |
| B2 (highest signal): pre-filter silently bypasses the natural failure → pain → NAc learning chain for disabled tools; Roy-3 can't disambiguate "Wire 3 hid the tool" from "substrate learned avoidance" | Emit `sim_log("WIRE_3_FILTER", ...)` per LLM submission with disabled tools list + degraded integrities. Tick-aligned with Stages 0c/0d. |
| B3: annotation reads as system-voice metric badge (`[DAMAGED: integrity 0.X]`) rather than substrate's proprioceptive percept | Felt-sensation phrasing via `integrity_to_felt_phrase()`. Numeric integrity stays in JSONL event for Roy-3; LLM sees (feels strained) / (feels weakened, prone to failing) only. |
| B5: `compute_integrity` raise silently swallowed → "unknown state" conflated with "healthy state" | WARNING log with entity + modulator name, loop stability preserved (still fail-open to 1.0). |

Architecture findings (5 Important):

| Finding | Fix |
| --- | --- |
| A1: idempotency guard fails when integrity drifts across band boundaries (NAc + modulator repair can recover integrity multi-tick) | Regex strip `_WIRE3_PHRASE_RE` before re-applying current-tick phrase. Pinned by `test_annotation_idempotent_under_integrity_drift`. |
| A4: broad `except Exception` at DEBUG level swallows method-shape mismatches | Narrowed to `(AttributeError, TypeError)` at WARNING level so Roy-3 validation surfaces shape bugs. |
| A5: no test pins that annotation actually reaches the LLM-visible prompt string | `TestLlmRenderingRoundTrip` (2 tests) constructs `LLMRequest` with post-Wire-3 `tool_descriptions`, asserts `build_tools_section_filtered` produces a prompt containing the felt phrase. |
| A3: plan-language ambiguity (integrity < 0.6 annotates) | Docstring + inline-comment pin on `Embodiment` class. |
| B1: I/O-boundary audit trail | Top-of-section comment explicitly states Wire 3 thresholds gate the LLM proposer's tool surface, NOT substrate encoding. |

Deferred (3 nice-to-haves): structured annotation surface for budgeter awareness, registry-walk tool-name lookup, capability-composition dedup. None blocking — Roy/cradle/Reachy single-body topologies don't hit any of them.

Frozen contract impact

None. No new persisted state, no _format_version bumps, no dataclass changes. Pure read-side wiring per the plan.

Behavioral signal + Roy-3 measurement

The behavioral signal is what the plan calls out: a damaged-arm agent stops calling arm-routed affordances. Roy-3 measurement is now disambiguable thanks to the bio-fidelity fold:

  • Pre-Wire-3 baseline: damaged-arm agent keeps calling arm-routed affordances, fails via the SEM requires precondition, learns avoidance via NAc reward bias over many failures.
  • Post-Wire-3: damaged-arm agent never calls disabled arm-routed affordances; degraded-arm agent reads felt-sensation phrasing and adjusts. WIRE_3_FILTER JSONL event lets Roy-3 count exactly which tools were filtered each tick.

Test plan

  • `python -m pytest tests/unit/test_wire_3_embodiment_filter.py -v` → 41 passed (9 layers: get_disabled / get_degraded / band partitioning / tool-name pattern / hook shape / degenerate cases / felt phrases / WIRE_3_FILTER emission / LLM round-trip).
  • `python -m pytest tests/unit/test_wire_3_embodiment_filter.py tests/unit/test_embodiment_failures.py tests/unit/test_embodiment_sem.py tests/unit/test_prompt_builder.py -q` → 163 passed (no regression).
  • `python -m pytest tests/ -x -q -m "not slow" --ignore=tests/integration/test_memory_hub.py` → 6682 passed (full fast suite, pre-fold; fold added pure test additions).
  • ruff check + ruff format clean on touched files.
  • Roy-2c regression guard — Wire 3 is downstream of substrate encoding (per the bio-fidelity B1 audit-trail docstring), so the Roy-2c finding fix in Wire-A is preserved.
  • Next: Wire 2 (Stage 3), Wire 1 (Stage 4), then Roy-3 validation (Stage 5).

What's next in 0.9.1

Per release_0_9_1.md:

  1. ✅ Stage 0a (Roy-2c probe)
  2. ✅ Stages 0b + 0c (telemetry)
  3. Stage 1 (Wire 3) — this PR
  4. ✅ Stage 2 (Wire-A)
  5. ⏳ Stage 3 (Wire 2: Pavlovian percept aversion)
  6. ⏳ Stage 4 (Wire 1: risk-sensitive action annotation)
  7. ⏳ Stage 5 (Roy-3 validation)

🤖 Generated with Claude Code

dennys246 and others added 2 commits May 16, 2026 21:22
Stage 1 of release_0_9_1.md, lifted from
bio_emergent_persona_foundations.md § Wire 3. The smallest of the
four wires by LOC, highest behavioral signal per unit work: an
agent with a damaged arm visibly stops attempting arm-routed
affordances without any prompt-injection scaffolding. The cleanest
emergent "trait" demonstration in the foundations plan.

Implementation:

- Embodiment.get_disabled_affordances(*, threshold=None) → set[str]
  Walks the entity tree, computes each modulator's
  compute_integrity(), and returns base tool names
  ({entity.name}_{affordance_name}) for modulators strictly below
  the disable threshold (default 0.3). Modulators that don't
  expose compute_integrity (capability-only, decorator-style)
  default to integrity=1.0 per the backward-compat convention
  SpecModulator.compute_integrity already uses on empty
  vital_metrics. A buggy modulator whose compute_integrity raises
  is treated as integrity=1.0 (fail-open).

- Embodiment.get_degraded_affordances(*, disable_threshold=None,
  degrade_threshold=None) → dict[str, float]
  Same walk, returns {base_tool_name: integrity} for modulators
  in [disable_threshold, degrade_threshold) (default [0.3, 0.6)).
  The bands partition [0, 1] cleanly — every affordance lands in
  exactly one of {disabled, degraded, healthy}, never both.

- agent_loop.py hook between mode_info.get_available_tools(...) and
  the tool_descriptions build loop:
    1. Filter disabled affordances out of available_tools.
    2. After per-tool description build, append "[DAMAGED: integrity
       0.X]" to each degraded affordance's description.
  Fail-open: no embodiment, missing compute_integrity, raising
  modulator → the hook is a no-op. Description annotation is
  idempotent (the "if annotation not in base_desc" guard prevents
  the suffix accumulating across ticks). Copy-on-write —
  TOOL_DESCRIPTIONS is a shared module-level dict; mutation would
  poison future calls and other agents.
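
A minimal sketch of the hook shape described above, assuming plain lists and dicts in place of the real `mode_info` and executor objects. `apply_wire3` and its parameters are hypothetical names; only the filter step, the `[DAMAGED: integrity 0.X]` suffix, the `if annotation not in base_desc` guard, and the copy-on-write rule are taken from this commit message.

```python
# Stand-in for the shared module-level dict; the contents are illustrative.
TOOL_DESCRIPTIONS = {
    "arm_grasp": "Close the gripper.",
    "arm_lift": "Raise the arm.",
    "head_look": "Turn the head.",
}


def apply_wire3(available_tools, descriptions, disabled, degraded):
    """Filter disabled tools, then annotate degraded ones on a copy."""
    # Step 1: drop disabled affordances from the tool list.
    visible = [tool for tool in available_tools if tool not in disabled]

    # Copy-on-write: never mutate the shared descriptions dict.
    annotated = dict(descriptions)

    # Step 2: append the damage suffix to each degraded description.
    for tool, integrity in degraded.items():
        if tool not in annotated:
            continue  # fail-open: unknown tool names are left alone
        annotation = f"[DAMAGED: integrity {integrity:.1f}]"
        if annotation not in annotated[tool]:  # idempotency guard
            annotated[tool] = f"{annotated[tool]} {annotation}"
    return visible, annotated
```

Returning a fresh dict keeps the annotation local to one LLM submission, so another agent reading `TOOL_DESCRIPTIONS` never sees a suffix that belongs to a different body.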

Tool-name pattern (the load-bearing assumption):
- tool_bridge.generate_tools_for_entity registers
  ModulatorAffordanceTools as {entity.name}_{affordance_name}
  unless _resolve_tool_name collision-prefixes an ancestor name.
- Roy / cradle / Reachy use single-body topologies — no collisions
  in practice. Wire 3's base names match the registered tool names
  cleanly. Under a hypothetical collision, the filter fails open
  (the tool stays available; no silent mis-gating).
- TestToolNamePattern.test_base_name_matches_generated_tool_name
  pins this contract: it constructs a real ToolRegistry, calls
  generate_tools_for_entity, and asserts every predicted-disabled
  name is in the registry.
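
The walk and the name reconstruction might look roughly like the sketch below. It assumes a single entity with a flat list of modulators rather than the real entity tree, and `FakeModulator`, `modulator_integrity`, `disabled_affordances`, and `degraded_affordances` are hypothetical stand-ins for the `Embodiment` methods. The WARNING on a raising `compute_integrity` reflects the post-fold behavior described in the PR summary.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("wire3_sketch")


@dataclass
class FakeModulator:
    """Illustrative modulator; the real class lives in the maxim codebase."""
    name: str
    affordances: list
    integrity: float = 1.0

    def compute_integrity(self) -> float:
        return self.integrity


def modulator_integrity(entity_name: str, modulator) -> float:
    """Fail-open integrity lookup: a missing or raising method counts as 1.0."""
    compute = getattr(modulator, "compute_integrity", None)
    if compute is None:
        return 1.0  # capability-only modulators count as healthy
    try:
        return float(compute())
    except Exception:
        logger.warning(
            "compute_integrity raised for %s/%s; treating as healthy",
            entity_name, getattr(modulator, "name", "?"),
        )
        return 1.0


def disabled_affordances(entity_name, modulators, threshold=0.3) -> set:
    """Base tool names for modulators strictly below the disable threshold."""
    return {
        f"{entity_name}_{affordance}"
        for m in modulators
        if modulator_integrity(entity_name, m) < threshold
        for affordance in m.affordances
    }


def degraded_affordances(entity_name, modulators,
                         disable_threshold=0.3, degrade_threshold=0.6) -> dict:
    """{base_tool_name: integrity} for modulators in the degraded band."""
    result = {}
    for m in modulators:
        integrity = modulator_integrity(entity_name, m)
        if disable_threshold <= integrity < degrade_threshold:
            for affordance in m.affordances:
                result[f"{entity_name}_{affordance}"] = integrity
    return result
```

Rebuilding names as `{entity.name}_{affordance_name}` is only safe while no collision prefix is applied, which is why the contract is pinned against a live registry.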

Test surface (30 tests, 6 layers):

- Layer 1 (get_disabled_affordances, 7 tests): healthy/critical/
  boundary/per-modulator-isolation/threshold-override.
- Layer 2 (get_degraded_affordances, 7 tests): boundary semantics
  for both ends of the [0.3, 0.6) band; threshold overrides.
- Layer 3 (band partitioning, 8 parametric tests): no integrity
  value lands in both disabled and degraded sets.
- Layer 4 (tool-name pattern, 1 test): live ToolRegistry
  round-trip via generate_tools_for_entity.
- Layer 5 (agent_loop hook shape, 4 tests): filter/annotate/
  idempotency/no-embodiment-no-op.
- Layer 6 (degenerate cases, 3 tests): empty modulators, empty
  affordances, raising compute_integrity.

All 30 tests pass. Ruff clean.

Frozen contract impact: none. No new persisted state, no dataclass
changes. Pure read-side wiring per the plan.

Behavioral signal: a damaged-arm agent stops calling arm-routed
affordances. Roy-3 validation can measure this once Wires 1-A
all ship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two-lens pre-merge review of feat/0-9-1-wire-3-embodiment-tool-filter
surfaced 0 Critical and 8 Important findings (3 bio, 5 arch). All
folded before opening the PR per feedback_review_before_ship.md.

41 tests passing (up from 30 — 11 new fold-regression guards across
3 new test layers).

Bio-fidelity findings folded:

- **B2 (highest signal): emit `sim_log("WIRE_3_FILTER", ...)` per
  tick.** The pre-filter silently bypasses the natural substrate-
  learning chain (failure → pain → NAc) for disabled tools — without
  observability, Roy-3 can't disambiguate "Wire 3 hid the tool" from
  "substrate learned avoidance." The emission lists disabled tools
  + degraded integrities per LLM submission, gated on disabled OR
  degraded being non-empty. Tick aligned with Stages 0c/0d
  (int(time.time() - sim_logger._sim_start)). Fail-soft on ImportError
  for non-sim runtime paths. Pinned by
  TestWire3FilterEmission.test_emission_lists_disabled_and_degraded.

- **B3: felt-sensation phrasing instead of metric badge.** The
  pre-fold annotation read `[DAMAGED: integrity 0.X]` — a system-
  voice metric badge. Post-fold uses
  `Embodiment.integrity_to_felt_phrase()` to map degraded-band
  integrity to proprioceptive percept ("feels strained" /
  "feels weakened, prone to failing"). The numeric integrity stays
  in the WIRE_3_FILTER JSONL event for Roy-3 analysis; the LLM sees
  the qualitative phrase only. Two bands within the degraded range:
  [0.45, 0.6) → "feels strained"; [0.3, 0.45) → "feels weakened,
  prone to failing". Pinned by TestIntegrityToFeltPhrase (6 tests).

- **B5: WARNING log on compute_integrity raise.** Pre-fold the inner
  try/except in `_iter_modulator_affordance_pairs` silently
  swallowed every exception and treated the modulator as healthy
  (1.0). Per the no-band-aid rule (CLAUDE.md), this conflates
  "unknown state" with "healthy state" — a body whose self-
  monitoring is broken is itself in trouble. Post-fold logs a
  WARNING with the entity + modulator name so the broken modulator
  surfaces during Roy-3 / operator review. Loop stability is
  preserved (still fail-open to 1.0). Pinned by the updated
  test_compute_integrity_raises_treated_as_healthy with a caplog
  WARNING assertion.

- **B1: I/O-boundary docstring pin.** Added a top-of-section comment
  on the Wire 3 threshold constants explicitly stating the
  thresholds gate the LLM-proposer's tool surface, NOT substrate
  encoding (EC/ATL/NAc are upstream). Mirrors the Wire-A bias-band
  bio-defensible-bands audit trail.
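
To make the B2 emission concrete, here is a hedged sketch of building one event record. The gating rule (emit only when disabled or degraded is non-empty) and the tick formula come from the text above; the function name `build_wire3_filter_event`, the field names, and the JSON layout are assumptions, since the real `sim_log` payload is not shown in this PR.

```python
import json
import time


def build_wire3_filter_event(disabled, degraded, sim_start, now=None):
    """Return a WIRE_3_FILTER record, or None when there is nothing to report."""
    if not disabled and not degraded:
        return None  # gated: healthy bodies emit nothing
    now = time.time() if now is None else now
    return {
        "event": "WIRE_3_FILTER",
        "tick": int(now - sim_start),  # same alignment as Stages 0c/0d
        "disabled": sorted(disabled),
        "degraded": dict(sorted(degraded.items())),  # numeric integrity kept here
    }


def to_jsonl(event) -> str:
    """Serialize one event as a single JSONL line."""
    return json.dumps(event, sort_keys=True)
```

Keeping the numeric integrity in the event while the prompt carries only the felt phrase lets an analysis pass count filtered tools per tick without the LLM ever seeing the metric.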

Architecture findings folded:

- **A1: regex-strip felt annotation before re-append.** Pre-fold
  idempotency guard was `if annotation not in base_desc` — but if
  integrity drifts across ticks (NAc reward learning + modulator
  repair can recover integrity over multi-tick windows: 0.55 →
  0.40 → 0.55), the felt phrase shifts band and both suffixes
  accumulate. Post-fold uses `_WIRE3_PHRASE_RE.sub("", base_desc)`
  before re-applying the current-tick phrase — exactly one suffix
  on the description regardless of drift history. Pinned by
  test_annotation_idempotent_under_integrity_drift.

- **A4: narrow except Exception to (AttributeError, TypeError) +
  WARNING log.** Pre-fold the hook caught broad Exception at DEBUG
  level. The body.py inner guard already swallows compute_integrity
  raises; the outer surface failures here are method-shape
  mismatches (non-Embodiment object plugged into executor.embodiment).
  Post-fold narrows to (AttributeError, TypeError) at WARNING level
  so the failure surfaces during Roy-3 validation runs.

- **A3: docstring pins for band semantics.** Added inline comment
  block on the Wire 3 threshold constants documenting "integrity
  < 0.3 disables (strict); 0.3 <= integrity < 0.6 degrades
  (inclusive)" — closes the architecture-lens nit about the plan's
  ambiguous wording ("integrity < 0.6 annotates" naturally reads as
  ALSO including the disabled range, but only the [0.3, 0.6) band
  reaches the annotation path).

- **A5: LLM-rendering round-trip test.** New
  TestLlmRenderingRoundTrip class with 2 tests. First test
  constructs an LLMRequest with the post-Wire-3 tool_descriptions
  dict and calls build_tools_section_filtered — asserts the felt-
  sensation phrase reaches the LLM-visible prompt string. Second
  test pins that disabled tools (filtered out of available_tools by
  Wire 3) DO NOT appear in the rendered tool section. Without these
  tests, a future refactor of the prompt-section renderer could
  silently drop Wire 3's signal.
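
The A1 fix can be sketched as strip-then-append. The pattern below is an assumed reconstruction, since the actual `_WIRE3_PHRASE_RE` is not shown; `_PHRASE_RE`, `phrase_for`, and `annotate` are hypothetical names. The phrases and band edges follow the B3 description.

```python
import re

# The two felt phrases from the B3 fold.
_FELT_PHRASES = ("(feels strained)", "(feels weakened, prone to failing)")

# Matches any previously appended felt phrase plus its leading whitespace.
_PHRASE_RE = re.compile(
    r"\s*(?:" + "|".join(re.escape(p) for p in _FELT_PHRASES) + r")"
)


def phrase_for(integrity: float):
    """Felt phrase for the degraded band, or None outside it."""
    if 0.45 <= integrity < 0.6:
        return "(feels strained)"
    if 0.3 <= integrity < 0.45:
        return "(feels weakened, prone to failing)"
    return None


def annotate(base_desc: str, integrity: float) -> str:
    """Strip any earlier felt phrase, then apply the current-tick phrase."""
    stripped = _PHRASE_RE.sub("", base_desc)
    phrase = phrase_for(integrity)
    return f"{stripped} {phrase}" if phrase else stripped
```

Under the drift sequence 0.55 → 0.40 → 0.55, feeding each result back into `annotate` leaves exactly one suffix, whereas a substring guard would keep the stale phrase from the earlier band.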

Deferred (architecture nice-to-haves N1/N2/N3):

- Structured annotation surface for budgeter awareness (separate
  `damage_annotation` field) — single agent / small prompts in
  0.9.1 don't hit budget pressure.
- Registry-walk tool-name lookup (instead of `{entity.name}_
  {affordance_name}` reconstruction) — single-body topologies don't
  trigger collisions; structural fix deferred to first multi-body
  sim that hits one.
- `_iter_modulator_affordance_pairs` dict-collision dedup —
  capability composition isn't a 0.9.1 topology.

Total: 3 files changed: agent_loop.py (+66/-12), body.py (+57/-1),
test file (+269/-30). 41 tests passing. Ruff clean on touched files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dennys246 merged commit 51dfd38 into main May 17, 2026
5 checks passed
dennys246 deleted the feat/0-9-1-wire-3-embodiment-tool-filter branch May 17, 2026 04:05