
feat(daimonion): impingement-native voice Phase 1#3727

Open: ryanklee wants to merge 8 commits into main from alpha/screwm-xvfb-safe-smoke
Conversation

@ryanklee
Collaborator

@ryanklee ryanklee commented May 24, 2026

Summary

  • AudioPerceptionBackend emits operator speech as dominant impingements (strength ~1.0)
  • Operator speech impingements recruit conversation via process_impingement
  • ResourceArbiter wired into audio output for priority-based preemption
  • Exploration surfacing defers to higher-priority claims

Spec: docs/superpowers/specs/2026-05-23-impingement-native-voice-design.md

Test plan

  • AudioPerceptionBackend tests pass
  • CPAL runner tests pass
  • Operator speech preempts exploration
  • End-to-end: speak during exploration → exploration interrupted

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Live audio perception backend: STT, speaker/operator detection, queued prioritized audio events; standalone remote TTS network service.
  • Behavior Changes

    • Operator speech preempts playback and is routed into conversation processing; exploration runs in background. VAD/buffering now detects operator speech during TTS. Sessions no longer auto-close on silence.
  • Configuration

    • Updated local model/service defaults and GPU visibility; PyTorch/CUDA dependency constraints added.
  • Tests

    • Added/updated tests for audio perception and runner session behavior.
  • Documentation

    • Added design and rollout plans for impingement-native voice.

Review Change Stack

…ngements

Introduces AudioPerceptionBackend — first step in dissolving the
ConversationPipeline sidecar. Operator speech now enters the impingement
field (strength = vad_confidence × speaker_posterior) and competes on
force with exploration and narration. Non-operator speech emits weaker
audio.scene impingements. Writes to /dev/shm/hapax-dmn/impingements.jsonl.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds AudioPerceptionBackend that STT-transcribes audio, resolves speaker/presence, emits calibrated impingements to an in-memory queue and JSONL bus, integrates with CpalRunner for preemption and transcript-forwarding, swaps resident STT to Parakeet ONNX, adjusts VAD/conversation flow, and adds TTS remote support plus config/docs updates.

Changes

Audio Perception Backend

| Layer | File(s) | Summary |
| --- | --- | --- |
| Module, constants, class and lifecycle | agents/hapax_daimonion/audio_perception.py | Module imports/constants; `AudioPerceptionBackend.__init__` accepts stt, speaker_id, presence_provider; exposes name/provides/tier and lifecycle methods; in-memory bounded deque and drain_impingements. |
| Compute and persist impingements | agents/hapax_daimonion/audio_perception.py | _emit_speech_impingement classifies operator vs scene, computes strength (VAD confidence times speaker posterior, non-operator scaled by 0.3), builds impingement dict, enqueues, and appends JSONL via _write_to_bus (OSError caught/logged). |
| Async transcription and speaker resolution | agents/hapax_daimonion/audio_perception.py | process_utterance awaits injected STT, skips empty transcripts, tries speaker-id, falls back to presence_provider posterior when unknown, logs failures and emits impingements. |
| Backend behavior validation | tests/hapax_daimonion/test_audio_perception.py | Pytest suite with mocked STT/speaker-id verifying provides/name, operator vs scene impingement source/type and strength thresholds, numeric strength calculation, and drain_impingements clearing queue. |
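
The strength rule summarized in the table above can be sketched as follows. This is a minimal illustration, not the code under review: the helper name `compute_strength`, the `SCENE_SCALE` constant name, the `"operator"` label, and the way the threshold gates classification are all assumptions. Only the 0.3 scaling, the 0.60 threshold, and the two source names come from this thread.

```python
OPERATOR_SPEAKER_THRESHOLD = 0.60  # value quoted in a review snippet in this thread
SCENE_SCALE = 0.3  # non-operator scaling from the walkthrough; constant name is hypothetical


def compute_strength(vad_confidence: float, speaker: str, posterior: float) -> tuple[str, float]:
    """Return (source, strength) for one utterance. Hypothetical helper."""
    # Assumption: an utterance counts as operator speech when the speaker label
    # is "operator" and the posterior clears the threshold.
    is_operator = speaker == "operator" and posterior >= OPERATOR_SPEAKER_THRESHOLD
    strength = vad_confidence * posterior
    if not is_operator:
        strength *= SCENE_SCALE  # scene audio competes with less force
    source = "audio.operator_speech" if is_operator else "audio.scene"
    return source, round(strength, 4)
```

With illustrative inputs, a VAD confidence of 0.95 and a posterior of 0.8 give an operator strength of 0.76, while an unknown speaker at posterior 0.5 lands near 0.14.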

Runner integration and playback control

| Layer | File(s) | Summary |
| --- | --- | --- |
| Constructor wiring | agents/hapax_daimonion/cpal/runner.py | CpalRunner gains audio_perception argument and stores it on the runner. |
| Tick loop: utterance routing & timeout removal | agents/hapax_daimonion/cpal/runner.py, tests/hapax_daimonion/test_cpal_runner.py | _tick no longer auto-closes sessions on timeout and routes captured utterances to audio_perception.process_utterance(...) when present; added test asserting close_session not called when timeout disabled. |
| Operator impingement preemption | agents/hapax_daimonion/cpal/runner.py | process_impingement claims arbiter audio_output, calls _kill_inflight_playback(), and forwards transcripts to pipeline.process_utterance_from_transcript() (warnings if unsupported); _kill_inflight_playback calls audio_output.kill() and clears the speaking flag. |
| Exploration arbiter and background speech | agents/hapax_daimonion/cpal/runner.py | Exploration surfacing drains arbiter winners, defers if higher-priority winner exists, otherwise claims audio_output and launches spontaneous speech in background via _exploration_speech_task under _speech_lock. |
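
The preemption behavior in the table above can be sketched with a small priority arbiter. `TinyArbiter` and `Claim` are hypothetical stand-ins and do not reflect the repository's ResourceArbiter API; the priorities 100 (conversation) and 15 (exploration) are the ones quoted in the commit messages in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    owner: str
    priority: int


class TinyArbiter:
    """Illustrative stand-in; not the repository's ResourceArbiter API."""

    def __init__(self) -> None:
        self._holders: dict[str, Claim] = {}

    def claim(self, resource: str, owner: str, priority: int) -> bool:
        """Grant the resource if it is free or the new claim outranks the holder."""
        current = self._holders.get(resource)
        if current is not None and current.owner != owner and current.priority >= priority:
            return False  # defer to the higher- or equal-priority holder
        self._holders[resource] = Claim(owner, priority)
        return True

    def holder(self, resource: str) -> Optional[str]:
        """Return the current owner of the resource, if any."""
        current = self._holders.get(resource)
        return current.owner if current else None


arbiter = TinyArbiter()
arbiter.claim("audio_output", "exploration", 15)    # exploration starts speaking
arbiter.claim("audio_output", "conversation", 100)  # operator speech preempts it
```

In a real runner the successful higher-priority claim would be the point at which in-flight playback is killed; that step is omitted here.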

Resident STT (ONNX Parakeet)

| Layer | File(s) | Summary |
| --- | --- | --- |
| Docs, imports, default model | agents/hapax_daimonion/resident_stt.py | Module/class docs updated for Parakeet TDT, imports adjusted (wave/tempfile), default model constant and `ResidentSTT.__init__` default updated. |
| Model load and sync transcription | agents/hapax_daimonion/resident_stt.py | load() dynamically imports onnx_asr and calls onnx_asr.load_model; transcribe() offloads _transcribe_sync to executor; _transcribe_sync writes PCM to temp WAV, calls model.recognize(), strips result and logs duration; prosody extraction no longer invoked inline. |
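
The "writes PCM to temp WAV" step can be sketched with the standard library alone. The helper name `pcm_to_temp_wav` is hypothetical, and the 16 kHz, mono, 16-bit format is an assumption; the thread does not state the audio format used by `_transcribe_sync`.

```python
import tempfile
import wave


def pcm_to_temp_wav(pcm: bytes, sample_rate: int = 16000) -> str:
    """Write 16-bit mono PCM to a temporary WAV file and return its path."""
    # delete=False keeps the file on disk so a recognizer can open it by path.
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        path = tmp.name
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # mono (assumed)
        wav.setsampwidth(2)  # 16-bit samples (assumed)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return path
```

The caller is responsible for removing the file once transcription finishes, for example with `os.remove(path)` in a `finally` block.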

Conversation pipeline & buffer

| Layer | File(s) | Summary |
| --- | --- | --- |
| LLM call and defaults | agents/hapax_daimonion/conversation_pipeline.py | ConversationPipeline llm_model default set to local-fast; generate_spontaneous_speech now uses system_prompt, wraps LiteLLM call with asyncio.wait_for, and supplies model as openai/{model} with api_key/api_base. |
| Transcript-driven entrypoint | agents/hapax_daimonion/conversation_pipeline.py | Adds process_utterance_from_transcript(transcript) to process already-transcribed operator utterances via local Tabby endpoint and play responses. |
| VAD gating changes | agents/hapax_daimonion/conversation_buffer.py | feed_audio() and update_vad() remove software speaking/cooldown gates so VAD always runs; relies on AEC/speaker-id for echo suppression. |
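
Wrapping an awaitable in `asyncio.wait_for`, as the first row describes, bounds how long a slow model call can hold up speech generation. A minimal sketch follows; the helper name `call_with_timeout` and the fallback behavior are assumptions, and the actual LiteLLM call is replaced by an arbitrary coroutine factory.

```python
import asyncio


async def call_with_timeout(coro_factory, timeout_s: float, fallback=None):
    """Await coro_factory() but give up after timeout_s seconds. Hypothetical helper."""
    try:
        return await asyncio.wait_for(coro_factory(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the inner task before raising, so nothing keeps running.
        return fallback
```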

Daemon wiring, loops, configs, deps, and ops

| Layer | File(s) | Summary |
| --- | --- | --- |
| Run inner: start backend & impingement ordering | agents/hapax_daimonion/run_inner.py | run_inner instantiates and starts AudioPerceptionBackend (presence posterior helper) and passes it into CpalRunner; impingement consumer loop now prioritizes operator speech and may skip certain non-operator sources when operator speech present. |
| Offload blocking work to threads | agents/hapax_daimonion/run_loops_aux.py | proactive_delivery_loop uses run_in_executor for TTS synthesis; impingement_consumer_loop dispatches narration helpers via asyncio.to_thread to avoid blocking. |
| Model routing, config, deps, systemd | agents/hapax_daimonion/model_router.py, agents/hapax_daimonion/config.py, shared/config.py, pyproject.toml, systemd/units/.../gpu-pin.conf | TIER_ROUTES FAST/STRONG/CAPABLE → local-fast; DaimonionConfig defaults updated; Ollama host and embedding model defaults changed and client now receives host; pyproject pins torch>=2.12.0, bumps torchvision, adds torchaudio and a pytorch-cu130 index; systemd GPU pin now exposes CUDA_VISIBLE_DEVICES=0,1; audio input queue default increased. |
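
The two offloading patterns named in the table, `run_in_executor` and `asyncio.to_thread`, look roughly like this. `blocking_synthesize` is a hypothetical stand-in for a blocking TTS or narration call; the function names are illustrative and not taken from run_loops_aux.py.

```python
import asyncio
import time


def blocking_synthesize(text: str) -> bytes:
    """Stand-in for a blocking TTS call (hypothetical)."""
    time.sleep(0.05)
    return text.encode("utf-8")


async def synthesize_off_loop(text: str) -> bytes:
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, blocking_synthesize, text)


async def narrate_off_loop(text: str) -> bytes:
    # asyncio.to_thread (Python 3.9+) is a shorthand for the same idea.
    return await asyncio.to_thread(blocking_synthesize, text)
```

Either form keeps the event loop free to service other coroutines while the blocking call runs in a worker thread.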

Sequence Diagram(s)

sequenceDiagram
  participant Mic as Microphone
  participant AP as AudioPerceptionBackend
  participant STT as ResidentSTT (onnx_asr)
  participant SID as SpeakerID
  participant PRES as PresenceProvider
  participant BUS as Impingement JSONL Bus
  participant CPAL as CpalRunner
  participant PIPE as ConversationPipeline

  Mic->>AP: PCM audio frame (utterance)
  AP->>STT: transcribe(audio) -> transcript
  alt speaker-id available
    AP->>SID: identify(transcript/audio) -> speaker, posterior
  else fallback
    AP->>PRES: presence_posterior() -> posterior
  end
  AP->>BUS: write impingement JSONL (source,type,content,strength)
  AP->>CPAL: enqueue impingement (in-memory)
  CPAL->>AP: drain/process impingements
  CPAL->>PIPE: process_utterance_from_transcript(transcript) [on operator impingement]

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I heard a whisper turned to text,
A pipeline hummed and neatly vexed,
Impingements queued, then softly played,
Operator’s voice reclaimed the day,
The rabbit logs, persists, and rests.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The description is incomplete; it lacks required AuthorityCase information (Case ID, Slice ID) and CLAUDE.md hygiene checklist as specified in the repository's description template. | Add Case/Slice identifiers and CLAUDE.md hygiene checklist section with appropriate checkboxes to match the required template structure. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 27.12% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: implementing an impingement-native voice architecture (Phase 1) for the Daimonion agent. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@github-actions

Auto-fix classified CI failure categories as lint. Privileged workflow auto-mutation is disabled; route this through governed remediation.

… preemption

- AudioPerceptionBackend wired into CPAL runner — utterances route through
  impingement bus instead of direct _process_utterance()
- process_impingement handles audio.operator_speech before adapter — operator
  speech is NOT subject to surfacing thresholds
- ResourceArbiter claims audio_output at conversation priority (100), preempts
  exploration (15) with _kill_inflight_playback
- Exploration surfacing checks arbiter before claiming audio_output
- process_utterance_from_transcript added to ConversationPipeline for
  impingement-routed transcripts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (1)
agents/hapax_daimonion/audio_perception.py (1)

94-97: ⚡ Quick win

Create the bus parent directory before append to prevent silent persistence loss.

If /dev/shm/hapax-dmn is missing, writes fail and impingements are only kept in-memory. Creating parent dirs before open makes persistence more robust.
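
A minimal sketch of the suggested hardening, assuming the bus is a `pathlib.Path` as the `IMPINGEMENT_BUS.open("a")` call implies. The helper name `append_jsonl` and its boolean return value are hypothetical.

```python
import json
from pathlib import Path


def append_jsonl(bus_path: Path, record: dict) -> bool:
    """Append one JSON line, creating the parent directory first."""
    try:
        # exist_ok=True makes this a no-op when the directory is already there.
        bus_path.parent.mkdir(parents=True, exist_ok=True)
        with bus_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
        return True
    except OSError:
        # A real backend would log here; the in-memory queue still holds the record.
        return False
```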

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hapax_daimonion/audio_perception.py` around lines 94 - 97, Ensure the
bus parent directory is created before attempting to append to IMPINGEMENT_BUS
to avoid silent persistence loss: before calling IMPINGEMENT_BUS.open("a") in
the impingement write block (the code that currently uses
IMPINGEMENT_BUS.open("a") and catches OSError), call the directory-creation step
(e.g. IMPINGEMENT_BUS.parent.mkdir(parents=True, exist_ok=True)) and handle/log
any exceptions appropriately so that writes no longer silently fail when
/dev/shm/hapax-dmn is missing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agents/hapax_daimonion/audio_perception.py`:
- Around line 35-36: The tier method currently returns the raw string "FAST";
change it to return the PerceptionTier enum value and update the signature:
import PerceptionTier, change def tier(self) -> str: to def tier(self) ->
PerceptionTier:, and return PerceptionTier.FAST in the tier() method so callers
receive the enum rather than a plain string (ensure the PerceptionTier import is
added at the top of the module).
- Around line 68-69: The emitted impingement record uses enum names
("PATTERN_MATCH"/"STATISTICAL_DEVIATION") but the consumer expects canonical
lowercase values; update the dict construction that sets the "type" field (the
expression using is_operator) to emit "pattern_match" when is_operator is true
and "statistical_deviation" otherwise (or use a small mapping from the enum/flag
to these canonical strings) so ImpingementConsumer/Impingement will accept the
records; keep the rest (e.g., strength rounding) unchanged.
- Around line 84-90: The log statement in audio_perception.py currently includes
raw transcript text at INFO level; change the log.info call that references
log.info(..., imp["source"], imp["strength"], speaker, transcript) so it only
logs high-level metadata (imp["source"], imp["strength"], speaker) at INFO and
either redact the transcript or move the full transcript text to a log.debug
call that references the same transcript variable for diagnostics; ensure you do
not print transcript at INFO and keep the log message context clear while using
the existing variables (imp, speaker, transcript).

In `@tests/hapax_daimonion/test_audio_perception.py`:
- Around line 3-4: Remove the unused imports causing Ruff F401 by deleting
"time" and "patch" from the import line in the test module; keep the necessary
AsyncMock and MagicMock imports and adjust any code that was intended to use
time/patch to use alternatives or mocks already present (look for the import
statement that currently reads "import time\nfrom unittest.mock import
AsyncMock, MagicMock, patch" or a single-line variant and remove the unused
names "time" and "patch").

---

Nitpick comments:
In `@agents/hapax_daimonion/audio_perception.py`:
- Around line 94-97: Ensure the bus parent directory is created before
attempting to append to IMPINGEMENT_BUS to avoid silent persistence loss: before
calling IMPINGEMENT_BUS.open("a") in the impingement write block (the code that
currently uses IMPINGEMENT_BUS.open("a") and catches OSError), call the
directory-creation step (e.g. IMPINGEMENT_BUS.parent.mkdir(parents=True,
exist_ok=True)) and handle/log any exceptions appropriately so that writes no
longer silently fail when /dev/shm/hapax-dmn is missing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: e3221f3c-3e6c-47f3-9c80-95bf66f5739a

📥 Commits

Reviewing files that changed from the base of the PR and between 0170630 and 5141e11.

📒 Files selected for processing (2)
  • agents/hapax_daimonion/audio_perception.py
  • tests/hapax_daimonion/test_audio_perception.py

Comment on lines +35 to +36
    def tier(self) -> str:
        return "FAST"

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use PerceptionTier enum for tier instead of a raw string.

Line 36 returns "FAST" as str, but existing perception backends expose a PerceptionTier value. This can break tier-based orchestration/comparisons.
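
The difference can be sketched as follows. The `PerceptionTier` definition here is a stand-in with assumed members and values, since the repository's enum is not shown in this thread; `BackendSketch` is likewise hypothetical.

```python
from enum import Enum


class PerceptionTier(Enum):
    """Stand-in for the repository's enum; members and values here are assumed."""

    FAST = "fast"
    SLOW = "slow"


class BackendSketch:
    def tier(self) -> PerceptionTier:
        # Returning the enum member keeps identity and equality checks meaningful.
        return PerceptionTier.FAST


# A raw string never compares equal to a plain Enum member, which is how
# tier-based comparisons can silently fail when a backend returns "FAST".
raw_matches = "FAST" == PerceptionTier.FAST
```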

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hapax_daimonion/audio_perception.py` around lines 35 - 36, The tier
method currently returns the raw string "FAST"; change it to return the
PerceptionTier enum value and update the signature: import PerceptionTier,
change def tier(self) -> str: to def tier(self) -> PerceptionTier:, and return
PerceptionTier.FAST in the tier() method so callers receive the enum rather than
a plain string (ensure the PerceptionTier import is added at the top of the
module).

Comment thread agents/hapax_daimonion/audio_perception.py Outdated
Comment on lines +84 to +90
        log.info(
            "Audio impingement: source=%s strength=%.2f speaker=%s transcript=%.40s",
            imp["source"],
            imp["strength"],
            speaker,
            transcript,
        )

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid logging transcript content at INFO level.

Line 85 logs raw transcript text, which can include sensitive user speech/PII. Keep high-level metadata at INFO and move/redact text for DEBUG-only diagnostics.

Suggested fix
-        log.info(
-            "Audio impingement: source=%s strength=%.2f speaker=%s transcript=%.40s",
-            imp["source"],
-            imp["strength"],
-            speaker,
-            transcript,
-        )
+        log.info(
+            "Audio impingement: source=%s strength=%.2f speaker=%s",
+            imp["source"],
+            imp["strength"],
+            speaker,
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hapax_daimonion/audio_perception.py` around lines 84 - 90, The log
statement in audio_perception.py currently includes raw transcript text at INFO
level; change the log.info call that references log.info(..., imp["source"],
imp["strength"], speaker, transcript) so it only logs high-level metadata
(imp["source"], imp["strength"], speaker) at INFO and either redact the
transcript or move the full transcript text to a log.debug call that references
the same transcript variable for diagnostics; ensure you do not print transcript
at INFO and keep the log message context clear while using the existing
variables (imp, speaker, transcript).

Comment on lines +3 to +4
import time
from unittest.mock import AsyncMock, MagicMock, patch

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove unused imports to fix Ruff F401.

Line 3 (time) and Line 4 (patch) are unused and currently fail lint.

🧰 Tools
🪛 GitHub Check: lint

[failure] 4-4: ruff (F401)
tests/hapax_daimonion/test_audio_perception.py:4:49: F401 unittest.mock.patch imported but unused
help: Remove unused import: unittest.mock.patch


[failure] 3-3: ruff (F401)
tests/hapax_daimonion/test_audio_perception.py:3:8: F401 time imported but unused
help: Remove unused import: time

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/hapax_daimonion/test_audio_perception.py` around lines 3 - 4, Remove
the unused imports causing Ruff F401 by deleting "time" and "patch" from the
import line in the test module; keep the necessary AsyncMock and MagicMock
imports and adjust any code that was intended to use time/patch to use
alternatives or mocks already present (look for the import statement that
currently reads "import time\nfrom unittest.mock import AsyncMock, MagicMock,
patch" or a single-line variant and remove the unused names "time" and "patch").


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5141e11e71


"id": str(uuid.uuid4()),
"timestamp": time.time(),
"source": "audio.operator_speech" if is_operator else "audio.scene",
"type": "PATTERN_MATCH" if is_operator else "STATISTICAL_DEVIATION",

P1: Emit valid impingement type enum strings

Use the canonical ImpingementType wire values here (e.g. pattern_match, statistical_deviation) instead of uppercase names. The DMN consumers parse JSONL via Impingement.model_validate_json, so these uppercase values fail enum validation and the line is treated as malformed/skipped; in practice that drops every speech impingement before it reaches process_impingement/affordance routing.
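
The failure mode can be illustrated with a plain standard-library enum. This is a sketch under assumptions: the real `ImpingementType` and its Pydantic validation are not shown in this thread, so the class below is a stand-in, and only the two lowercase values are taken from the review. The helpers `wire_type` and `parses` are hypothetical.

```python
from enum import Enum


class ImpingementType(str, Enum):
    """Stand-in enum; the lowercase values are the ones named in the review."""

    PATTERN_MATCH = "pattern_match"
    STATISTICAL_DEVIATION = "statistical_deviation"


def wire_type(is_operator: bool) -> str:
    """Return the value to serialize, never the member name."""
    member = (
        ImpingementType.PATTERN_MATCH
        if is_operator
        else ImpingementType.STATISTICAL_DEVIATION
    )
    return member.value


def parses(raw: str) -> bool:
    """Mimic value-based enum validation on the consumer side."""
    try:
        ImpingementType(raw)
        return True
    except ValueError:
        return False
```

Lookup by value accepts `"pattern_match"` and rejects `"PATTERN_MATCH"`, which mirrors why a consumer validating by value would skip records that carry the member name.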


        speaker_confidence = 0.0
        if self._speaker_id is not None:
            try:
                speaker, speaker_confidence = self._speaker_id.identify(audio_bytes)

P1: Normalize speaker-ID API before operator classification

This call assumes identify accepts raw PCM bytes and returns a (speaker, confidence) tuple, but the repo’s shipped SpeakerIdentifier (agents/hapax_daimonion/speaker_id.py) expects an embedding and returns SpeakerResult. If this backend is wired to the daemon’s _speaker_identifier from agents/hapax_daimonion/init_audio.py, the call raises and the exception path forces speaker='unknown', so operator speech is never recognized and always emitted on the low-strength ambient path.
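
One way to reconcile the two call shapes is a thin adapter, sketched below. Everything here is hypothetical: the `SpeakerResult` fields (`label`, `confidence`), the embedding callable, and the adapter class are assumptions made for illustration, since the real speaker_id.py interface is not shown in this thread.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class SpeakerResult:
    """Assumed shape; the real SpeakerResult fields are not shown in this thread."""

    label: str
    confidence: float


class SpeakerIdAdapter:
    """Wraps an embedding-based identifier behind a bytes -> (label, confidence) call."""

    def __init__(
        self,
        embed: Callable[[bytes], Sequence[float]],
        identify: Callable[[Sequence[float]], SpeakerResult],
    ) -> None:
        self._embed = embed
        self._identify = identify

    def identify(self, audio_bytes: bytes) -> tuple[str, float]:
        # Convert raw audio to an embedding, then unpack the result object
        # into the tuple the perception backend expects.
        result = self._identify(self._embed(audio_bytes))
        return result.label, result.confidence
```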


OPERATOR_SPEAKER_THRESHOLD = 0.60


class AudioPerceptionBackend:

P1: Register the backend so audio impingements can run

This new backend is never instantiated or registered in the daemon backend wiring, so none of its process_utterance/bus-emission logic runs in production. I checked the runtime registration list in agents/hapax_daimonion/init_backends.py and there is no AudioPerceptionBackend entry, which makes this feature path effectively dead code despite the new implementation and tests.


…nt impingement

Phase 1 of dissolving the ConversationPipeline sidecar. Operator speech
now enters the impingement field via AudioPerceptionBackend instead of
bypassing the cascade through a hardwired conversation pipeline.

Architecture:
- AudioPerceptionBackend emits audio.operator_speech impingements
  (strength ~0.76, PATTERN_MATCH type) from VAD + STT + presence
- CPAL runner handles operator speech before the impingement adapter,
  bypassing surfacing thresholds
- ResourceArbiter wired into audio output: conversation(100) preempts
  exploration(15) with in-flight playback kill
- Exploration surfacing runs as background task, no longer blocks the
  impingement consumer loop
- ConversationBuffer speaking/cooldown gates removed — XVF3800 hardware
  AEC handles echo rejection at device level

Infrastructure fixes:
- PyTorch 2.12.0+cu130 for Blackwell sm_120 (Chatterbox on CUDA)
- GPU UUID updated in fish config post-migration
- Ollama routed to appendix (192.168.68.50)
- Parakeet TDT STT replacing Whisper
- Session timeout disabled (pipeline never stops)
- Utterance drain replaced with keep-latest queue
- Audio queue increased to 1000 frames (30s) for GIL contention
- Blocking TTS in proactive_delivery_loop wrapped in executor
- ImpingementType enum case fix (PATTERN_MATCH → pattern_match)
- Presence provider score() property call fix
- ReSpeaker XVF3800 duplex profile + hardware AEC reference feed
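
A keep-latest queue of the kind mentioned in the list above can be sketched with a bounded deque. The `KeepLatest` class is hypothetical and not the runner's implementation; the frame arithmetic simply restates the "1000 frames (30s)" figure.

```python
from collections import deque
from typing import Optional


class KeepLatest:
    """Holds only the most recent item; older unread items are discarded."""

    def __init__(self) -> None:
        self._slot: deque = deque(maxlen=1)

    def put(self, item: bytes) -> None:
        self._slot.append(item)  # maxlen=1 evicts the previous item

    def take(self) -> Optional[bytes]:
        return self._slot.popleft() if self._slot else None


# Arithmetic from the commit message: 1000 frames covering 30 s implies 30 ms per frame.
FRAME_MS = 30_000 / 1000
```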

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

Auto-fix classified CI failure categories as lint. Privileged workflow auto-mutation is disabled; route this through governed remediation.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
agents/hapax_daimonion/conversation_pipeline.py (1)

366-379: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Send the refreshed system context here, not the static base prompt.

_update_system_context() updates self.messages[0]["content"], but this call now sends self.system_prompt. That drops the live conversation thread, environment/policy bands, salience, and grounding directives from spontaneous-speech generation.

Suggested fix
-            messages = [
+            system_content = (
+                self.messages[0]["content"] if self.messages else self.system_prompt
+            )
+            messages = [
                 {
                     "role": "system",
-                    "content": f"{spontaneous_envelope}\n\n{self.system_prompt}",
+                    "content": f"{spontaneous_envelope}\n\n{system_content}",
                 },
                 {"role": "user", "content": prompt},
             ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hapax_daimonion/conversation_pipeline.py` around lines 366 - 379,
_update_system_context() mutates the live system content in
self.messages[0]["content"], but the spontaneous-speech phase is currently
sending the static self.system_prompt instead, losing updated context; change
the messages construction to use the refreshed system entry (e.g., combine
spontaneous_envelope with self.messages[0]["content"] or replace the "content"
field with f"{spontaneous_envelope}\n\n{self.messages[0]['content']}") so the
spontaneous_envelope and subsequent messages use the live
conversation/policy/salience grounding produced by _update_system_context(),
keeping render_envelope, spontaneous_envelope, and the messages list semantics
intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agents/hapax_daimonion/config.py`:
- Line 107: Replace the hardcoded llm_model default with the centralized alias
from shared.config: import the shared alias constant (for example
DEFAULT_LLM_ALIAS) from shared.config and set llm_model: str = DEFAULT_LLM_ALIAS
in agents/hapax_daimonion/config.py (keep the llm_model symbol and its type hint
unchanged); update any module imports accordingly so the default is sourced from
shared/config.py rather than the local literal.

In `@agents/hapax_daimonion/conversation_pipeline.py`:
- Around line 463-484: The transcript path currently mutates
_conversation_thread directly and calls litellm.acompletion without updating the
canonical conversation state; update this block to record the user and assistant
turns in the same structures the rest of the class uses: add the user message to
self.messages (matching the message dict shape used elsewhere), increment
turn_count, call litellm.acompletion with that message list, then when a
response is received append the assistant turn both to self.messages and to
_conversation_thread (or ensure _conversation_thread entries match the canonical
format), and only then call _speak_sentence; ensure you reference and update the
symbols _conversation_thread, self.messages, and turn_count and keep the
litellm.acompletion/ _speak_sentence flow intact.
- Around line 460-463: Replace the local import of LITELLM_BASE so this branch
uses the canonical shared config: change the import that currently does "from
agents.hapax_daimonion.config import LITELLM_BASE as _base" to import
LITELLM_BASE from shared.config (i.e., use "from shared.config import
LITELLM_BASE as _base") so LITELLM_BASE and MODELS are sourced from the same
place as the rest of the method; ensure the existing use of MODELS["local-fast"]
and assignment to grounded_model remains unchanged in conversation_pipeline.py.

In `@agents/hapax_daimonion/cpal/runner.py`:
- Around line 985-992: When the pipeline handles a transcript via
self._pipeline.process_utterance_from_transcript(...) you must record a response
speech event the same way the other response path does so dialog-active
suppression and refractory/echo cooldown logic see the reply; after the await
returns (inside the try block) set self._last_speech_end to the current time and
append a response event into self._recent_speech_events (matching the same event
shape used elsewhere: timestamp and a marker that this was a response speech) so
downstream suppression logic treats this transcript-routed path as having
spoken; do this inside the try before returning and keep the existing
exception/logging behavior for the except branch.
- Around line 1392-1404: The code currently stamps `_last_speech_end` and
appends a SpeechEvent(EXPLORATION) in the `finally` block regardless of whether
playback actually occurred; change this so the timestamp/update and
`_recent_speech_events.append(SpeechEvent(...))` happen only after successful
playback (i.e., when `_buffer.set_speaking(False)` is executed following actual
audio output and the utterance was not dropped/silenced). Move the
`_last_speech_end = time.monotonic()` and the `SpeechEvent` append out of the
`finally` and into the success path (after whatever method confirms playback
finished or that `effect.narrative` produced audible output), keep the exception
logging (`log.debug(..., exc_info=True)`) in the `except` block, and ensure
`_buffer.set_speaking(False)` still runs in `finally` to clear state even on
failure; reference symbols: `_buffer.set_speaking`, `_last_speech_end`,
`_recent_speech_events`, `SpeechEvent`, `SpeechEventKind.EXPLORATION`,
`effect.narrative`, and `"exploration_surfacing"`.
- Around line 507-515: Replace the hardcoded vad_confidence=0.95 in the call to
self._audio_perception.process_utterance with the real computed vad_prob (the
variable computed earlier), i.e. pass vad_confidence=vad_prob; ensure vad_prob
is in scope at this branch (or use a sensible fallback if necessary) so
_audio_perception.process_utterance receives the actual VAD probability rather
than a constant.
- Around line 980-984: Replace the INFO-level log that prints live operator
speech so transcripts are not exposed: in the log.info call that references
impingement and transcript (the call using getattr(impingement, "strength", 0)
and transcript), either change it to log.debug or redact the transcript (e.g.,
log a placeholder like "<REDACTED>" or only the transcript length/hash) while
keeping the strength value; ensure any transcript content is never logged at
INFO level and move any sensitive debug output to DEBUG-only logging.

In `@agents/hapax_daimonion/model_router.py`:
- Around line 57-59: The current mapping in the router collapses ModelTier.FAST,
ModelTier.STRONG, and ModelTier.CAPABLE all to the same "local-fast" route which
disables capability escalation and governance/tooling paths; update the
tier->route mapping in model_router.py so FAST continues to map to "local-fast"
while STRONG and CAPABLE map to their intended higher-capability routes (e.g.,
"local-strong"/"remote-capable" or whatever route names are used in your routing
table), ensuring any governance/tooling-specific route keys used elsewhere
(search for ModelTier, the tier->route dict, and any code that branches on those
route names) are preserved and exercised by STRONG/CAPABLE mappings.

In `@agents/hapax_daimonion/run_inner.py`:
- Around line 460-465: The loop over other_imps incorrectly matches exploration
sources by the exact literal "exploration", so variants like
"exploration.apperception" still get processed; update the suppression inside
the for imp in other_imps loop (the conditional that checks operator_imps and
inspects getattr(imp, "source", "")) to treat any source in the exploration
family as a match (e.g., use startswith("exploration") or split on '.' and
compare the first segment) while still preserving the existing check for
"endogenous.narrative_drive" so those are skipped when operator_imps is present.

In `@docs/superpowers/plans/2026-05-23-impingement-native-voice.md`:
- Line 13: The document jumps from the top-level "#" heading to "### Task 1:
AudioPerceptionBackend — Impingement Emitter", which violates the MD001 heading
increment rule; change the "### Task 1: AudioPerceptionBackend — Impingement
Emitter" heading to "## Task 1: AudioPerceptionBackend — Impingement Emitter"
(or insert an intermediate "##" section before it) so headings increment
properly and markdown lint passes.

In `@shared/config.py`:
- Line 42: The config default for OLLAMA_URL is currently a private LAN IP which
breaks setups without that host; change the default to a portable localhost URL
by updating the OLLAMA_URL constant (the variable name OLLAMA_URL in
shared/config.py) to use "http://localhost:11434" while still allowing overrides
via os.environ.get("OLLAMA_HOST", ...); ensure only the default value is changed
and environment variable behavior remains the same.

---

Outside diff comments:
In `@agents/hapax_daimonion/conversation_pipeline.py`:
- Around line 366-379: _update_system_context() mutates the live system content
in self.messages[0]["content"], but the spontaneous-speech phase is currently
sending the static self.system_prompt instead, losing updated context; change
the messages construction to use the refreshed system entry (e.g., combine
spontaneous_envelope with self.messages[0]["content"] or replace the "content"
field with f"{spontaneous_envelope}\n\n{self.messages[0]['content']}") so the
spontaneous_envelope and subsequent messages use the live
conversation/policy/salience grounding produced by _update_system_context(),
keeping render_envelope, spontaneous_envelope, and the messages list semantics
intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f640afa4-269c-421e-a5ed-80d0c040a489

📥 Commits

Reviewing files that changed from the base of the PR and between 5141e11 and cc6cff9.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
  • agents/hapax_daimonion/audio_input.py
  • agents/hapax_daimonion/audio_perception.py
  • agents/hapax_daimonion/config.py
  • agents/hapax_daimonion/conversation_buffer.py
  • agents/hapax_daimonion/conversation_pipeline.py
  • agents/hapax_daimonion/cpal/runner.py
  • agents/hapax_daimonion/model_router.py
  • agents/hapax_daimonion/resident_stt.py
  • agents/hapax_daimonion/run_inner.py
  • agents/hapax_daimonion/run_loops_aux.py
  • agents/hapax_daimonion/session_events.py
  • docs/superpowers/plans/2026-05-23-impingement-native-voice.md
  • docs/superpowers/specs/2026-05-23-impingement-native-voice-design.md
  • pyproject.toml
  • shared/config.py
  • systemd/units/hapax-daimonion.service.d/gpu-pin.conf
  • tests/hapax_daimonion/test_audio_perception.py
  • tests/hapax_daimonion/test_cpal_runner.py

# Backends
backend: str = "local" # "local" or "gemini"
-llm_model: str = "gemini-flash"
+llm_model: str = "local-fast"

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Centralize the LLM default alias through shared config

This hardcoded alias in daemon-local config can drift from centralized route aliases. Please source this default from shared/config.py (or a small shared alias module) instead of keeping a local literal.

As per coding guidelines, **/*.py: Use shared/config.py for model aliases and LiteLLM/Qdrant configuration.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hapax_daimonion/config.py` at line 107, Replace the hardcoded
llm_model default with the centralized alias from shared.config: import the
shared alias constant (for example DEFAULT_LLM_ALIAS) from shared.config and set
llm_model: str = DEFAULT_LLM_ALIAS in agents/hapax_daimonion/config.py (keep the
llm_model symbol and its type hint unchanged); update any module imports
accordingly so the default is sourced from shared/config.py rather than the
local literal.
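
A minimal sketch of what this prompt describes, assuming the shared constant is called `DEFAULT_LLM_ALIAS` (the example name used in the prompt, not a constant confirmed to exist in `shared/config.py`). `DaimonionConfig` is a hypothetical stand-in for the daemon's config class, and the constant is defined inline only so the snippet runs on its own.

```python
from dataclasses import dataclass

# Assumption: in the real repo this constant would live in shared/config.py.
# It is defined inline here only so the sketch is self-contained.
DEFAULT_LLM_ALIAS: str = "local-fast"


@dataclass
class DaimonionConfig:
    """Hypothetical stand-in for the daemon-local config class."""

    backend: str = "local"  # "local" or "gemini"
    llm_model: str = DEFAULT_LLM_ALIAS  # default sourced from the shared alias


config = DaimonionConfig()
```

With this shape, changing the alias in one place changes the daemon default, while an explicit `llm_model=...` argument still overrides it.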

Comment thread agents/hapax_daimonion/conversation_pipeline.py Outdated
Comment on lines +463 to +484
grounded_model = MODELS["local-fast"]
self._conversation_thread.append({"role": "user", "content": transcript})

try:
    response = await asyncio.wait_for(
        litellm.acompletion(
            model=f"openai/{grounded_model}",
            messages=self.messages + [{"role": "user", "content": transcript}],
            max_tokens=200,
            temperature=0.7,
            api_key=__import__("os").environ.get("LITELLM_API_KEY", ""),
            api_base=_base,
            timeout=10,
        ),
        timeout=15,
    )

    text = response.choices[0].message.content.strip()
    if text:
        log.info("Conversation response: %s", text[:80])
        self._conversation_thread.append({"role": "assistant", "content": text})
        await self._speak_sentence(text)

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

This transcript path bypasses the pipeline’s normal conversation state updates.

It appends raw dicts to _conversation_thread and never records the user/assistant turns in self.messages or increments turn_count. The rest of the class rebuilds prompt state from self.messages and treats _conversation_thread as structured thread entries, so impingement-routed turns become effectively stateless and can break later thread rendering.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hapax_daimonion/conversation_pipeline.py` around lines 463 - 484, The
transcript path currently mutates _conversation_thread directly and calls
litellm.acompletion without updating the canonical conversation state; update
this block to record the user and assistant turns in the same structures the
rest of the class uses: add the user message to self.messages (matching the
message dict shape used elsewhere), increment turn_count, call
litellm.acompletion with that message list, then when a response is received
append the assistant turn both to self.messages and to _conversation_thread (or
ensure _conversation_thread entries match the canonical format), and only then
call _speak_sentence; ensure you reference and update the symbols
_conversation_thread, self.messages, and turn_count and keep the
litellm.acompletion/ _speak_sentence flow intact.
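
The bookkeeping this prompt asks for could look roughly like the sketch below. `PipelineState` and `ThreadEntry` are hypothetical stand-ins written for illustration; the real classes, and in particular the `assistant_text` field, are assumptions and may differ from the repository.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadEntry:
    """Hypothetical stand-in; the real class and its fields may differ."""

    user_text: str
    assistant_text: str = ""
    is_repair: bool = False


@dataclass
class PipelineState:
    """Hypothetical holder for the state named in the review comment."""

    messages: list[dict[str, str]] = field(default_factory=list)
    conversation_thread: list[ThreadEntry] = field(default_factory=list)
    turn_count: int = 0

    def record_turn(self, transcript: str, reply: str) -> None:
        # Canonical prompt state: both turns go into messages.
        self.messages.append({"role": "user", "content": transcript})
        self.messages.append({"role": "assistant", "content": reply})
        # Thread rendering state: a structured entry, not a raw dict.
        self.conversation_thread.append(
            ThreadEntry(user_text=transcript, assistant_text=reply)
        )
        self.turn_count += 1


state = PipelineState()
state.record_turn("what time is it", "It is just after nine.")
```

Routing every path through one `record_turn`-style helper keeps the transcript path and the normal utterance path from drifting apart.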

Comment on lines +507 to +515
if utterance is not None:
    if self._audio_perception is not None:
        asyncio.create_task(
            self._audio_perception.process_utterance(
                audio_bytes=utterance,
                vad_confidence=0.95,
                duration_s=len(utterance) / (16000 * 2),
                energy_db=-12.0,
            )

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pass the real VAD confidence into audio_perception.

vad_prob is computed a few lines above, but this path hardcodes vad_confidence=0.95. That makes operator-speech impingement strength almost constant and weakens the salience ordering this PR is trying to introduce.

Suggested fix
                     self._audio_perception.process_utterance(
                         audio_bytes=utterance,
-                        vad_confidence=0.95,
+                        vad_confidence=vad_prob,
                         duration_s=len(utterance) / (16000 * 2),
                         energy_db=-12.0,
                     )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hapax_daimonion/cpal/runner.py` around lines 507 - 515, Replace the
hardcoded vad_confidence=0.95 in the call to
self._audio_perception.process_utterance with the real computed vad_prob (the
variable computed earlier), i.e. pass vad_confidence=vad_prob; ensure vad_prob
is in scope at this branch (or use a sensible fallback if necessary) so
_audio_perception.process_utterance receives the actual VAD probability rather
than a constant.
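
To illustrate why the constant matters: the PR description gives the strength as `vad_confidence × speaker_posterior`. The sketch below assumes exactly that product; the clamping to [0, 1] and the function name `impingement_strength` are assumptions added for illustration, not the backend's actual code.

```python
def impingement_strength(vad_confidence: float, speaker_posterior: float) -> float:
    """Strength as described in the PR: vad_confidence x speaker_posterior."""
    # Clamp both inputs to [0, 1] (an assumption; the real backend may not clamp).
    vad = min(max(vad_confidence, 0.0), 1.0)
    posterior = min(max(speaker_posterior, 0.0), 1.0)
    return vad * posterior


# With a hardcoded 0.95, a confident and a marginal detection look identical.
constant = [impingement_strength(0.95, 0.9) for _ in (0.99, 0.55)]
# With the real VAD probability, the two detections separate.
actual = [impingement_strength(vad_prob, 0.9) for vad_prob in (0.99, 0.55)]
```

Under this formula the hardcoded value leaves only the speaker posterior to differentiate utterances, which is the flattening the comment describes.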

Comment on lines +980 to +984
log.info(
    "CPAL: operator speech impingement (strength=%.2f): %.40s",
    getattr(impingement, "strength", 0),
    transcript,
)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid logging operator transcript text at INFO.

This path now writes live user speech into normal service logs. Even a 40-character preview can contain secrets or PII, so keep transcript content redacted or DEBUG-only.

Suggested fix
-                log.info(
-                    "CPAL: operator speech impingement (strength=%.2f): %.40s",
-                    getattr(impingement, "strength", 0),
-                    transcript,
-                )
+                log.info(
+                    "CPAL: operator speech impingement (strength=%.2f)",
+                    getattr(impingement, "strength", 0),
+                )
+                log.debug("CPAL: operator speech transcript=%r", transcript)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hapax_daimonion/cpal/runner.py` around lines 980 - 984, Replace the
INFO-level log that prints live operator speech so transcripts are not exposed:
in the log.info call that references impingement and transcript (the call using
getattr(impingement, "strength", 0) and transcript), either change it to
log.debug or redact the transcript (e.g., log a placeholder like "<REDACTED>" or
only the transcript length/hash) while keeping the strength value; ensure any
transcript content is never logged at INFO level and move any sensitive debug
output to DEBUG-only logging.
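
The "length/hash" option mentioned in this prompt might be sketched as follows. The helper names `transcript_fingerprint` and `log_operator_speech` and the logger name are hypothetical; only the standard-library `hashlib` and `logging` calls are real.

```python
import hashlib
import logging

log = logging.getLogger("cpal.sketch")  # hypothetical logger name


def transcript_fingerprint(transcript: str) -> str:
    """Return a non-reversible summary: length plus a short SHA-256 prefix."""
    digest = hashlib.sha256(transcript.encode("utf-8")).hexdigest()[:8]
    return f"len={len(transcript)} sha256={digest}"


def log_operator_speech(strength: float, transcript: str) -> None:
    # INFO carries only the strength and a fingerprint, never the text.
    log.info(
        "CPAL: operator speech impingement (strength=%.2f, %s)",
        strength,
        transcript_fingerprint(transcript),
    )
    # The raw transcript is emitted only when DEBUG logging is enabled.
    log.debug("CPAL: operator speech transcript=%r", transcript)
```

A fingerprint still lets two log lines be correlated, but a hash of a very short phrase can be matched against guesses, so dropping it entirely is the more conservative choice.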

Comment on lines +1392 to +1404
except Exception:
    log.debug("Exploration speech failed", exc_info=True)
finally:
    self._buffer.set_speaking(False)
    self._last_speech_end = time.monotonic()
    self._recent_speech_events.append(
        SpeechEvent(
            kind=SpeechEventKind.EXPLORATION,
            timestamp=self._last_speech_end,
            source_path="exploration_surfacing",
            text_preview=effect.narrative[:40],
        )
    )

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Only stamp exploration speech after actual playback.

The finally block always updates _last_speech_end and appends an EXPLORATION event, even when generation failed, routing dropped the utterance, or the LLM chose silence. That creates false refractory suppression for later impingements.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hapax_daimonion/cpal/runner.py` around lines 1392 - 1404, The code
currently stamps `_last_speech_end` and appends a SpeechEvent(EXPLORATION) in
the `finally` block regardless of whether playback actually occurred; change
this so the timestamp/update and
`_recent_speech_events.append(SpeechEvent(...))` happen only after successful
playback (i.e., when `_buffer.set_speaking(False)` is executed following actual
audio output and the utterance was not dropped/silenced). Move the
`_last_speech_end = time.monotonic()` and the `SpeechEvent` append out of the
`finally` and into the success path (after whatever method confirms playback
finished or that `effect.narrative` produced audible output), keep the exception
logging (`log.debug(..., exc_info=True)`) in the `except` block, and ensure
`_buffer.set_speaking(False)` still runs in `finally` to clear state even on
failure; reference symbols: `_buffer.set_speaking`, `_last_speech_end`,
`_recent_speech_events`, `SpeechEvent`, `SpeechEventKind.EXPLORATION`,
`effect.narrative`, and `"exploration_surfacing"`.
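
The control flow requested here can be sketched in isolation. `SurfacingState` is a hypothetical stand-in for the runner fields, and the contract that `play` returns `True` only when audio was actually produced is an assumption made for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SurfacingState:
    """Hypothetical stand-in for the runner fields named in the comment."""

    speaking: bool = False
    last_speech_end: float = 0.0
    recent_speech_events: list[tuple[str, float]] = field(default_factory=list)


def surface_exploration(state: SurfacingState, play: Callable[[], bool]) -> None:
    """`play` returns True only if audio was produced (assumed contract)."""
    state.speaking = True
    try:
        played = play()
    except Exception:
        played = False  # real code would log with exc_info=True here
    finally:
        state.speaking = False  # always clear the speaking gate
    if played:
        # Stamp the refractory clock only after audible output.
        state.last_speech_end = time.monotonic()
        state.recent_speech_events.append(("exploration", state.last_speech_end))
```

The gate reset stays in `finally`, while the timestamp and event move to the success path, so a dropped or failed utterance leaves no refractory trace.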

Comment on lines +57 to +59
ModelTier.FAST: "local-fast",
ModelTier.STRONG: "local-fast",
ModelTier.CAPABLE: "local-fast",

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Tier-route collapse removes capability escalation

Mapping FAST/STRONG/CAPABLE to the same local route defeats the router’s escalation logic, including governance/tooling paths that are supposed to use more capable models.

Suggested fix
 TIER_ROUTES: dict[ModelTier, str] = {
     ModelTier.CANNED: "",  # no LLM call
     ModelTier.LOCAL: "local-fast",
-    ModelTier.FAST: "local-fast",
-    ModelTier.STRONG: "local-fast",
-    ModelTier.CAPABLE: "local-fast",
+    ModelTier.FAST: "gemini-flash",
+    ModelTier.STRONG: "claude-sonnet",
+    ModelTier.CAPABLE: "claude-opus",
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hapax_daimonion/model_router.py` around lines 57 - 59, The current
mapping in the router collapses ModelTier.FAST, ModelTier.STRONG, and
ModelTier.CAPABLE all to the same "local-fast" route which disables capability
escalation and governance/tooling paths; update the tier->route mapping in
model_router.py so FAST continues to map to "local-fast" while STRONG and
CAPABLE map to their intended higher-capability routes (e.g.,
"local-strong"/"remote-capable" or whatever route names are used in your routing
table), ensuring any governance/tooling-specific route keys used elsewhere
(search for ModelTier, the tier->route dict, and any code that branches on those
route names) are preserved and exercised by STRONG/CAPABLE mappings.
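
One way to guard against this collapse recurring is a small regression check, sketched below. The enum is a stand-in that mirrors the member names in the diff, the route names are copied from the suggested fix and should be treated as placeholders, and `escalation_is_preserved` is a hypothetical helper.

```python
from enum import Enum, auto


class ModelTier(Enum):
    """Stand-in enum; member names follow the diff above."""

    CANNED = auto()
    LOCAL = auto()
    FAST = auto()
    STRONG = auto()
    CAPABLE = auto()


# Route names copied from the suggested fix; treat them as placeholders.
TIER_ROUTES: dict[ModelTier, str] = {
    ModelTier.CANNED: "",  # no LLM call
    ModelTier.LOCAL: "local-fast",
    ModelTier.FAST: "gemini-flash",
    ModelTier.STRONG: "claude-sonnet",
    ModelTier.CAPABLE: "claude-opus",
}


def escalation_is_preserved(routes: dict[ModelTier, str]) -> bool:
    """True when FAST, STRONG and CAPABLE resolve to three distinct routes."""
    escalating = (ModelTier.FAST, ModelTier.STRONG, ModelTier.CAPABLE)
    return len({routes[tier] for tier in escalating}) == len(escalating)
```

If a local-only deployment is intentional, a check like this would need to be relaxed or made configurable rather than asserted unconditionally.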

Comment on lines +460 to +465
for imp in other_imps:
    # Skip if operator speech was just processed (it preempted)
    if operator_imps:
        source = getattr(imp, "source", "")
        if source in ("exploration", "endogenous.narrative_drive"):
            continue

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Match exploration sources by family, not exact literals.

This suppression only catches "exploration", but the supplied CPAL tests already use "exploration.apperception". That source will still be processed after operator speech, so the preemption path can double-surface exploration right after a live utterance.

Suggested fix
-                        if source in ("exploration", "endogenous.narrative_drive"):
+                        if source.startswith("exploration") or source == "endogenous.narrative_drive":
                             continue
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hapax_daimonion/run_inner.py` around lines 460 - 465, The loop over
other_imps incorrectly matches exploration sources by the exact literal
"exploration", so variants like "exploration.apperception" still get processed;
update the suppression inside the for imp in other_imps loop (the conditional
that checks operator_imps and inspects getattr(imp, "source", "")) to treat any
source in the exploration family as a match (e.g., use startswith("exploration")
or split on '.' and compare the first segment) while still preserving the
existing check for "endogenous.narrative_drive" so those are skipped when
operator_imps is present.
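
The "split on '.' and compare the first segment" variant mentioned in this prompt can be sketched as a small predicate. The function name `is_deferred_after_operator` is hypothetical; the source strings are the ones quoted in the comment.

```python
def is_deferred_after_operator(source: str) -> bool:
    """True for sources that should yield once operator speech was handled."""
    # Compare the first dotted segment so "exploration.apperception" matches
    # but an unrelated name such as "explorationist" does not.
    family = source.split(".", 1)[0]
    return family == "exploration" or source == "endogenous.narrative_drive"
```

Compared with `startswith("exploration")`, matching on the first segment avoids accidentally catching unrelated sources that merely share the prefix.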


---

### Task 1: AudioPerceptionBackend — Impingement Emitter

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix heading level jump to satisfy markdown lint.

Line 13 starts at ### immediately after the document-level # heading, which violates MD001 heading increment rules.

Suggested diff
-### Task 1: AudioPerceptionBackend — Impingement Emitter
+## Implementation Tasks
+
+### Task 1: AudioPerceptionBackend — Impingement Emitter
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 13-13: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/superpowers/plans/2026-05-23-impingement-native-voice.md` at line 13,
The document jumps from the top-level "#" heading to "### Task 1:
AudioPerceptionBackend — Impingement Emitter", which violates the MD001 heading
increment rule; change the "### Task 1: AudioPerceptionBackend — Impingement
Emitter" heading to "## Task 1: AudioPerceptionBackend — Impingement Emitter"
(or insert an intermediate "##" section before it) so headings increment
properly and markdown lint passes.

Comment thread shared/config.py
)
QDRANT_URL: str = os.environ.get("QDRANT_URL", "http://localhost:6333")
-OLLAMA_URL: str = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
+OLLAMA_URL: str = os.environ.get("OLLAMA_HOST", "http://192.168.68.50:11434")

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid a host-specific Ollama default

Defaulting to a private LAN IP makes embeddings fail in environments that don’t have that host. Keep the default portable (localhost) and rely on env override for host-specific setups.

Suggested fix
-OLLAMA_URL: str = os.environ.get("OLLAMA_HOST", "http://192.168.68.50:11434")
+OLLAMA_URL: str = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@shared/config.py` at line 42, The config default for OLLAMA_URL is currently
a private LAN IP which breaks setups without that host; change the default to a
portable localhost URL by updating the OLLAMA_URL constant (the variable name
OLLAMA_URL in shared/config.py) to use "http://localhost:11434" while still
allowing overrides via os.environ.get("OLLAMA_HOST", ...); ensure only the
default value is changed and environment variable behavior remains the same.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc6cff97ed

from shared.config import MODELS

grounded_model = MODELS["local-fast"]
self._conversation_thread.append({"role": "user", "content": transcript})

P1: Append ThreadEntry objects in transcript path

process_utterance_from_transcript pushes plain dicts into _conversation_thread, but _update_system_context later passes that list to _render_thread, which dereferences ThreadEntry attributes like e.is_repair/e.user_text. After the first operator-impingement turn, the next context refresh raises AttributeError before the LLM call, so subsequent operator speech in this path gets dropped.

Comment thread shared/config.py
)
QDRANT_URL: str = os.environ.get("QDRANT_URL", "http://localhost:6333")
-OLLAMA_URL: str = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
+OLLAMA_URL: str = os.environ.get("OLLAMA_HOST", "http://192.168.68.50:11434")

P2: Keep Ollama fallback host local by default

This changes the default OLLAMA_URL to a machine-specific LAN IP when OLLAMA_HOST is unset. In environments that rely on defaults, embedding calls will attempt that external host and fail, which breaks local embedding-dependent flows (the same module’s Ollama client builder reads OLLAMA_URL). This is a portability regression compared with the repo’s other localhost defaults.

Comment on lines +482 to +484
log.info("Conversation response: %s", text[:80])
self._conversation_thread.append({"role": "assistant", "content": text})
await self._speak_sentence(text)

P1: Mark transcript-path TTS as speaking

The new operator-impingement path speaks via _speak_sentence without ever setting buffer.set_speaking(True), unlike the normal utterance path which enables the speaking gate before response playback. Because audio ingestion is always-on, this allows Hapax’s own TTS audio to be re-captured as fresh speech during these responses, creating false operator impingements and self-trigger loops under active playback.
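
A minimal sketch of gating playback, assuming the buffer exposes only a `set_speaking(bool)` method as the comment implies. `Buffer`, `speaking_gate`, and `speak_with_gate` are hypothetical names written for this example, not code from the PR.

```python
import asyncio
from collections.abc import AsyncIterator, Awaitable, Callable
from contextlib import asynccontextmanager


class Buffer:
    """Hypothetical stand-in exposing only the set_speaking gate."""

    def __init__(self) -> None:
        self.speaking = False

    def set_speaking(self, value: bool) -> None:
        self.speaking = value


@asynccontextmanager
async def speaking_gate(buffer: Buffer) -> AsyncIterator[None]:
    """Hold the speaking flag for the duration of playback."""
    buffer.set_speaking(True)
    try:
        yield
    finally:
        # Reset even if synthesis or playback raises.
        buffer.set_speaking(False)


async def speak_with_gate(
    buffer: Buffer, speak: Callable[[str], Awaitable[None]], text: str
) -> None:
    async with speaking_gate(buffer):
        await speak(text)
```

Wrapping the call in a context manager makes it harder for a new code path to play audio without the gate, since the flag is set and cleared in one place.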

Comment on lines 107 to 115
if text:
_level = log.debug if speculative else log.info
_level(
'STT: "%s" (%.1fs audio, lang=%s)',
"STT: \"%s\" (%.1fs audio)",
text,
len(audio) / sample_rate,
info.language,
)

if not speculative:
self._extract_prosody(audio, sample_rate, all_words)

return text

P2: Restore prosody publication after STT transcription

This refactor removed the only call site that writes prosody features after successful transcription. _extract_prosody remains defined but is now dead code, so /dev/shm prosody state is never refreshed from new utterances; components that inject speech prosody into context receive stale/empty data, reducing behavior quality tied to operator tone.

Comment on lines +945 to +947
if self._audio_output is not None and hasattr(self._audio_output, "kill"):
self._audio_output.kill()
log.info("CPAL: killed in-flight playback for preemption")

P1: Implement real playback cancellation for preemption

The new preemption path calls _kill_inflight_playback, but it only invokes self._audio_output.kill() when that method exists. The shipped PwAudioOutput has no kill method, so this branch never cancels active playback; it only clears the speaking flag. In practice, operator speech cannot reliably interrupt an in-flight exploration/autonomous clip, which breaks the intended preemption behavior.
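
As a rough illustration of what a cancellable output could look like, the sketch below checks a cancellation flag between chunk writes. `CancellableOutput` is entirely hypothetical and says nothing about how `PwAudioOutput` actually plays audio; a real implementation may instead need to terminate a subprocess or flush a device buffer.

```python
import threading
from typing import Callable, Iterable


class CancellableOutput:
    """Hypothetical output wrapper; not the real PwAudioOutput."""

    def __init__(self, write_chunk: Callable[[bytes], None]) -> None:
        self._write_chunk = write_chunk
        self._cancel = threading.Event()

    def play(self, chunks: Iterable[bytes]) -> bool:
        """Write chunks until done or cancelled; return True if fully played."""
        self._cancel.clear()
        for chunk in chunks:
            if self._cancel.is_set():
                return False  # preempted before this chunk was written
            self._write_chunk(chunk)
        return True

    def kill(self) -> None:
        """Request that the in-flight play() stop before its next chunk."""
        self._cancel.set()
```

Cancellation latency in this design is bounded by the chunk duration, so smaller chunks give faster preemption at the cost of more write calls.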

Comment on lines +464 to +465
if source in ("exploration", "endogenous.narrative_drive"):
    continue

P1: Skip autonomous narration after operator preemption

When operator speech is present in a poll batch, this filter only skips exploration and endogenous.narrative_drive impingements; autonomous_narrative continues to be processed in the same cycle. Because operator handling does not set _processing_utterance or acquire the speech lock, the autonomous path can still synthesize/play immediately after the operator claim, causing operator preemption to be undermined by overlapping follow-on narration.

@github-actions

Auto-fix classified CI failure categories as lint. Privileged workflow auto-mutation is disabled; route this through governed remediation.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agents/hapax_daimonion/tts_serve_remote.py`:
- Line 16: Remove the unused imports causing Ruff F401 by deleting the top-level
imports of struct and sys from tts_serve_remote.py; search for any references to
the symbols "struct" and "sys" in the file (e.g., in functions like any audio
packing/CLI handling) and only restore/import them if actually used, then re-run
the linter to confirm the F401 errors are resolved.
- Around line 82-90: Move the environment setup so HAPAX_CHATTERBOX_DEVICE is
set before importing agents.hapax_daimonion.tts: ensure
os.environ.setdefault("HAPAX_CHATTERBOX_DEVICE", device) runs prior to the
import and instantiation of TTSManager (the current TTSManager import and tts =
TTSManager() must occur after setting the env var) so the module-level
resolution in agents.hapax_daimonion.tts sees the correct device passed to
serve(device=...).

In `@agents/hapax_daimonion/tts.py`:
- Around line 140-170: The _synthesize_remote method can return truncated PCM
when the remote closes early; change the receiving logic so after the read loop
it verifies len(pcm) == pcm_len and if not raises a RuntimeError (e.g., "Remote
TTS truncated payload") instead of returning partial data; keep existing socket
closing in the finally block, use the existing symbols pcm_len, pcm, sock and
the recv loop in _synthesize_remote, and ensure the error includes contextual
info (remote host/port and expected vs received lengths).
- Around line 57-58: TTSManager's constructor currently converts
HAPAX_TTS_REMOTE_PORT with int(...) which will raise on non-numeric values and
prevent fallback; change the initialization of self._remote_port to parse the
environment variable defensively: read env via
os.environ.get("HAPAX_TTS_REMOTE_PORT"), attempt int(...) inside a try/except
(catch ValueError and TypeError), and on failure set the port to the default
9851 (and optionally log a warning). Update references to _remote_port
accordingly so invalid env values no longer crash initialization.
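
The two `tts.py` items above can be sketched with standard-library calls only. `parse_remote_port` and `recv_exact` are hypothetical helper names, and the wire format is not modeled here; the sketch shows only the defensive parse and the length check, not the actual `_synthesize_remote` implementation.

```python
import os
import socket
from typing import Optional

DEFAULT_REMOTE_PORT = 9851  # default port named in the comment above


def parse_remote_port(raw: Optional[str], default: int = DEFAULT_REMOTE_PORT) -> int:
    """Parse a port from an env value, falling back instead of raising."""
    try:
        return int(raw)  # int(None) raises TypeError, int("abc") ValueError
    except (TypeError, ValueError):
        return default


def recv_exact(sock: socket.socket, expected: int) -> bytes:
    """Read exactly `expected` bytes or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < expected:
        chunk = sock.recv(min(65536, expected - len(buf)))
        if not chunk:  # empty read means the remote closed the connection
            raise RuntimeError(
                f"Remote TTS truncated payload: expected {expected} bytes, "
                f"received {len(buf)}"
            )
        buf.extend(chunk)
    return bytes(buf)


port = parse_remote_port(os.environ.get("HAPAX_TTS_REMOTE_PORT"))
```

Raising on a short read lets the caller fall back to local synthesis rather than playing clipped audio; a real version would likely also include the remote host and port in the error message, as the comment requests.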

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 841a2f0c-60b8-435d-8e3b-314727fea381

📥 Commits

Reviewing files that changed from the base of the PR and between cc6cff9 and 40a551f.

📒 Files selected for processing (3)
  • agents/hapax_daimonion/tts.py
  • agents/hapax_daimonion/tts_serve_remote.py
  • systemd/units/hapax-tts-remote.service

import asyncio
import json
import logging
import struct

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove unused imports blocking lint CI.

struct and sys are unused and currently fail Ruff (F401), which is already breaking the pipeline.

Proposed fix
 import asyncio
 import json
 import logging
-import struct
@@
 if __name__ == "__main__":
     import os
-    import sys

Also applies to: 107-107

🧰 Tools
🪛 GitHub Actions: CI / 2_lint.txt

[error] 16-16: ruff check failed (F401): struct imported but unused

🪛 GitHub Actions: CI / lint

[error] 16-16: Ruff (check): F401 struct imported but unused

🪛 GitHub Check: lint

[failure] 16-16: ruff (F401)
agents/hapax_daimonion/tts_serve_remote.py:16:8: F401 struct imported but unused
help: Remove unused import: struct

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hapax_daimonion/tts_serve_remote.py` at line 16, Remove the unused
imports causing Ruff F401 by deleting the top-level imports of struct and sys
from tts_serve_remote.py; search for any references to the symbols "struct" and
"sys" in the file (e.g., in functions like any audio packing/CLI handling) and
only restore/import them if actually used, then re-run the linter to confirm the
F401 errors are resolved.

Comment on lines +82 to +90
    from agents.hapax_daimonion.tts import TTSManager

    tts = TTSManager()
    log.info("Loading Chatterbox on %s...", device)

    import os

    os.environ.setdefault("HAPAX_CHATTERBOX_DEVICE", device)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Set device env before importing TTSManager.

device is configured after importing agents.hapax_daimonion.tts, but that module resolves _CHATTERBOX_DEVICE at import time. This can silently ignore the serve(device=...) argument.

Proposed fix
 async def serve(port: int = DEFAULT_PORT, device: str = "cuda:0") -> None:
-    from agents.hapax_daimonion.tts import TTSManager
-
-    tts = TTSManager()
-    log.info("Loading Chatterbox on %s...", device)
-
     import os
 
     os.environ.setdefault("HAPAX_CHATTERBOX_DEVICE", device)
+    from agents.hapax_daimonion.tts import TTSManager
+
+    tts = TTSManager()
+    log.info("Loading Chatterbox on %s...", device)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hapax_daimonion/tts_serve_remote.py` around lines 82 - 90, Move the
environment setup so HAPAX_CHATTERBOX_DEVICE is set before importing
agents.hapax_daimonion.tts: ensure
os.environ.setdefault("HAPAX_CHATTERBOX_DEVICE", device) runs prior to the
import and instantiation of TTSManager (the current TTSManager import and tts =
TTSManager() must occur after setting the env var) so the module-level
resolution in agents.hapax_daimonion.tts sees the correct device passed to
serve(device=...).

Comment on lines +57 to +58
self._remote_host = os.environ.get("HAPAX_TTS_REMOTE_HOST")
self._remote_port = int(os.environ.get("HAPAX_TTS_REMOTE_PORT", "9851"))

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard remote port parsing to prevent init-time crashes.

A non-numeric HAPAX_TTS_REMOTE_PORT will raise during TTSManager construction and bypass all fallback behavior. Default safely when parsing fails.

Proposed fix
         self._backend = "chatterbox"
         self._remote_host = os.environ.get("HAPAX_TTS_REMOTE_HOST")
-        self._remote_port = int(os.environ.get("HAPAX_TTS_REMOTE_PORT", "9851"))
+        raw_remote_port = os.environ.get("HAPAX_TTS_REMOTE_PORT", "9851")
+        try:
+            self._remote_port = int(raw_remote_port)
+        except ValueError:
+            log.warning(
+                "Invalid HAPAX_TTS_REMOTE_PORT=%r; defaulting to 9851",
+                raw_remote_port,
+            )
+            self._remote_port = 9851
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hapax_daimonion/tts.py` around lines 57 - 58, TTSManager's constructor
currently converts HAPAX_TTS_REMOTE_PORT with int(...) which will raise on
non-numeric values and prevent fallback; change the initialization of
self._remote_port to parse the environment variable defensively: read env via
os.environ.get("HAPAX_TTS_REMOTE_PORT"), attempt int(...) inside a try/except
(catch ValueError and TypeError), and on failure set the port to the default
9851 (and optionally log a warning). Update references to _remote_port
accordingly so invalid env values no longer crash initialization.
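
A minimal sketch of the defensive parse described above, written as a standalone helper. The name `parse_port` and the 1-65535 range check are assumptions added for illustration and are not part of the PR; only the default of 9851 and the `HAPAX_TTS_REMOTE_PORT` variable come from the reviewed code.

```python
import logging
import os

log = logging.getLogger(__name__)

DEFAULT_REMOTE_PORT = 9851  # default taken from the reviewed code


def parse_port(raw, default=DEFAULT_REMOTE_PORT):
    """Return raw as a TCP port, or default when it is unset or invalid.

    Hypothetical helper: the name and the range check are illustrative.
    """
    if raw is None:
        # Unset variable: use the default without logging a warning.
        return default
    try:
        port = int(raw)
    except (TypeError, ValueError):
        log.warning("Invalid port value %r; defaulting to %d", raw, default)
        return default
    if not 1 <= port <= 65535:
        log.warning("Port %d out of range; defaulting to %d", port, default)
        return default
    return port


remote_port = parse_port(os.environ.get("HAPAX_TTS_REMOTE_PORT"))
```

Keeping the parse in one function means the constructor cannot raise on a malformed value, and the fallback behaviour can be unit-tested without building a `TTSManager`.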

Comment on lines +140 to +170
def _synthesize_remote(self, text: str, use_case: str = "conversation") -> bytes:
    import json
    import socket

    request = json.dumps({"text": text, "use_case": use_case}) + "\n"
    sock = socket.create_connection((self._remote_host, self._remote_port), timeout=60)
    try:
        sock.sendall(request.encode())
        buf = b""
        while b"\n" not in buf:
            chunk = sock.recv(4096)
            if not chunk:
                raise RuntimeError("Remote TTS connection closed before header")
            buf += chunk
        header_line, remainder = buf.split(b"\n", 1)
        header = json.loads(header_line)
        if header.get("status") != "ok":
            raise RuntimeError(f"Remote TTS error: {header.get('error', 'unknown')}")
        pcm_len = header["pcm_len"]
        pcm = remainder
        while len(pcm) < pcm_len:
            chunk = sock.recv(min(65536, pcm_len - len(pcm)))
            if not chunk:
                break
            pcm += chunk
        log.info(
            "Remote TTS: %d bytes from %s:%d", len(pcm), self._remote_host, self._remote_port
        )
        return pcm
    finally:
        sock.close()

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail on truncated PCM payloads instead of returning partial audio.

If the remote side closes early, the loop exits and returns undersized PCM. That silently corrupts output and violates the declared pcm_len framing.

Proposed fix
             pcm_len = header["pcm_len"]
             pcm = remainder
             while len(pcm) < pcm_len:
                 chunk = sock.recv(min(65536, pcm_len - len(pcm)))
                 if not chunk:
-                    break
+                    raise RuntimeError(
+                        f"Remote TTS truncated payload: expected {pcm_len} bytes, got {len(pcm)}"
+                    )
                 pcm += chunk
+            if len(pcm) > pcm_len:
+                pcm = pcm[:pcm_len]
             log.info(
                 "Remote TTS: %d bytes from %s:%d", len(pcm), self._remote_host, self._remote_port
             )
             return pcm
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hapax_daimonion/tts.py` around lines 140 - 170, The _synthesize_remote
method can return truncated PCM when the remote closes early; change the
receiving logic so after the read loop it verifies len(pcm) == pcm_len and if
not raises a RuntimeError (e.g., "Remote TTS truncated payload") instead of
returning partial data; keep existing socket closing in the finally block, use
the existing symbols pcm_len, pcm, sock and the recv loop in _synthesize_remote,
and ensure the error includes contextual info (remote host/port and expected vs
received lengths).
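
One way to make the length check hard to forget is to pull the read loop into a helper that either returns exactly the advertised number of bytes or raises. The sketch below assumes a hypothetical `recv_exact` function; it is not code from the PR, and `initial` stands in for the `remainder` bytes already read along with the header.

```python
import socket


def recv_exact(sock: socket.socket, n: int, initial: bytes = b"") -> bytes:
    """Return exactly n bytes, starting from bytes already read (initial).

    Raises RuntimeError if the peer closes before n bytes arrive.
    Hypothetical helper, shown only to illustrate the framing check.
    """
    buf = bytearray(initial[:n])  # ignore anything past the declared length
    while len(buf) < n:
        chunk = sock.recv(min(65536, n - len(buf)))
        if not chunk:
            raise RuntimeError(
                f"Remote TTS truncated payload: expected {n} bytes, got {len(buf)}"
            )
        buf += chunk
    return bytes(buf)
```

In `_synthesize_remote` this would replace the second `while` loop, for example `pcm = recv_exact(sock, pcm_len, remainder)`, leaving the existing `finally: sock.close()` untouched.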

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 40a551fc98

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

messages=messages,
max_tokens=80,
temperature=0.7,
api_key=__import__("os").environ.get("LITELLM_API_KEY", ""),

P2 Badge Keep LiteLLM API key fallback non-empty

These new litellm.acompletion calls use os.environ.get("LITELLM_API_KEY", ""), which sends an empty API key when the env var is unset. In this codebase, the main conversation path intentionally falls back to "not-set" to satisfy LiteLLM gateway auth expectations; using "" here can cause 401/auth failures specifically on spontaneous speech and transcript-driven operator responses in environments that rely on the fallback behavior.
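
A small sketch of the fallback the comment asks for. The helper name `litellm_api_key` is hypothetical; the `"not-set"` placeholder is the value this comment attributes to the main conversation path.

```python
import os


def litellm_api_key(env=os.environ) -> str:
    """Return the configured key, or the "not-set" placeholder.

    Using `or` rather than a .get() default also covers the case where
    the variable is set but empty.
    """
    return env.get("LITELLM_API_KEY") or "not-set"
```

Sharing one helper between the main path and the new `litellm.acompletion` calls would keep the two from drifting apart again.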

Comment on lines +162 to +163
if not chunk:
break

P2 Badge Raise on incomplete remote TTS PCM reads

The remote TTS client stops reading when recv() returns empty but then returns whatever bytes were received, even if shorter than the advertised pcm_len. If the socket closes mid-transfer (network hiccup, server restart), this path silently returns truncated PCM instead of failing over, which can produce corrupted or clipped playback while appearing successful.

Comment on lines +509 to +513
asyncio.create_task(
    self._audio_perception.process_utterance(
        audio_bytes=utterance,
        vad_confidence=0.95,
        duration_s=len(utterance) / (16000 * 2),

P1 Badge Serialize audio-perception utterance processing

This now launches process_utterance as an unconstrained background task for every detected utterance, with no in-flight guard or queueing. Under sustained speech, STT calls can pile up and execute concurrently, which can overwhelm the single STT backend and emit delayed/out-of-order operator impingements compared with arrival order; the prior path only processed one utterance at a time and queued the next.
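
One possible shape for the serialization, sketched with a bounded `asyncio.Queue` and a single consumer task. `UtteranceWorker` and its method names are hypothetical and do not appear in the PR; the handler stands in for `process_utterance`.

```python
import asyncio
import logging

log = logging.getLogger(__name__)


class UtteranceWorker:
    """Runs an async handler on utterances one at a time, in arrival order."""

    def __init__(self, handler, maxsize: int = 8):
        self._handler = handler  # async callable taking one utterance
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self._task = None

    def start(self) -> None:
        self._task = asyncio.create_task(self._run())

    def submit(self, utterance: bytes) -> bool:
        """Enqueue without blocking; return False when the backlog is full."""
        try:
            self._queue.put_nowait(utterance)
            return True
        except asyncio.QueueFull:
            return False

    async def _run(self) -> None:
        while True:
            utterance = await self._queue.get()
            try:
                await self._handler(utterance)
            except Exception:
                # Keep the worker alive if one utterance fails.
                log.exception("utterance handler failed")
            finally:
                self._queue.task_done()

    async def drain_and_stop(self) -> None:
        """Wait for queued work to finish, then cancel the consumer task."""
        await self._queue.join()
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
```

The CPAL tick would then call `submit(utterance)` instead of `asyncio.create_task(...)`. A bounded queue makes the overload policy explicit: when `submit` returns `False` the caller decides whether to drop the utterance or replace an older one.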

response = await asyncio.wait_for(
litellm.acompletion(
model=f"openai/{grounded_model}",
messages=self.messages + [{"role": "user", "content": transcript}],

P2 Badge Persist transcript-path turns in message history

This transcript-only path sends the user utterance in the request, but it never appends either the user turn or assistant reply to self.messages. As a result, subsequent transcript-driven turns lose immediate conversational context (and tool-call continuity) even within the same session, unlike the main process_utterance flow that persists both roles to message history.

asyncio.create_task(
self._audio_perception.process_utterance(
audio_bytes=utterance,
vad_confidence=0.95,

P1 Badge Use measured VAD confidence for speech impingements

The CPAL tick now forwards a hardcoded vad_confidence=0.95 for every utterance, so downstream strength scoring treats all detected clips as near-certain speech regardless of actual VAD probability. Because operator impingement strength is vad_confidence * speaker_confidence, this inflates low-quality/ambiguous audio into high-priority operator events and can trigger unnecessary preemption of ongoing exploration output.
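
To make the effect concrete, here is a sketch of the strength product with the inputs clamped to [0, 1]. The function name and the clamping are assumptions for illustration; only the formula `vad_confidence * speaker_confidence` comes from this comment.

```python
def impingement_strength(vad_confidence: float, speaker_confidence: float) -> float:
    """Product of the two confidences, each clamped to the range [0, 1]."""

    def clamp(x: float) -> float:
        return max(0.0, min(1.0, x))

    return clamp(vad_confidence) * clamp(speaker_confidence)


# With the hardcoded value, an ambiguous clip still scores near the top:
hardcoded = impingement_strength(0.95, 0.9)
# With a measured VAD probability of 0.4 the same clip scores far lower:
measured = impingement_strength(0.4, 0.9)
```

Passing the VAD probability actually measured for the utterance, rather than the constant, lets weak or ambiguous audio lose the competition against ongoing output.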

TTS:
- Kokoro 82M on GPU as primary (40ms synthesis, 312MB VRAM)
- Chatterbox remote server on appendix as fallback
- HAPAX_TTS_BACKEND env var selects primary backend
- Voice identity lives in S-4 hardware effects, not TTS model

Conversation:
- Exploration surfacing and autonomous narration disabled
- Compositor director speech disabled
- _speak_sentence fires as background task (non-blocking)
- _update_system_context skipped for speed
- Direct TabbyAPI call (bypasses LiteLLM proxy overhead)
- Message alternation fix for Command-R
- LLM call timing instrumentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

Auto-fix classified CI failure categories as lint. Privileged workflow auto-mutation is disabled; route this through governed remediation.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
agents/hapax_daimonion/conversation_pipeline.py (1)

401-416: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Initialize _llm_start before logging elapsed time.

Line 415 reads _llm_start, but this branch never sets it. Any successful spontaneous-speech LLM response will raise here and drop the utterance.

Suggested fix
             with metrics_ctx:
+                _llm_start = time.monotonic()
                 log.info("spontaneous_speech: LLM call start (model=openai/%s)", grounded_model)
                 response = await asyncio.wait_for(
                     litellm.acompletion(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hapax_daimonion/conversation_pipeline.py` around lines 401 - 416, The
branch that calls litellm.acompletion uses _llm_start when computing elapsed
time but never initializes it; set _llm_start = time.monotonic() (or
__import__("time").monotonic()) immediately before logging "spontaneous_speech:
LLM call start..." and before awaiting litellm.acompletion so that the
subsequent computation (_llm_elapsed = __import__("time").monotonic() -
_llm_start) is valid; ensure the same initialization is present for this
spontaneous-speech branch around the litellm.acompletion call and that the
log.info("LLM call took...") remains unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agents/hapax_daimonion/cpal/runner.py`:
- Around line 1246-1248: The early return on effect.should_surface prevents any
exploration routing, arbiter claiming, or background speech tasks from ever
running; remove the unconditional "return" when effect.should_surface is true in
the block handling the Effect object (refer to effect.should_surface) and
instead let the normal exploration path execute: perform routing/arbiter claim
logic and schedule the background speech task (the same flow used when
exploration is allowed) while ensuring the existing “operator speech preempts
exploration” and “speak during exploration → exploration interrupted” acceptance
checks still run after scheduling; update any surrounding comments to reflect
that effect.should_surface enables the exploration-surface code path rather than
short-circuiting it.

---

Outside diff comments:
In `@agents/hapax_daimonion/conversation_pipeline.py`:
- Around line 401-416: The branch that calls litellm.acompletion uses _llm_start
when computing elapsed time but never initializes it; set _llm_start =
time.monotonic() (or __import__("time").monotonic()) immediately before logging
"spontaneous_speech: LLM call start..." and before awaiting litellm.acompletion
so that the subsequent computation (_llm_elapsed =
__import__("time").monotonic() - _llm_start) is valid; ensure the same
initialization is present for this spontaneous-speech branch around the
litellm.acompletion call and that the log.info("LLM call took...") remains
unchanged.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ac6a47e7-faea-40db-87c2-02b95a3fe434

📥 Commits

Reviewing files that changed from the base of the PR and between 40a551f and f4be83a.

📒 Files selected for processing (4)
  • agents/hapax_daimonion/conversation_pipeline.py
  • agents/hapax_daimonion/cpal/runner.py
  • agents/hapax_daimonion/run_loops_aux.py
  • agents/hapax_daimonion/tts.py

Comment on lines 1246 to 1248
if effect.should_surface:
    return  # Exploration surfacing disabled — conversation-first architecture
    # Refractory inhibition for exploration surfacing.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

This guard disables the exploration path the PR is supposed to preempt.

Lines 1246-1247 return before any routing, arbiter claim, or background speech task runs, so the acceptance cases “operator speech preempts exploration” and “speak during exploration → exploration interrupted” cannot happen in this build.

Suggested fix
-        if effect.should_surface:
-            return  # Exploration surfacing disabled — conversation-first architecture
+        if effect.should_surface:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hapax_daimonion/cpal/runner.py` around lines 1246 - 1248, The early
return on effect.should_surface prevents any exploration routing, arbiter
claiming, or background speech tasks from ever running; remove the unconditional
"return" when effect.should_surface is true in the block handling the Effect
object (refer to effect.should_surface) and instead let the normal exploration
path execute: perform routing/arbiter claim logic and schedule the background
speech task (the same flow used when exploration is allowed) while ensuring the
existing “operator speech preempts exploration” and “speak during exploration →
exploration interrupted” acceptance checks still run after scheduling; update
any surrounding comments to reflect that effect.should_surface enables the
exploration-surface code path rather than short-circuiting it.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f4be83ab55

)

text = response.choices[0].message.content.strip()
_llm_elapsed = __import__("time").monotonic() - _llm_start

P1 Badge Define LLM timing start before spontaneous speech call

generate_spontaneous_speech computes _llm_elapsed using _llm_start, but this variable is never assigned in the function, so every successful completion path raises NameError at runtime and drops spontaneous speech into the exception handler. This makes exploration/impingement surfacing fail even when the LLM call itself succeeds.
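
The direct fix is a single `_llm_start = time.monotonic()` assignment before the call. As an alternative sketch, a small context manager keeps the start and elapsed measurements together so one cannot exist without the other. The name `timed` and the result-dict convention are hypothetical, not from the PR.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(result: dict, key: str = "elapsed_s"):
    """Store the wall-clock duration of the with-block in result[key]."""
    start = time.monotonic()
    try:
        yield
    finally:
        # Recorded even if the block raises or times out.
        result[key] = time.monotonic() - start


timing = {}
with timed(timing):
    sum(range(1000))  # stands in for the awaited LLM call
```

Because the measurement happens in `finally`, a timed-out or failed call still reports how long it ran.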

from shared.config import MODELS

grounded_model = MODELS["local-fast"]
self._conversation_thread.append({"role": "user", "content": transcript})

P1 Badge Reuse utterance safety filters in transcript conversation path

This new transcript path sends STT text directly to generation without going through _process_utterance_inner, so it skips the existing echo/strip safeguards (_is_echo, _strip_echo_prefix, duplicate rejection) that protect against self-capture. In the same commit the speaking gates were relaxed, so recaptured TTS transcripts can now be treated as operator input and trigger self-reinforcing turns.

try:
import httpx

_tabby_url = "http://localhost:5000/v1/chat/completions"

P2 Badge Use configured LLM base URL for transcript requests

The transcript-driven path hardcodes http://localhost:5000/v1/chat/completions instead of using configured routing (LITELLM_BASE/model settings), so deployments that rely on non-local or overridden endpoints will fail only on this path while normal conversation continues to work. This creates a production-only split-brain behavior that's hard to diagnose.

Comment on lines +1405 to +1410
async with self._speech_lock:
    try:
        await self._pipeline.generate_spontaneous_speech(
            impingement,
            register_hint=register_hint,
            destination_target=destination_target,

P1 Badge Set speaking gate while exploration speech is synthesizing

The background exploration path now calls generate_spontaneous_speech without ever setting buffer.set_speaking(True), but it still clears the flag in finally. With this commit’s always-on VAD behavior, exploration playback can be re-captured as fresh speech and fed back into the impingement loop, causing false speech events and self-trigger churn during active narration.
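
A sketch of the symmetric gate this comment describes: set the flag before synthesis starts and clear it in `finally`. `FakeBuffer` and `speak_gated` are hypothetical stand-ins; only the `set_speaking(True/False)` call is taken from the comment.

```python
import asyncio


class FakeBuffer:
    """Stand-in for the audio buffer; only set_speaking is modelled."""

    def __init__(self) -> None:
        self.speaking = False
        self.history = []  # every value passed to set_speaking, in order

    def set_speaking(self, value: bool) -> None:
        self.speaking = value
        self.history.append(value)


async def speak_gated(buffer, synthesize):
    """Hold the speaking gate for the full duration of synthesize()."""
    buffer.set_speaking(True)
    try:
        return await synthesize()
    finally:
        # Cleared on success, failure, and cancellation alike.
        buffer.set_speaking(False)
```

In the runner, `synthesize` would be a coroutine function wrapping the `generate_spontaneous_speech(...)` call, so the set and clear always come as a pair.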

resp = await client.post(
_tabby_url,
json={
"model": "command-r",

P2 Badge Respect configured model route in transcript LLM calls

This transcript path hardcodes "model": "command-r" instead of using the configured model alias/router (local-fast / MODELS / daemon config). Environments that override model IDs or route names will break specifically for transcript-driven turns, even though the primary conversation path remains configurable.

return

if effect.should_surface:
return # Exploration surfacing disabled — conversation-first architecture

P1 Badge Remove unconditional return that blocks exploration surfacing

In process_impingement, if effect.should_surface: immediately returns, which makes the entire exploration surfacing branch below unreachable (refractory check, destination decision, arbiter claim, and pipeline speech task). As a result, surfaced exploration impingements are always dropped instead of being deferred/preempted as intended.

"temperature": 0.7,
},
)
response = resp.json()

P2 Badge Raise on non-2xx transcript LLM responses

The transcript path parses resp.json() without checking HTTP status, so 4xx/5xx responses are treated as normal payloads and then silently dropped by the generic exception/choices checks. This hides transport/auth failures and causes operator utterances to be ignored without a clear failure signal.

… optimization

- Phenomenal context (stimmung/temporal/perception) injected from /dev/shm (~15ms)
- Conversation history with proper alternation (last 10 turns)
- Echo rejection via _is_echo() + immediate _recent_tts_texts registration
- httpx client cached (eliminates per-request TCP setup)
- Exploration surfacing + autonomous narration disabled
- Compositor director speech disabled
- Conversation window suppresses exploration for 120s after operator speech

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

Auto-fix classified CI failure categories as lint. Privileged workflow auto-mutation is disabled; route this through governed remediation.

- Older conversation turns compressed via LLMLingua-2 at 0.33 rate (~3x token savings)
- Compressor pre-warmed at startup (BERT-base, CPU)
- Echo rejection registers response text immediately (before TTS fires)
- No context excised — full system prompt + phenomenal + compressed history

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

Auto-fix classified CI failure categories as lint. Privileged workflow auto-mutation is disabled; route this through governed remediation.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agents/hapax_daimonion/conversation_pipeline.py`:
- Around line 485-493: The slice of self._conversation_thread used to build
history may contain ThreadEntry objects (see ThreadEntry type and
process_utterance) but the code treats entries as dicts; convert each
ThreadEntry to the LLM message dict shape before using history. In the block
that builds history (referencing _conversation_thread, history, messages,
system_block, transcript) map/transform ThreadEntry instances to {"role": ...,
"content": ...} (preserving role and content fields) before checking
history[-1].get("role") and before messages.extend(history), so messages only
contains plain dicts expected by the LLM call.
- Around line 500-511: The _httpx_client is created lazily in
conversation_pipeline (symbol: _httpx_client) but never closed, the endpoint is
hardcoded, and response status isn't validated; to fix, initialize
self._httpx_client: httpx.AsyncClient | None = None in __init__, replace the
hardcoded URL with the configured base (use the same config key as
_voice_litellm_base or load from shared/config.py model aliases), add an HTTP
status check after the post and raise/log on non-2xx before parsing, and close
the client in stop() by awaiting _httpx_client.aclose() and setting it to None
to avoid connection pool leaks (keep function names: stop(),
generate_spontaneous_speech, and attribute _voice_litellm_base as references).
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 83df2dfe-fc63-462b-a4f6-12a0e5a88187

📥 Commits

Reviewing files that changed from the base of the PR and between f4be83a and dd13cb9.

📒 Files selected for processing (1)
  • agents/hapax_daimonion/conversation_pipeline.py

Comment on lines +485 to +493
history = []
for entry in self._conversation_thread[-10:]:
    history.append(entry)
if history and history[-1].get("role") == "user":
    history.pop()

messages = [{"role": "system", "content": system_block}]
messages.extend(history)
messages.append({"role": "user", "content": transcript})

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Type mismatch: _conversation_thread may contain ThreadEntry objects, not dicts.

The code assumes _conversation_thread entries are dicts with "role" keys, but process_utterance (line 1169) appends ThreadEntry objects. When both paths are used in a session:

  1. history[-1].get("role") raises AttributeError on ThreadEntry
  2. messages.extend(history) adds invalid structures to the LLM request
🔧 Suggested fix: Convert ThreadEntry to message dicts
 history = []
 for entry in self._conversation_thread[-10:]:
-    history.append(entry)
-if history and history[-1].get("role") == "user":
-    history.pop()
+    if isinstance(entry, dict):
+        history.append(entry)
+    elif hasattr(entry, 'user_text') and hasattr(entry, 'response_summary'):
+        # Convert ThreadEntry to message pair
+        history.append({"role": "user", "content": entry.user_text})
+        if entry.response_summary:
+            history.append({"role": "assistant", "content": entry.response_summary})
+if history and history[-1].get("role") == "user":
+    history.pop()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hapax_daimonion/conversation_pipeline.py` around lines 485 - 493, The
slice of self._conversation_thread used to build history may contain ThreadEntry
objects (see ThreadEntry type and process_utterance) but the code treats entries
as dicts; convert each ThreadEntry to the LLM message dict shape before using
history. In the block that builds history (referencing _conversation_thread,
history, messages, system_block, transcript) map/transform ThreadEntry instances
to {"role": ..., "content": ...} (preserving role and content fields) before
checking history[-1].get("role") and before messages.extend(history), so
messages only contains plain dicts expected by the LLM call.
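
The conversion can be isolated in a pure function, which makes the mixed-type case easy to test. This is a sketch under assumptions: the `ThreadEntry` dataclass below is a stand-in whose `user_text` and `response_summary` fields are taken from the suggested fix above, and `to_messages` is a hypothetical name. The real `ThreadEntry` may carry other fields.

```python
from dataclasses import dataclass


@dataclass
class ThreadEntry:
    """Stand-in for the real ThreadEntry; field names follow the suggested fix."""

    user_text: str
    response_summary: str = ""


def to_messages(thread, limit: int = 10) -> list:
    """Flatten the last `limit` thread items into LLM message dicts."""
    history = []
    for entry in thread[-limit:]:
        if isinstance(entry, dict):
            history.append({"role": entry["role"], "content": entry["content"]})
        else:
            # One ThreadEntry becomes a user turn plus an optional assistant turn.
            history.append({"role": "user", "content": entry.user_text})
            if entry.response_summary:
                history.append({"role": "assistant", "content": entry.response_summary})
    # Drop a trailing user turn so the new transcript does not follow
    # another user message (the alternation the reviewed code enforces).
    if history and history[-1]["role"] == "user":
        history.pop()
    return history
```

The caller would then build `messages` from the system block, `to_messages(self._conversation_thread)`, and the new user transcript.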

Comment on lines +500 to +511
if not hasattr(self, '_httpx_client') or self._httpx_client is None:
    self._httpx_client = httpx.AsyncClient(timeout=15.0)
_llm_start = time.monotonic()
resp = await self._httpx_client.post(
    "http://localhost:5000/v1/chat/completions",
    json={
        "model": "command-r",
        "messages": messages,
        "max_tokens": 120,
        "temperature": 0.7,
    },
)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

httpx client leaks resources and endpoint is hardcoded.

  1. _httpx_client is created lazily but never closed in stop() — connection pool leak
  2. Endpoint http://localhost:5000 is hardcoded; should use configuration like generate_spontaneous_speech uses _voice_litellm_base (line 409)
  3. No HTTP status check before parsing response — 4xx/5xx errors silently mishandled
🔧 Suggested fix

Declare in __init__:

self._httpx_client: httpx.AsyncClient | None = None

Close in stop():

if self._httpx_client is not None:
    await self._httpx_client.aclose()
    self._httpx_client = None

Use configuration and add error handling:

-resp = await self._httpx_client.post(
-    "http://localhost:5000/v1/chat/completions",
+resp = await self._httpx_client.post(
+    f"{_voice_litellm_base}/v1/chat/completions",
     json={
         "model": "command-r",
         "messages": messages,
         "max_tokens": 120,
         "temperature": 0.7,
     },
 )
+resp.raise_for_status()
 response = resp.json()

As per coding guidelines, **/*.py: Use shared/config.py for model aliases and LiteLLM/Qdrant configuration.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hapax_daimonion/conversation_pipeline.py` around lines 500 - 511, The
_httpx_client is created lazily in conversation_pipeline (symbol: _httpx_client)
but never closed, the endpoint is hardcoded, and response status isn't
validated; to fix, initialize self._httpx_client: httpx.AsyncClient | None =
None in __init__, replace the hardcoded URL with the configured base (use the
same config key as _voice_litellm_base or load from shared/config.py model
aliases), add an HTTP status check after the post and raise/log on non-2xx
before parsing, and close the client in stop() by awaiting
_httpx_client.aclose() and setting it to None to avoid connection pool leaks
(keep function names: stop(), generate_spontaneous_speech, and attribute
_voice_litellm_base as references).

No context excised — full system prompt with policy, environment, goals,
health, nudges, imagination, phenomenal state, grounding directive, and
effort calibration. Runs in thread executor so event loop stays free.
LLMLingua-2 compresses older conversation history at 0.33 rate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

Auto-fix classified CI failure categories as lint. Privileged workflow auto-mutation is disabled; route this through governed remediation.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26ac5a2056

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +55 to +58
import onnx_asr

log.info("Loading Parakeet model %s on %s...", self._model_name, self._device)
self._model = onnx_asr.load_model(self._model_name)

P1 Badge Add missing onnx_asr runtime dependency

This new STT loader hard-depends on onnx_asr, but the same commit does not add that package to project dependencies/lock. In environments created from pyproject.toml + uv.lock, ResidentSTT.load() will raise ModuleNotFoundError, leaving _model unset and making speech transcription unavailable for the daemon.

Useful? React with 👍 / 👎.
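
The durable fix is to declare the package in `pyproject.toml` and regenerate `uv.lock`. Separately, a loader can fail with a clearer message when an optional module is absent. The sketch below shows one way to do that using only the standard library; `require_module` is a hypothetical helper name, not something in this PR.

```python
import importlib.util


def require_module(name: str, install_hint: str) -> None:
    """Raise a descriptive ImportError if the top-level module `name` is not importable."""
    # find_spec returns None for a missing top-level module instead of raising.
    if importlib.util.find_spec(name) is None:
        raise ImportError(
            f"{name!r} is required for speech-to-text but is not installed; {install_hint}"
        )
```

Calling `require_module("onnx_asr", "add it to pyproject.toml and update uv.lock")` at the top of a loader would surface the missing dependency by name instead of leaving the model silently unset.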

Comment on lines +509 to +512
if self._audio_perception is not None:
    asyncio.create_task(
        self._audio_perception.process_utterance(
            audio_bytes=utterance,

P1 Badge Fall back when audio perception cannot process utterances

The new utterance routing sends all captured speech to AudioPerceptionBackend whenever that object exists, but it never checks backend readiness (available()) and never falls back to _process_utterance. If STT fails to load (or is otherwise unavailable), this branch still consumes utterances and the legacy path is skipped, so operator speech is dropped instead of being processed.

Useful? React with 👍 / 👎.
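
One way to express the readiness check and fallback described above, as a sketch under stated assumptions. `available()` is the method named in this comment; `route_utterance`, the `PerceptionBackend` protocol, and the `legacy` callable are hypothetical names. The sketch awaits the backend directly, whereas the reviewed code schedules it with `asyncio.create_task` and passes further arguments that are cut off in the excerpt.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Protocol


class PerceptionBackend(Protocol):
    # Assumed shape: only these two members are used by the sketch.
    def available(self) -> bool: ...

    async def process_utterance(self, audio_bytes: bytes) -> None: ...


async def route_utterance(
    utterance: bytes,
    backend: Optional[PerceptionBackend],
    legacy: Callable[[bytes], Awaitable[None]],
) -> str:
    """Send the utterance to the backend only when it reports ready, else to the legacy path."""
    if backend is not None and backend.available():
        await backend.process_utterance(audio_bytes=utterance)
        return "perception"
    # Backend missing or not ready: keep the utterance instead of dropping it.
    await legacy(utterance)
    return "legacy"
```

Returning the chosen route makes the decision easy to assert in a unit test without inspecting logs.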

history = []
for entry in self._conversation_thread[-10:]:
    history.append(entry)
if history and history[-1].get("role") == "user":

P1 Badge Normalize thread entries before transcript history checks

process_utterance_from_transcript treats conversation history entries as dicts, but the normal conversation path appends ThreadEntry objects to _conversation_thread. Once any regular turn exists, this .get("role") call raises AttributeError (and messages.extend(history) would also be non-serializable), so transcript-driven operator turns fail before the LLM request.

Useful? React with 👍 / 👎.

# Skip if operator speech was just processed (it preempted)
if operator_imps:
    source = getattr(imp, "source", "")
    if source in ("exploration", "endogenous.narrative_drive"):

P2 Badge Match exploration source prefixes in preemption filter

The operator-preemption filter only skips impingements whose source is exactly "exploration", but exploration producers emit prefixed sources such as "exploration.boredom" (see agents/reverie/mixer.py). In mixed batches, those exploration impingements still run immediately after operator speech despite the preemption comment, which weakens the intended operator-priority behavior.

Useful? React with 👍 / 👎.
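
A small sketch of prefix-aware matching for the filter described above. The two base source names come from the reviewed code; `is_preempted` and `PREEMPTED_SOURCES` are hypothetical names, and treating `.` as the namespace separator is an assumption based on the `"exploration.boredom"` example.

```python
PREEMPTED_SOURCES = ("exploration", "endogenous.narrative_drive")


def is_preempted(source: str) -> bool:
    """True if `source` equals a preempted base or is namespaced under it with a dot."""
    return any(
        source == base or source.startswith(base + ".")
        for base in PREEMPTED_SOURCES
    )
```

Matching on `base + "."` rather than a bare `startswith(base)` avoids accidentally catching unrelated sources that merely share a leading substring.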

@ryanklee
Collaborator Author

Governed stale-PR reconciliation note (task 20260531-stale-pr-reconciliation-after-recovery): leaving this open but quarantined/blocked. Current blockers: stale task linkage to a closed/mismatched task, lint failure, unchecked checklist, sensitive daimonion/audio/systemd surface, and local primary worktree dirtiness on this branch. Do not merge or repair opportunistically; revive only under a fresh governed task or close as superseded after the daimonion/audio lane is reconciled.
