feat(voice): core voice pipeline + UI#450

Open
Co-vengers wants to merge 49 commits into GetBindu:main from Co-vengers:feat(voice-agent)/code-fixes
Conversation

@Co-vengers
Contributor

@Co-vengers Co-vengers commented Apr 14, 2026

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem: Bindu had no built-in, real-time voice conversation pipeline (STT↔agent↔TTS) and no UI flow to start/manage voice sessions.
  • Why it matters: Enables low-latency voice interactions (Vapi/Dograh-style) directly in Bindu’s extension model and A2A task system.
  • What changed: Added the voice extension + session lifecycle, voice REST/WS endpoints, and frontend voice panel/client plumbing; added tests and a provider smoke-test script; adjusted CI/pre-commit where required.
  • What did NOT change (scope boundary): No database schema migrations; no telephony/WebRTC support (WebSocket-only); providers are configured via env and can remain disabled.

Change Type (select all that apply)

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Security hardening
  • Tests
  • Chore/infra

Scope (select all touched areas)

  • Server / API endpoints
  • Extensions (DID, x402, etc.)
  • Storage backends
  • Scheduler backends
  • Observability / monitoring
  • Authentication / authorization
  • CLI / utilities
  • Tests
  • Documentation
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #

User-Visible / Behavior Changes

  • New voice endpoints (REST + WebSocket) and a voice extension advertised via capabilities when enabled.
  • New frontend voice call UI and WebSocket voice client flow.
  • New/updated env configuration under VOICE__* (provider-dependent keys).

Security Impact (required)

  • New permissions/capabilities? (Yes/No) Yes
  • Secrets/credentials handling changed? (Yes/No) Yes
  • New/changed network calls? (Yes/No) Yes
  • Database schema/migration changes? (Yes/No) No
  • Authentication/authorization changes? (Yes/No) Yes
  • If any Yes, explain risk + mitigation:
    • Risk: New WS entrypoints and outbound STT/TTS provider calls increase attack surface and operational risk.
    • Mitigation: Provider keys remain server-side via env; optional session token auth; rate limiting; frame size/behavior guards; voice can be disabled via VOICE__ENABLED=false.

Verification

Environment

  • OS:
  • Python version:
  • Storage backend:
  • Scheduler backend:

Steps to Test

  1. Install deps: uv sync --dev --extra agents --extra voice
  2. Set env (at minimum): VOICE__STT_API_KEY=... and TTS provider env (e.g. VOICE__TTS_PROVIDER=piper)
  3. Run the example server (added in a later PR) or a local agent that enables the voice extension; call POST /voice/session/start and connect to WS /ws/voice/{session_id}
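
Step 3 can be sketched client-side as follows. The response field names (session_id, ws_url) are taken from this PR's description and should be treated as assumptions until checked against the actual endpoint:

```python
def build_ws_url(base_url: str, session: dict) -> str:
    """Derive the voice WebSocket URL from a POST /voice/session/start reply.

    Prefers a server-provided ws_url; otherwise falls back to the
    /ws/voice/{session_id} path named in the test steps above.
    """
    if session.get("ws_url"):
        return session["ws_url"]
    scheme = "wss" if base_url.startswith("https") else "ws"
    host = base_url.split("://", 1)[-1].rstrip("/")
    return f"{scheme}://{host}/ws/voice/{session['session_id']}"
```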

Expected Behavior

  • Voice session starts; WS accepts audio frames; server emits state/transcript frames; TTS audio returns when agent responds.

Actual Behavior

Evidence (attach at least one)

  • Failing test before + passing after
  • Test output / logs
  • Screenshot / recording
  • Performance metrics (if relevant)

Human Verification (required)

What you personally verified (not just CI):

  • Verified scenarios:
  • Edge cases checked:
  • What you did NOT verify:

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes (voice is opt-in)
  • Config/env changes? (Yes/No) Yes
  • Database migration needed? (Yes/No) No
  • If yes, exact upgrade steps:
    • Add required VOICE__* env vars only if enabling voice; otherwise keep VOICE__ENABLED=false.

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: set VOICE__ENABLED=false (and/or revert the PR).
  • Files/config to restore: revert voice-related env changes; revert frontend voice UI if needed.
  • Known bad symptoms reviewers should watch for: WS closes with 1011/1008, session start returns 503 preflight errors, elevated error logs around provider connectivity.

Risks and Mitigations

  • Risk: Provider outages / latency spikes can degrade UX.
    • Mitigation: Provider-specific preflight errors and optional fallback provider; conservative timeouts; ability to disable voice quickly.

Checklist

  • Tests pass (uv run pytest)
  • Pre-commit hooks pass (uv run pre-commit run --all-files)
  • Documentation updated (if needed)
  • Security impact assessed
  • Human verification completed
  • Backward compatibility considered

Summary by CodeRabbit

  • New Features

    • Real-time voice calls: STT/TTS voice sessions over WebSocket, session lifecycle endpoints, in-app call UI (call button, control panel, live transcript) and client-side microphone/playback support.
    • Attach files with chat messages (multi-part messages) and improved upload/download robustness.
  • Bug Fixes

    • More reliable notification delivery and hostname/IP validation; better error reporting for uploads/downloads and voice flows.
    • Fixes to transcript merging, session timeouts, and session lifecycle consistency.
  • Improvements

    • Sensible frontend env defaults, expanded model metadata, stronger session management (memory/Redis backends), and improved scheduler backpressure.
    • Added voice provider smoke-test script and expanded test coverage for voice features.

@coderabbitai

coderabbitai Bot commented Apr 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a full real-time voice feature: new voice extension, STT/TTS service factories, agent bridge and pipeline, session managers (memory/Redis), WebSocket REST endpoints, frontend UI/client/stores, configuration and dependency updates, tests, and supporting tooling and safety fixes.

Changes

  • Voice core & extension (bindu/extensions/voice/__init__.py, bindu/extensions/voice/voice_agent_extension.py, bindu/extensions/__init__.py): New VoiceAgentExtension, docs and re-export; integrates voice config into extension capabilities.
  • Bridge, pipeline & audio config (bindu/extensions/voice/agent_bridge.py, bindu/extensions/voice/pipeline_builder.py, bindu/extensions/voice/audio_config.py, bindu/extensions/voice/service_factory.py): Added AgentBridgeProcessor (streaming, timeouts, interruptions), pipeline builder wiring STT/TTS/VAD, audio size/config utils, and an STT/TTS factory with provider selection/fallback.
  • Session managers & factory (bindu/extensions/voice/session_manager.py, bindu/extensions/voice/redis_session_manager.py, bindu/extensions/voice/session_factory.py): In-memory and Redis session backends, TTL/cleanup loops, Redis Lua scripts, and factory/close helpers to select and manage the backend.
  • Server endpoints & app integration (bindu/server/endpoints/voice_endpoints.py, bindu/server/applications.py, bindu/server/endpoints/utils.py): New voice REST + WebSocket handlers (rate limiting, auth, control frames), session manager lifecycle on app startup/shutdown, and HTTPException passthrough in endpoint utils.
  • Frontend voice UI & client (frontend/src/lib/components/voice/VoiceCallButton.svelte, .../VoiceCallPanel.svelte, .../LiveTranscript.svelte, frontend/src/lib/services/voice-client.ts, frontend/src/lib/stores/voice.ts): New UI components, VoiceClient WebSocket/audio streaming, AudioContext playback, transcript merging, stores and control APIs for session lifecycle, mute, streaming, and errors.
  • Chat & input integration (frontend/src/lib/components/chat/ChatInput.svelte, .../ChatWindow.svelte, bindu/penguin/bindufy.py, frontend/src/lib/stores/chat.ts): File-part async handling in chat input; sendMessage now accepts a Parts array; voice wiring in bindufy; chat UI integrates voice controls and session state.
  • Endpoints support & pipeline utilities (bindu/extensions/voice/agent_bridge.py, bindu/extensions/voice/redis_session_manager.py, bindu/extensions/voice/pipeline_builder.py): Complex stateful logic: agent streaming normalization, Redis atomic scripts and cleanup, conditional VAD import and pipeline assembly.
  • Configuration & deps (bindu/settings.py, frontend/.env.example, pyproject.toml, pytest.ini): Added VoiceSettings and Settings.voice; frontend env defaults for voice; new optional voice deps and dev deps; pytest excludes slow/integration/e2e by default.
  • Tests & tooling (tests/unit/extensions/voice/*, scripts/test_voice_providers.sh, tests/...): Extensive new unit/integration tests for voice components, OpenTelemetry stubs for tests, and a shell smoke-test script for voice providers and Bindu preflight.
  • Notifications & security (bindu/utils/notifications.py): Reworked hostname resolution/validation and switched raw-socket delivery to a urllib request flow; stricter blocked-range checks to mitigate SSRF/DNS rebinding.
  • gRPC optionality & guards (bindu/grpc/client.py, bindu/grpc/server.py, bindu/grpc/service.py): Guarded imports for gRPC/protobuf with _require_grpc_dependencies() to fail fast when optional deps are missing.
  • Worker & messaging changes (bindu/utils/worker/messages.py, bindu/utils/worker/parts.py, bindu/common/protocol/types.py): Nested file shape for file parts, Base64 size checks, multi-encoding text decode fallback; TypedDict changes for FilePart/DataPart (metadata, aliasing).
  • Server infra & tracing (bindu/server/scheduler/memory_scheduler.py, bindu/server/workers/base.py, bindu/server/metrics.py): Bounded task queue buffer, send timeout/backpressure; span reconstruction for worker tracing; explicit cast in metrics getter.
  • Frontend misc & types (many frontend files, e.g. frontend/src/lib/types/*, frontend/src/lib/server/*, frontend/src/lib/components/*): Type expansions (model metadata), ObjectId import changes, in-memory DB mock improvements, safer stream handling, accessibility and behavior tweaks, and runtime agent base URL propagation.
  • Pre-commit, gitignore & env (.pre-commit-config.yaml, .gitignore, frontend/.gitignore, frontend/.env.example): Pre-commit excludes .agents/; hooks adjusted; gitignore now covers the frontend build, voice model artifacts, logs, .kilo, .idea; updated frontend env example with voice keys and defaults.

Sequence Diagram

sequenceDiagram
    participant User as User
    participant FE as Frontend (VoiceClient)
    participant WS as WebSocket
    participant BE as VoiceEndpoint
    participant SM as SessionManager
    participant Bridge as AgentBridge
    participant STT as STT Service
    participant Agent as Agent
    participant TTS as TTS Service

    User->>FE: Start voice session
    FE->>BE: POST /voice/session/start
    BE->>SM: create_session(context_id)
    SM-->>BE: {session_id, token?}
    BE-->>FE: {session_id, ws_url}

    FE->>WS: Connect ws_url (optional token)
    WS->>BE: Accept WebSocket
    BE->>SM: update_state(active)

    loop session active
        User->>FE: Speak (mic audio frames)
        FE->>WS: Send PCM frames
        WS->>BE: Receive audio frames
        BE->>STT: Stream/process frames
        STT-->>BE: Transcript chunk
        BE->>Bridge: process_transcription(text)
        Bridge->>Agent: manifest_run(context)
        Agent-->>Bridge: Response (sync/stream)
        Bridge->>TTS: Request synthesis
        TTS-->>Bridge: Audio chunks
        Bridge-->>BE: Emit transcripts + audio
        BE->>WS: Send agent audio + events
        WS-->>FE: Playback audio, show transcript
    end

    User->>FE: End session
    FE->>WS: Send stop
    WS->>BE: Close
    BE->>SM: end_session(session_id)
    BE-->>FE: Session ended

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Poem

🐰 I hopped into sockets, ears tuned to the stream,
I stitched words to audio and made transcripts gleam,
Sessions keep hopping, transcripts take flight,
From mic to the agent through websockets at night,
A little rabbit cheers: "Now speak and delight!"


@Co-vengers Co-vengers mentioned this pull request Apr 14, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 16

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
tests/unit/test_minimax_example.py (1)

192-250: ⚠️ Potential issue | 🟠 Major

Move live API tests out of tests/unit/ (or mark/split as integration).

TestMiniMaxIntegration performs real network calls, which makes this suite non-unit, flaky, and environment-dependent. Please relocate these tests to an integration test path (or gate them with an explicit integration marker) and keep tests/unit/ deterministic.

Suggested refactor sketch
-class TestMiniMaxIntegration:
+@pytest.mark.integration
+class TestMiniMaxIntegration:
-# tests/unit/test_minimax_example.py
+# tests/integration/test_minimax_example_integration.py

As per coding guidelines, "tests/unit/**/*.py: Place unit tests in tests/unit/ directory with pytest and @pytest.mark.asyncio decorator for async tests".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_minimax_example.py` around lines 192 - 250, These tests
(test_minimax_api_connection, test_minimax_chat_completion,
test_minimax_m27_highspeed_model) perform real network calls and must be treated
as integration tests: either move these functions (or the TestMiniMaxIntegration
class containing them) out of the tests/unit suite into your integration test
folder (e.g., tests/integration/) or mark them with an explicit pytest marker
(e.g., decorate each function or the class with `@pytest.mark.integration`) and
add the "integration" marker to pytest.ini; ensure any CI/test runners run
integration tests separately so tests/unit remains deterministic.
frontend/src/lib/utils/agentMessageHandler.ts (2)

363-370: ⚠️ Potential issue | 🟡 Minor

Response body already consumed before fallback text read.

If response.json() throws a parse error at line 364, the response body has already been consumed. The subsequent response.text() at line 367 will fail or return empty because the body stream can only be read once.

🐛 Proposed fix
 		let result;
 		try {
 			result = await response.json();
 		} catch (parseError) {
 			console.error('Failed to parse response JSON:', parseError);
-			const text = await response.text();
-			console.error('Response text:', text);
+			// Note: body already consumed by json() attempt, cannot read text
+			console.error('Response body was not valid JSON');
 			throw new Error('Failed to parse response');
 		}

Alternatively, read as text first and then parse:

const text = await response.text();
let result;
try {
    result = JSON.parse(text);
} catch (parseError) {
    console.error('Failed to parse response JSON:', parseError);
    console.error('Response text:', text);
    throw new Error('Failed to parse response');
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/utils/agentMessageHandler.ts` around lines 363 - 370, The
current try/catch around response.json() in agentMessageHandler.ts consumes the
response body before attempting to read fallback text, causing response.text()
to fail; change the logic to read the body as text first (const text = await
response.text()), then attempt to parse with JSON.parse(text) inside a
try/catch, logging both the parse error and the raw text on failure and
rethrowing a descriptive Error, replacing the existing response.json() +
response.text() sequence and preserving the result variable usage.

11-11: ⚠️ Potential issue | 🟠 Major

Hardcoded localhost URL will break in production deployments.

The AGENT_BASE_URL is hardcoded to http://localhost:3773 on line 11. This same pattern also appears in paymentHandler.ts (line 6). Both files use this URL directly in fetch calls, which will fail when deployed outside localhost.

The codebase already has an established pattern for client-side configuration via PUBLIC_* environment variables (see PublicConfig.svelte.ts and .env.example). The agent URL should follow this same pattern: define PUBLIC_AGENT_BASE_URL as a configurable environment variable instead of hardcoding it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/utils/agentMessageHandler.ts` at line 11, Replace the
hardcoded AGENT_BASE_URL constant with a configurable PUBLIC_AGENT_BASE_URL
environment value: update the AGENT_BASE_URL declaration in
agentMessageHandler.ts to read the client-side public config/env var used across
the app (the same pattern as PublicConfig.svelte.ts), and make the same change
in paymentHandler.ts (replace the literal 'http://localhost:3773' with the
PUBLIC_AGENT_BASE_URL variable). Also ensure .env.example and the public config
export include PUBLIC_AGENT_BASE_URL so the runtime build picks it up.
🟡 Minor comments (8)
tests/integration/grpc/test_grpc_e2e.py-198-200 (1)

198-200: ⚠️ Potential issue | 🟡 Minor

Fail fast on JSON-RPC error payloads instead of polling through timeout.

When tasks/get returns an error, continuing to poll hides the root cause and delays failure diagnostics.

🔧 Proposed patch
-        if payload.get("error"):
-            time.sleep(0.2)
-            continue
+        if payload.get("error"):
+            raise AssertionError(
+                f"tasks/get returned error for {task_id}: {payload['error']}"
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/grpc/test_grpc_e2e.py` around lines 198 - 200, The loop
currently swallows JSON-RPC error payloads by doing "if payload.get('error'):
time.sleep(0.2); continue" — change this to fail fast: when payload.get('error')
is truthy immediately raise or call pytest.fail with the payload/error details
(include the full payload or payload['error'] in the message) instead of
sleeping/continuing, so tests that call the tasks/get RPC surface the root cause
promptly.
tests/integration/grpc/test_grpc_e2e.py-184-186 (1)

184-186: ⚠️ Potential issue | 🟡 Minor

Narrow the blind exception handlers in polling.

Catching Exception here can mask real regressions and keeps Ruff BLE001 warnings active. Prefer scoped exceptions (httpx.HTTPError, ValueError) so unexpected failures surface immediately.

🔧 Proposed patch
-        except Exception:
+        except httpx.HTTPError:
             time.sleep(0.2)
             continue
@@
-        except Exception:
+        except ValueError:
             time.sleep(0.2)
             continue

Also applies to: 194-196

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/grpc/test_grpc_e2e.py` around lines 184 - 186, Replace the
broad "except Exception:" blind handlers in the polling loops in
test_grpc_e2e.py with narrow, specific exceptions so real errors surface; update
the two catch blocks (the one around the polling/wait retry at the snippet and
the similar block at lines ~194-196) to catch only expected errors such as
httpx.HTTPError and ValueError (and/or TimeoutError if the code awaits
network/timeouts), add the necessary import for httpx at the top of the test
file, and leave the existing retry behavior (time.sleep(0.2); continue)
unchanged for these specific exceptions so unknown exceptions will still
propagate and fail the test.
tests/unit/server/scheduler/test_memory_scheduler.py-110-131 (1)

110-131: ⚠️ Potential issue | 🟡 Minor

This test doesn't exercise the capacity path it claims to.

Monkeypatching send() to return immediately bypasses fail_after() and queue saturation entirely, and the payload here doesn't match real TaskSendParams (message, UUID values). Either rename this to a simple "invokes send" test or actually fill the buffer and unblock it from another task.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/server/scheduler/test_memory_scheduler.py` around lines 110 - 131,
The test currently monkeypatches scheduler._write_stream.send to return
immediately and uses a fake payload, so it doesn't exercise capacity/queue
saturation; change it to either rename the test to "invokes send" or rewrite it
to block the send path and use a real TaskSendParams-like payload so run_task
must wait: create an asyncio.Event (or similar) and monkeypatch
scheduler._write_stream.send to await the event before returning (so the first
run_task fills the buffer), then start a second run_task call and assert it
blocks until you set the event, and use real UUIDs/messages matching
TaskSendParams; reference scheduler.run_task and scheduler._write_stream.send
when making the changes.
bindu/penguin/config_validator.py-149-157 (1)

149-157: ⚠️ Potential issue | 🟡 Minor

Use parsed.hostname instead of not parsed.netloc for URL host validation.

The current check accepts malformed URLs: urlparse("http://:80") returns a non-empty netloc of ':80' but hostname=None. This allows invalid deployment URLs to pass validation and fail later. Additionally, line 149 checks if the stripped value is non-empty but parses the unstripped deployment_url, which preserves leading/trailing whitespace in the parsed result.

Fix by stripping before parsing and checking parsed.hostname instead:

deployment_url = deployment_config.get("url")
if not isinstance(deployment_url, str) or not deployment_url.strip():
    raise ConfigError(
        "ConfigError: 'deployment.url' is a required field in the agent configuration."
    )

deployment_url = deployment_url.strip()  # Strip before parsing
parsed = urlparse(deployment_url)
if parsed.scheme not in {"http", "https"} or not parsed.hostname:  # Use hostname
    raise ConfigError(
        f"ConfigError: 'deployment.url' must be a valid http(s) URL, got '{deployment_url}'."
    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/penguin/config_validator.py` around lines 149 - 157, The deployment URL
validation currently parses the raw value and checks parsed.netloc, which
accepts malformed hosts and also leaves surrounding whitespace; update the logic
in the validation around deployment_config.get("url") so you strip the string
before parsing (assign back to deployment_url), then call
urlparse(deployment_url) and validate using parsed.hostname instead of
parsed.netloc, keeping the same ConfigError messages (raise ConfigError in the
same places) so invalid or whitespace-padded URLs like "http://:80" are
rejected.
bindu/extensions/__init__.py-37-41 (1)

37-41: ⚠️ Potential issue | 🟡 Minor

Voice extension docs are too provider-specific.

This text reads as Deepgram+ElevenLabs only, while voice providers are configurable. Consider documenting it as provider-based STT/TTS to avoid stale guidance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/extensions/__init__.py` around lines 37 - 41, The extension docs
currently call out Deepgram and ElevenLabs specifically in the "Voice (Real-time
Voice Conversations)" section; update the wording to be provider-agnostic by
describing it as configurable STT and TTS providers used via the Pipecat
pipeline, and note that providers can be chosen via the voice config passed to
bindufy() or via VOICE__ENABLED environment settings; keep the mention of
Pipecat, WebSocket, and bidirectional voice conversations but remove
vendor-specific names so the text describes "provider-based STT/TTS" instead of
Deepgram/ElevenLabs.
frontend/src/lib/stores/chat.ts-259-264 (1)

259-264: ⚠️ Potential issue | 🟡 Minor

Local user echo can drop text in multipart messages.

Only the first kind === 'text' part is rendered locally, so additional text parts are lost in UI.

Proposed patch
-    const textPart = parts.find((p) => p.kind === 'text');
-    // Only add message if text part exists and has non-empty text
-    if (textPart?.text) {
-      addMessage(textPart.text, 'user', taskId);
-    }
+    const text = parts
+      .filter((p) => p?.kind === 'text' && typeof p.text === 'string')
+      .map((p) => p.text)
+      .join('\n')
+      .trim();
+    if (text) {
+      addMessage(text, 'user', taskId);
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/stores/chat.ts` around lines 259 - 264, The local user-echo
currently only renders the first text part by using parts.find(...) and
textPart, which drops any additional parts; change this to collect all parts
where p.kind === 'text' (e.g., parts.filter(...)) and concatenate their .text
values (preserving order and skipping empty strings) and pass the combined
string into addMessage(text, 'user', taskId) so all text segments in multipart
messages are shown locally.
frontend/.env.example-91-93 (1)

91-93: ⚠️ Potential issue | 🟡 Minor

Fix formatting: remove spaces around equal signs.

Static analysis correctly flags spaces around = in these lines. .env files should not have spaces around the equal sign as some parsers may include the space in the value.

🔧 Proposed fix
-COOKIE_SAMESITE= # can be "lax", "strict", "none" or left empty
-COOKIE_SECURE= # set to true to only allow cookies over https
-TRUSTED_EMAIL_HEADER= # header to use to get the user email, only use if you know what you are doing
+COOKIE_SAMESITE=# can be "lax", "strict", "none" or left empty
+COOKIE_SECURE=# set to true to only allow cookies over https
+TRUSTED_EMAIL_HEADER=# header to use to get the user email, only use if you know what you are doing
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/.env.example` around lines 91 - 93, The .env example lines have
spaces around the equals which can be interpreted as part of the value; update
the three entries COOKIE_SAMESITE, COOKIE_SECURE, and TRUSTED_EMAIL_HEADER to
remove spaces around the '=' so they read as KEY= (no spaces) to ensure .env
parsers treat values correctly and static analysis stops flagging them.
bindu/extensions/voice/session_factory.py-117-124 (1)

117-124: ⚠️ Potential issue | 🟡 Minor

Remove redundant cleanup after __aenter__ failure—the manager already handles it internally.

The RedisVoiceSessionManager.__aenter__ method catches redis.RedisError, closes the client connection (line 144), and cleans up state before raising ConnectionError. Calling __aexit__ afterward is unnecessary; it will perform no additional cleanup since self._redis_client is already None and there is no cleanup task to cancel.

🛡️ Simplified cleanup
         try:
             await manager.__aenter__()
         except Exception:
+            # __aenter__ handles its own cleanup on failure
             raise
         return manager
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/extensions/voice/session_factory.py` around lines 117 - 124, The
try/except around awaiting manager.__aenter__ performs redundant cleanup by
calling await manager.__aexit__ after __aenter__ raises; remove that extra call
and let RedisVoiceSessionManager.__aenter__ propagate its ConnectionError
because __aenter__ already closes the client and resets state. Update the block
that currently catches Exception around await manager.__aenter__ to simply await
and let exceptions propagate (or re-raise without calling manager.__aexit__),
ensuring references to manager.__aenter__, manager.__aexit__, and the
RedisVoiceSessionManager behavior remain correct.
🧹 Nitpick comments (24)
frontend/src/lib/migrations/lock.ts (1)

18-18: Remove as Partial<Semaphore> cast and use satisfies Semaphore instead.

The object being inserted provides all required fields (_id, key, createdAt, updatedAt, deleteAt) and fully satisfies the Semaphore interface. The Partial cast weakens type safety without benefit. Use satisfies Semaphore to preserve compile-time schema validation while keeping type inference clean.

♻️ Proposed fix
-		const insert = await collections.semaphores.insertOne({
+		const lockDoc = {
 			_id: id,
 			key,
 			createdAt: new Date(),
 			updatedAt: new Date(),
 			deleteAt: new Date(Date.now() + 1000 * 60 * 3), // 3 minutes
-		} as Partial<Semaphore>);
+		} satisfies Semaphore;
+
+		const insert = await collections.semaphores.insertOne(lockDoc);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/migrations/lock.ts` at line 18, Replace the weak cast "as
Partial<Semaphore>" with the stricter "satisfies Semaphore" on the object being
inserted in lock.ts: locate the code that constructs the semaphore object (the
literal containing _id, key, createdAt, updatedAt, deleteAt) and change the type
assertion so the literal satisfies the Semaphore interface (e.g., use "satisfies
Semaphore" instead of "as Partial<Semaphore>") to preserve full compile-time
schema validation and keep type inference intact for the Semaphore type.
frontend/src/lib/server/database.ts (3)

81-85: Redundant filter check after sorting.

The matched array is already filtered on line 67. The loop on lines 81-85 re-applies matchesFilter unnecessarily. You can simplify by returning the first element directly.

♻️ Suggested simplification
 		}
 	}
-	for (const doc of matched) {
-		if (this.matchesFilter(doc, filter)) {
-			return doc;
-		}
-	}
-	return null;
+	return matched[0] ?? null;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/server/database.ts` around lines 81 - 85, The loop that
re-checks each item with this.matchesFilter is redundant because matched is
already filtered; replace the for-loop over matched (using matched,
matchesFilter, filter) with a direct return of the first element (e.g. return
matched[0]) and preserve the original empty-array behavior (return
undefined/null) so the function's semantics don't change.

496-512: Note: Both on and once schedule emissions independently.

When registering handlers via both on and once for the same event (e.g., "data"), each registration triggers its own setTimeout for emission. This could cause duplicate data delivery to multiple handlers. For this in-memory mock used in tests, this is likely acceptable, but worth noting if real-world usage patterns involve multiple listener registrations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/server/database.ts` around lines 496 - 512, The on/once
handlers in downloadStream currently schedule separate setTimeout emissions each
time (functions on and once use addListener and then schedule emit), causing
duplicate deliveries; modify the logic in downloadStream so emission scheduling
is done only once per event (e.g., maintain a Set like scheduledEmits or check
for existing listeners before scheduling) and have on and once only add
listeners (using addListener/wrapped) while relying on a single scheduled emit
per event (use emit(file.data / new Error...) only if not already scheduled) to
prevent duplicate emissions to multiple registrations.

152-159: Minor: Redundant variable assignment.

The const updated = pipelineUpdate; on line 158 is unnecessary since pipelineUpdate already holds the final result.

♻️ Suggested simplification
 			const pipelineUpdate = Array.isArray(update)
 				? update.reduce(
 						(acc, stage) => this.applyUpdate(acc, stage),
 						doc
 				  )
 				: this.applyUpdate(doc, update);
-			const updated = pipelineUpdate;
-			this.data.set(id, updated);
+			this.data.set(id, pipelineUpdate);
 			count++;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/server/database.ts` around lines 152 - 159, Remove the
redundant assignment "const updated = pipelineUpdate;" and use pipelineUpdate
directly when saving to the map; update the block in the method that computes
pipelineUpdate (the Array.isArray(update) ? ... : this.applyUpdate(...) logic)
so that you call this.data.set(id, pipelineUpdate) directly and delete the
unused "updated" binding, ensuring no other references to "updated" remain in
the surrounding code (look for applyUpdate, pipelineUpdate, updated, and
this.data.set).
tests/unit/penguin/test_bindufy.py (1)

186-198: Consider moving this test inside the TestBindufyUtilities class.

The new test validates an important error path. However, it's defined as a standalone function while all other tests in this file are methods of TestBindufyUtilities. Moving it inside the class would maintain consistency.

♻️ Suggested refactor for consistency
-def test_bindufy_non_callable_handler_raises_clear_error():
-    """bindufy should fail fast with a clear handler validation message."""
-    config = {
-        "author": "test@example.com",
-        "name": "Test Agent",
-        "deployment": {"url": "http://localhost:3773"},
-    }
-
-    with pytest.raises(
-        TypeError,
-        match="callable function or coroutine function",
-    ):
-        bindufy(config=config, handler=cast(Any, "not_callable"), run_server=False)
+class TestBindufyUtilities:
+    # ... existing tests ...
+
+    def test_bindufy_non_callable_handler_raises_clear_error(self):
+        """bindufy should fail fast with a clear handler validation message."""
+        config = {
+            "author": "test@example.com",
+            "name": "Test Agent",
+            "deployment": {"url": "http://localhost:3773"},
+        }
+
+        with pytest.raises(
+            TypeError,
+            match="callable function or coroutine function",
+        ):
+            bindufy(config=config, handler=cast(Any, "not_callable"), run_server=False)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/penguin/test_bindufy.py` around lines 186 - 198, Move the
standalone test function test_bindufy_non_callable_handler_raises_clear_error
into the existing TestBindufyUtilities test class to keep test structure
consistent; locate the function and cut it into the TestBindufyUtilities class
body, preserving its name, docstring, config setup and the pytest.raises
assertion that calls bindufy(config=..., handler=cast(Any, "not_callable"),
run_server=False), and ensure indentation and any imports (pytest, cast, Any)
remain correct so the test continues to run as a class method.
frontend/src/lib/components/voice/LiveTranscript.svelte (1)

11-24: Add live-region semantics for real-time transcript updates.

This panel updates dynamically; adding role="log" + aria-live="polite" improves screen-reader usability.

Proposed patch
-<div class="flex max-h-72 flex-col gap-2 overflow-y-auto rounded-lg border border-gray-200 bg-white/80 p-3 text-sm dark:border-gray-700 dark:bg-gray-900/80">
+<div
+	role="log"
+	aria-live="polite"
+	aria-relevant="additions text"
+	class="flex max-h-72 flex-col gap-2 overflow-y-auto rounded-lg border border-gray-200 bg-white/80 p-3 text-sm dark:border-gray-700 dark:bg-gray-900/80"
+>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/components/voice/LiveTranscript.svelte` around lines 11 -
24, The transcript container in the LiveTranscript.svelte component needs
live-region semantics so screen readers get updates: update the top-level div
that renders the transcript items (the element wrapping the {#if items.length
=== 0} block) to include role="log" and aria-live="polite" (and optionally
aria-atomic="false") so dynamic transcript entries from the items array are
announced appropriately by assistive tech.
tests/unit/extensions/voice/test_voice_websocket_integration.py (2)

25-74: Extract pipecat stubs to a shared fixture.

The pipecat module stubbing code (lines 25-74) is duplicated almost identically in both tests. Consider extracting this to a shared pytest fixture to reduce duplication and improve maintainability.

♻️ Example fixture extraction
@pytest.fixture
def mock_pipecat_modules(monkeypatch):
    """Stub Pipecat modules for WebSocket testing."""
    fastapi_mod = types.ModuleType("pipecat.transports.websocket.fastapi")
    
    class FastAPIWebsocketParams:
        def __init__(self, **kwargs):
            self.kwargs = kwargs
    
    class FastAPIWebsocketTransport:
        def __init__(self, websocket, params):
            self._ws = websocket
            self._params = params
        def input(self):
            return object()
        def output(self):
            return object()
    
    fastapi_mod.FastAPIWebsocketTransport = FastAPIWebsocketTransport
    fastapi_mod.FastAPIWebsocketParams = FastAPIWebsocketParams
    # ... rest of module stubs
    
    monkeypatch.setitem(sys.modules, "pipecat.transports.websocket.fastapi", fastapi_mod)
    # ... rest of setitems
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/extensions/voice/test_voice_websocket_integration.py` around lines
25 - 74, Extract the repeated pipecat stubbing into a single pytest fixture
(e.g., mock_pipecat_modules) that builds the same ModuleType objects and classes
(FastAPIWebsocketParams, FastAPIWebsocketTransport, Pipeline, PipelineTask,
PipelineRunner) and performs the monkeypatch.setitem calls on sys.modules for
"pipecat.transports.websocket.fastapi", "pipecat.pipeline.pipeline",
"pipecat.pipeline.task", and "pipecat.pipeline.runner"; then replace the
duplicated block in both tests with a fixture parameter in the test signature so
tests reuse the fixture instead of redefining the stubs inline.

98-112: Consider using pytest fixtures for app_settings modification.

Directly modifying module.app_settings.voice.* with manual restoration in finally blocks is error-prone and could cause test pollution if tests run in parallel. Consider using monkeypatch.setattr for these settings modifications, which automatically restores values after each test.

♻️ Suggested approach using monkeypatch
-    original_required = module.app_settings.voice.session_auth_required
-    original_timeout = module.app_settings.voice.session_timeout
-    original_enabled = module.app_settings.voice.enabled
-    original_stt = module.app_settings.voice.stt_api_key
-    original_tts = module.app_settings.voice.tts_api_key
-    try:
-        module.app_settings.voice.session_auth_required = True
-        module.app_settings.voice.session_timeout = 300
-        module.app_settings.voice.enabled = True
-        module.app_settings.voice.stt_api_key = (
-            "unit-test-stt-token"  # pragma: allowlist secret
-        )
-        module.app_settings.voice.tts_api_key = (
-            "unit-test-tts-token"  # pragma: allowlist secret
-        )
+    monkeypatch.setattr(module.app_settings.voice, "session_auth_required", True)
+    monkeypatch.setattr(module.app_settings.voice, "session_timeout", 300)
+    monkeypatch.setattr(module.app_settings.voice, "enabled", True)
+    monkeypatch.setattr(module.app_settings.voice, "stt_api_key", "unit-test-stt-token")
+    monkeypatch.setattr(module.app_settings.voice, "tts_api_key", "unit-test-tts-token")

Then remove the finally block for restoration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/extensions/voice/test_voice_websocket_integration.py` around lines
98 - 112, The test directly mutates module.app_settings.voice attributes
(module.app_settings.voice.session_auth_required, session_timeout, enabled,
stt_api_key, tts_api_key) and restores them in a finally block; replace those
manual mutations with pytest's monkeypatch.setattr calls for each attribute so
values are automatically reverted after the test (use
monkeypatch.setattr(module.app_settings.voice, "session_auth_required", True),
etc.), and remove the finally restoration block.
bindu/utils/worker/messages.py (1)

83-86: Consider handling missing file structure more gracefully.

When part.get("file") returns None or an empty dict, the code falls through to the MIME type check and logs an "Unsupported MIME type" warning with an empty string. Consider adding explicit handling for malformed file parts.

♻️ Suggested improvement
             file_info = part.get("file") or {}
+            if not file_info:
+                logger.warning("File part missing 'file' structure")
+                processed_parts.append({
+                    "kind": "text",
+                    "text": "[System: Malformed file upload - missing file data]",
+                })
+                continue
             mime_type = file_info.get("mimeType", "")
             file_name = file_info.get("name", "uploaded file")
             base64_data = file_info.get("bytes") or file_info.get("data", "")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/utils/worker/messages.py` around lines 83 - 86, The code currently
assumes part.get("file") yields a valid dict and proceeds to
mime_type/file_name/base64_data extraction, which causes vague "Unsupported MIME
type" warnings when the file structure is missing; update the handler around
variables part, file_info, mime_type, file_name, and base64_data to explicitly
detect a missing or malformed file_info (e.g., file_info is None or lacks
bytes/data) and log a clear "Malformed file part" (including file_name or part
identifier) and skip processing/return early before performing the MIME type
check, so subsequent logic only runs when base64_data and mime_type are present
and valid.
bindu/server/applications.py (1)

113-115: Consider moving the import to the top of the file.

The get_voice_extension_from_capabilities import is placed inside __init__, next to where it's used. While this works for lazy loading, it's inconsistent with get_x402_extension_from_capabilities, which is imported at the top (line 40). For consistency, consider moving this import to the module level alongside the x402 utility import.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/applications.py` around lines 113 - 115, Move the local import
of get_voice_extension_from_capabilities out of the __init__-time block and add
it to the module-level imports alongside get_x402_extension_from_capabilities so
imports are consistent; update the top of the file to import
get_voice_extension_from_capabilities and remove the inline "from bindu.utils
import get_voice_extension_from_capabilities" before the call that assigns
voice_ext = get_voice_extension_from_capabilities(manifest).
tests/unit/extensions/voice/test_session_manager.py (1)

99-104: Consider adding a test for update_state on non-existent session.

The test verifies update_state works on an existing session, but doesn't test the behavior when called with a non-existent session ID. This edge case should be covered to document expected behavior (silent no-op or exception).

🧪 Suggested additional test
@pytest.mark.asyncio
async def test_update_state_nonexistent_session(self, manager):
    # Should not raise (or raise specific exception - document expected behavior)
    await manager.update_state("nonexistent", "active")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/extensions/voice/test_session_manager.py` around lines 99 - 104,
Add a new async pytest test named test_update_state_nonexistent_session that
calls SessionManager.update_state with a non-existent session id (e.g.,
"nonexistent") and asserts the documented behavior: either that it completes
without raising (no-op) or that it raises the specific exception your
implementation uses; also verify via SessionManager.get_session that no session
was created/changed. Reference the manager fixture and the update_state and
get_session methods when adding this test so the edge case is covered and
behavior is explicitly asserted.
bindu/extensions/voice/pipeline_builder.py (1)

39-48: Update docstring to document all callback parameters.

The docstring mentions on_user_transcript and on_agent_response but omits on_state_change and on_agent_transcript which are also accepted parameters.

📝 Suggested docstring update
     Args:
         voice_ext: Voice agent extension with STT/TTS config.
         manifest_run: The agent manifest's ``run`` callable.
         context_id: A2A context ID for this session.
+        on_state_change: Optional callback for pipeline state changes.
         on_user_transcript: Optional callback for user transcript events.
         on_agent_response: Optional callback for agent response events.
+        on_agent_transcript: Optional callback for agent transcript events.
 
     Returns:
         Dictionary with ``stt``, ``tts``, ``bridge``, and ``vad`` components.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/extensions/voice/pipeline_builder.py` around lines 39 - 48, Docstring
for the voice pipeline function in bindu/extensions/voice/pipeline_builder.py
currently documents on_user_transcript and on_agent_response but omits the
on_state_change and on_agent_transcript callback parameters; update the
function's docstring (the block above the pipeline builder function) to list and
briefly describe all four callbacks — on_user_transcript, on_agent_transcript,
on_agent_response, and on_state_change — including their expected signatures and
when they're invoked, while preserving the existing Args and Returns sections
and references to voice_ext, manifest_run, and context_id.
frontend/src/lib/components/chat/ChatInput.svelte (1)

172-182: Consider using a toast notification instead of alert().

Using alert() for error display is blocking and provides a poor user experience. If the codebase has a toast/notification system, consider using that instead for non-blocking error feedback.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/components/chat/ChatInput.svelte` around lines 172 - 182,
Replace the blocking alert fallback in the getFileParts() catch block with the
app’s non-blocking toast/notification API: instead of alert(...), call the
project’s toast function (e.g., showToast/notify/toast) with the same error
message, keep the existing console.error, fileInputEl reset and files = []
logic, and ensure you import or reference the toast helper at the top of
ChatInput.svelte so the notification displays the message `Error reading file:
${err instanceof Error ? err.message : String(err)}` in a non-blocking way.
frontend/src/lib/components/voice/VoiceCallPanel.svelte (2)

66-68: Redundant Float32Array copy.

samples is already a Float32Array from pcm16ToFloat32(), so creating a new Float32Array(samples) is an unnecessary copy.

♻️ Suggested fix
 		const samples = pcm16ToFloat32(pcmBuffer);
 		if (!samples.length) {
 			return;
 		}
 
 		const audioBuffer = ctx.createBuffer(1, samples.length, sampleRate);
-		const channelData = new Float32Array(samples);
-		audioBuffer.copyToChannel(channelData, 0);
+		audioBuffer.copyToChannel(samples, 0);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/components/voice/VoiceCallPanel.svelte` around lines 66 -
68, The code is creating an unnecessary copy with new Float32Array(samples);
update the block that creates audioBuffer and copies channel data to use the
existing Float32Array returned by pcm16ToFloat32() directly: remove the
intermediate channelData variable and call audioBuffer.copyToChannel(samples, 0)
(keeping ctx.createBuffer(...) and the samples variable intact).

146-165: Consider awaiting AudioContext.close() to ensure proper cleanup.

playbackContext.close() returns a Promise. The current code uses void to suppress the floating-promise warning, but not awaiting could leave cleanup incomplete if another effect runs before the context finishes closing.

♻️ Suggested improvement
-	$effect(() => {
+	$effect(() => {
 		if (!["idle", "ended", "error"].includes($voiceState)) {
 			return;
 		}
 
 		playbackNextTime = 0;
 		if (playbackContext) {
-			void playbackContext.close();
+			playbackContext.close().catch(() => {
+				// Ignore close errors during cleanup
+			});
 			playbackContext = null;
 		}

Alternatively, if synchronous cleanup is needed, consider tracking the close promise and preventing new contexts until it resolves.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/components/voice/VoiceCallPanel.svelte` around lines 146 -
165, The effect currently calls playbackContext.close() with void, which
discards the Promise and can leave cleanup incomplete; change it to await the
close call (or capture and await its Promise) before setting playbackContext =
null so the AudioContext fully closes before further actions run, and if needed
add a temporary flag or store the close Promise to prevent creating a new
context while close is pending (refer to $effect, playbackContext, and
playbackContext.close()) while leaving the existing timeout/clearing logic for
fallbackSpeechTimer and lastFallbackSpoken unchanged.
frontend/src/lib/components/chat/ChatWindow.svelte (1)

174-176: Consider adding a comment clarifying the intentional consumption pattern.

The empty for await loop consumes the async generator to completion but discards individual updates. While the existing comment hints at this, a clearer explanation would help future maintainers understand this is intentional for fire-and-forget streaming consumption.

 		try {
 			for await (const _update of sendAgentMessage(message, contextId, { fileParts })) {
-				// Process updates if needed
+				// Intentionally consume generator to completion; individual updates
+				// are handled elsewhere (e.g., via store subscriptions or callbacks).
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/components/chat/ChatWindow.svelte` around lines 174 - 176,
The empty for-await loop over sendAgentMessage(message, contextId, { fileParts
}) intentionally consumes the async generator to completion while discarding
per-update values; update the code by replacing the vague comment with a concise
clarifying comment that states this is deliberate "fire-and-forget" consumption
(e.g., to ensure streaming completes and side-effects run) and note that updates
are intentionally ignored so maintainers don't attempt to process yielded values
in sendAgentMessage.
bindu/extensions/voice/service_factory.py (1)

87-103: Broad exception catch in fallback logic may mask unexpected errors.

Catching Exception on line 89 will catch any error including programming bugs (e.g., AttributeError, TypeError). Consider catching more specific exceptions or at least logging the exception type to aid debugging.

♻️ Suggested improvement
     try:
         return _create_tts_service_for_provider(provider, config)
-    except Exception as primary_error:
+    except (ValueError, ImportError) as primary_error:
         if fallback_provider not in {"", "none", provider}:
             logger.warning(
                 "Primary TTS provider failed; attempting fallback",
                 provider=provider,
                 fallback_provider=fallback_provider,
                 error=str(primary_error),
+                error_type=type(primary_error).__name__,
             )

This narrows the fallback trigger to configuration/import issues rather than all exceptions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/extensions/voice/service_factory.py` around lines 87 - 103, The
try/except around _create_tts_service_for_provider(provider, config) is catching
all Exception (primary_error) and may hide programming errors; change it to
catch only expected errors (e.g., ImportError, ConfigurationError, or a specific
TTSSetupError) or at minimum log the exception type before attempting the
fallback; keep the fallback behavior using
_create_tts_service_for_provider(fallback_provider, config) and when catching
fallback_error raise a RuntimeError from fallback_error as before, and ensure
logger.warning includes type(primary_error).__name__ (and/or re-raise
non-expected exceptions) so unexpected exceptions like TypeError/AttributeError
aren’t silently handled.
bindu/extensions/voice/redis_session_manager.py (2)

340-367: Consider using pipeline for get_active_count to reduce round-trips.

The current implementation makes N+1 Redis calls (1 SMEMBERS + N GET calls), which could be slow with many sessions. A Redis pipeline would batch these into fewer round-trips.

♻️ Proposed optimization using pipeline
     async def get_active_count(self) -> int:
         """Return the number of sessions that are not ended."""
         if not self._redis_client:
             return 0

         try:
             members = await self._redis_client.smembers(REDIS_ACTIVE_SET_KEY)
             if not members:
                 return 0

             count = 0
             stale_members: list[str] = []
-            for key in members:
-                data = await self._redis_client.get(key)
-                if data:
-                    session = self._deserialize_session(key.split(":")[-1], data)
-                    if session.state != "ended":
-                        count += 1
-                else:
-                    stale_members.append(key)
+            # Batch GET calls using pipeline
+            async with self._redis_client.pipeline() as pipe:
+                for key in members:
+                    pipe.get(key)
+                results = await pipe.execute()
+
+            for key, data in zip(members, results):
+                if data:
+                    session = self._deserialize_session(key.split(":")[-1], data)
+                    if session.state != "ended":
+                        count += 1
+                else:
+                    stale_members.append(key)

             if stale_members:
                 await self._redis_client.srem(REDIS_ACTIVE_SET_KEY, *stale_members)

             return count
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/extensions/voice/redis_session_manager.py` around lines 340 - 367, The
get_active_count method currently does SMEMBERS then N separate GETs causing N+1
round-trips; change it to batch the GETs with a Redis pipeline on
self._redis_client (use pipeline() or multi/execute variants available) after
retrieving members from REDIS_ACTIVE_SET_KEY, enqueue GET for each member,
execute the pipeline to get all values in one round-trip, then iterate results
to count non-ended sessions via _deserialize_session(session_id, data) and
collect stale_members; if stale_members non-empty, call srem on
REDIS_ACTIVE_SET_KEY (can also be pipelined) to remove them, and preserve
existing RedisError handling and return behavior.

401-441: Same optimization opportunity applies to _expire_timed_out_sessions.

This method also iterates members with individual GET calls. Consider using a pipeline here as well for consistency with any optimization applied to get_active_count.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/extensions/voice/redis_session_manager.py` around lines 401 - 441, The
_expire_timed_out_sessions method currently fetches each session key with
individual GET calls which is inefficient; change it to batch-fetch session data
via a pipeline (or MGET) using the same _redis_client so you first call
smembers(REDIS_ACTIVE_SET_KEY), then pipeline.get or mget for all members,
deserialize each result with _deserialize_session, compute expirations, and
finally perform deletes/evalsha/srem in the pipeline as well (respecting
_delete_session_script_sha) to minimize round-trips and keep logic in the same
function.
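The batching pattern suggested for both methods can be sketched in isolation. This is a minimal, self-contained sketch: FakePipeline stands in for a redis.asyncio pipeline (where get() enqueues and execute() resolves all commands in one round-trip), and the session payloads, key names, and count_active helper are hypothetical illustrations of the review's suggestion, not the actual module code.

```python
import asyncio


class FakePipeline:
    """Stand-in for redis.asyncio's pipeline: get() enqueues, execute() resolves in one batch."""

    def __init__(self, store: dict):
        self._store = store
        self._keys: list[str] = []

    def get(self, key: str) -> None:
        self._keys.append(key)

    async def execute(self) -> list:
        # A real pipeline would issue all GETs in a single round-trip here.
        return [self._store.get(k) for k in self._keys]


async def count_active(store: dict, members: list[str]) -> tuple[int, list[str]]:
    """Batch-fetch session payloads, count non-ended sessions, and collect stale keys."""
    pipe = FakePipeline(store)
    for key in members:
        pipe.get(key)
    results = await pipe.execute()

    active = 0
    stale: list[str] = []
    for key, data in zip(members, results):
        if data is None:
            stale.append(key)  # key expired; prune from the active set afterwards
        elif data.get("state") != "ended":
            active += 1
    return active, stale


store = {
    "voice:session:a": {"state": "active"},
    "voice:session:b": {"state": "ended"},
}
members = ["voice:session:a", "voice:session:b", "voice:session:c"]
active, stale = asyncio.run(count_active(store, members))
print(active, stale)  # 1 ['voice:session:c']
```

With the real client, the same shape applies to _expire_timed_out_sessions: enqueue the GETs, execute once, then pipeline the deletes/srem for expired and stale keys as well.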
bindu/settings.py (1)

1107-1111: Consider using None instead of empty string for redis_url default.

The redis_url defaults to "" (empty string), but session_factory.py checks for truthiness (if not redis_url). While this works, using None would be more explicit and consistent with other optional URL fields like postgres_url in StorageSettings.

♻️ Proposed fix
     # Session storage backend (for multi-worker compatibility)
     session_backend: Literal["memory", "redis"] = "memory"
-    redis_url: str = ""  # e.g., "redis://localhost:6379/0"
+    redis_url: str | None = None  # e.g., "redis://localhost:6379/0"
     redis_session_ttl: int = 300  # seconds, TTL for session keys in Redis
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/settings.py` around lines 1107 - 1111, The redis_url default is an
empty string which relies on truthiness checks in session_factory.py; change the
redis_url declaration in Settings (redis_url: str = "") to use Optional[str]
with a default of None (e.g., redis_url: Optional[str] = None) so it's explicit
and consistent with other optional URL fields like StorageSettings.postgres_url,
and update any type imports if needed; keep the existing truthiness usage in
session_factory.py (if not redis_url) which will continue to work with None.
bindu/extensions/voice/agent_bridge.py (1)

33-39: Consider moving timeout constants to VoiceSettings for consistency.

While these defaults work as fallbacks, they duplicate values that are also in VoiceSettings (agent_timeout_secs, utterance_timeout_secs). The coding guidelines emphasize using app_settings for configuration values.

However, since these are used as constructor defaults when voice_settings=None (e.g., in unit tests), keeping them as module constants is acceptable for testability.

The curly quote in line 38 (') flagged by Ruff is a false positive: this is intentional typography in user-facing text.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/extensions/voice/agent_bridge.py` around lines 33 - 39, Move the
timeout defaults into VoiceSettings and use those settings in the AgentBridge
constructor instead of module-level duplicates: update AgentBridge
(constructor/init) to read agent_timeout_secs and utterance_timeout_secs from
the provided voice_settings (VoiceSettings) and fall back to the existing module
constants MAX_HISTORY_TURNS, DEFAULT_FIRST_TOKEN_TIMEOUT_SECONDS,
DEFAULT_TOTAL_RESPONSE_TIMEOUT_SECONDS when voice_settings is None (to preserve
testability); remove redundant hard-coded timeout values in AgentBridge and
ensure references to DEFAULT_CANCELLATION_GRACE_SECONDS and
DEFAULT_THINKING_TEXT/DEFAULT_TIMEOUT_FALLBACK_TEXT remain as module-level UI
fallbacks, and keep the curly quote in DEFAULT_TIMEOUT_FALLBACK_TEXT unchanged.
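The settings-first-with-module-fallback pattern described above can be sketched as follows. This is an illustrative sketch only: the attribute names mirror those mentioned in the review (agent_timeout_secs, utterance_timeout_secs), but the constant values, the VoiceSettings shape, and the AgentBridge fields are assumptions, not the actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Module-level fallbacks, retained so unit tests can construct the bridge
# without settings (values here are illustrative assumptions).
DEFAULT_FIRST_TOKEN_TIMEOUT_SECONDS = 5.0
DEFAULT_TOTAL_RESPONSE_TIMEOUT_SECONDS = 30.0


@dataclass
class VoiceSettings:
    agent_timeout_secs: float = 30.0
    utterance_timeout_secs: float = 5.0


class AgentBridge:
    def __init__(self, voice_settings: Optional[VoiceSettings] = None):
        # Prefer configured values; fall back to module constants only
        # when voice_settings is None (e.g., in unit tests).
        if voice_settings is not None:
            self.first_token_timeout = voice_settings.utterance_timeout_secs
            self.total_response_timeout = voice_settings.agent_timeout_secs
        else:
            self.first_token_timeout = DEFAULT_FIRST_TOKEN_TIMEOUT_SECONDS
            self.total_response_timeout = DEFAULT_TOTAL_RESPONSE_TIMEOUT_SECONDS


bridge = AgentBridge(VoiceSettings(agent_timeout_secs=60.0))
print(bridge.total_response_timeout)  # 60.0
```

This keeps app_settings as the single source of truth at runtime while preserving the test-friendly no-settings constructor path.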
frontend/src/lib/services/voice-client.ts (2)

58-61: Unsafe type assertion to access baseUrl.

The double cast (agentAPI as unknown as { baseUrl?: string }) is a code smell indicating the type definitions may be incomplete. Consider either:

  1. Exporting baseUrl properly from agent-api
  2. Adding it to the agentAPI type definition
♻️ Proposed improvement

If agentAPI should expose baseUrl, update its type definition. Otherwise, create a dedicated function:

// In agent-api.ts or a shared config module
export function getAgentBaseUrl(): string {
  return agentAPI.baseUrl ?? "http://localhost:3773";
}

Then use it consistently:

-const baseUrl =
-  (agentAPI as unknown as { baseUrl?: string }).baseUrl || "http://localhost:3773";
+const baseUrl = getAgentBaseUrl();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/services/voice-client.ts` around lines 58 - 61, The double
type-assertion accessing baseUrl in startSession is unsafe; either add baseUrl
to the agentAPI type definition or expose a small accessor from the agent-api
module (e.g., getAgentBaseUrl) that returns agentAPI.baseUrl ??
"http://localhost:3773", then replace the inline cast in startSession with that
accessor (or the correctly typed agentAPI.baseUrl) to remove the (agentAPI as
unknown as { baseUrl?: string }) hack and centralize the default URL logic.

451-458: Remove unused function convertFloat32ToPcm16.

This function is defined at module level but never called anywhere in the codebase. The PCM16 conversion is already handled inside the AudioWorklet processor (lines 349-354).

Proposed fix
-function convertFloat32ToPcm16(input: Float32Array): ArrayBuffer {
-	const pcm = new Int16Array(input.length);
-	for (let index = 0; index < input.length; index += 1) {
-		const sample = Math.max(-1, Math.min(1, input[index] ?? 0));
-		pcm[index] = sample < 0 ? sample * 0x8000 : sample * 0x7fff;
-	}
-	return pcm.buffer;
-}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/services/voice-client.ts` around lines 451 - 458, Remove the
unused module-level function convertFloat32ToPcm16: delete its entire definition
since PCM16 conversion is already performed in the AudioWorklet processor;
ensure there are no remaining references (imports/exports/tests) to
convertFloat32ToPcm16 and run the build/type-check to confirm nothing else
depends on it.
bindu/utils/notifications.py (1)

38-70: Remove unused function _resolve_and_check_ip.

This function is defined but never called anywhere in the codebase. The _resolve_and_validate_destination static method handles the SSRF defense logic and is the one actually invoked by the validation flow.

♻️ Proposed fix
-def _resolve_and_check_ip(hostname: str) -> str:
-    """Resolve hostname to an IP address and verify it is not in a blocked range.
-
-    Returns the resolved IP address string so callers can connect directly to it,
-    preventing a DNS-rebinding attack where a second resolution (inside urlopen) could
-    return a different—potentially internal—address.
-
-    Args:
-        hostname: The hostname to resolve and validate.
-
-    Returns:
-        The resolved IP address as a string.
-
-    Raises:
-        ValueError: If the hostname cannot be resolved or resolves to a blocked range.
-    """
-    try:
-        resolved_ip = str(socket.getaddrinfo(hostname, None)[0][4][0])
-        addr = ipaddress.ip_address(resolved_ip)
-    except (socket.gaierror, ValueError) as exc:
-        raise ValueError(
-            f"Push notification URL hostname could not be resolved: {exc}"
-        ) from exc
-
-    for blocked in _BLOCKED_NETWORKS:
-        if addr in blocked:
-            raise ValueError(
-                f"Push notification URL resolves to a blocked address range "
-                f"({addr} is in {blocked}). Internal addresses are not allowed."
-            )
-
-    return resolved_ip
-
-
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/utils/notifications.py` around lines 38 - 70, the function
_resolve_and_check_ip is unused and duplicates SSRF logic handled by the
_resolve_and_validate_destination static method. Remove _resolve_and_check_ip
entirely to avoid dead code and potential confusion, and ensure any code that
intends to perform hostname resolution uses _resolve_and_validate_destination
instead (search for references to _resolve_and_check_ip and replace any
remaining calls with _resolve_and_validate_destination).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: edac1545-2eb2-4d66-b8ae-876c272f33d7

📥 Commits

Reviewing files that changed from the base of the PR and between 301624d and bfb9c1e.

⛔ Files ignored due to path filters (4)
  • bindu/grpc/generated/agent_handler_pb2.py is excluded by !**/generated/**
  • bindu/grpc/generated/agent_handler_pb2.pyi is excluded by !**/generated/**
  • bindu/grpc/generated/agent_handler_pb2_grpc.py is excluded by !**/generated/**
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (93)
  • .github/workflows/ci.yml
  • .gitignore
  • .pre-commit-config.yaml
  • bindu/common/protocol/types.py
  • bindu/extensions/__init__.py
  • bindu/extensions/did/did_agent_extension.py
  • bindu/extensions/voice/__init__.py
  • bindu/extensions/voice/agent_bridge.py
  • bindu/extensions/voice/audio_config.py
  • bindu/extensions/voice/pipeline_builder.py
  • bindu/extensions/voice/redis_session_manager.py
  • bindu/extensions/voice/service_factory.py
  • bindu/extensions/voice/session_factory.py
  • bindu/extensions/voice/session_manager.py
  • bindu/extensions/voice/voice_agent_extension.py
  • bindu/penguin/bindufy.py
  • bindu/penguin/config_validator.py
  • bindu/server/applications.py
  • bindu/server/endpoints/utils.py
  • bindu/server/endpoints/voice_endpoints.py
  • bindu/server/handlers/message_handlers.py
  • bindu/server/metrics.py
  • bindu/server/scheduler/memory_scheduler.py
  • bindu/server/workers/base.py
  • bindu/settings.py
  • bindu/utils/__init__.py
  • bindu/utils/capabilities.py
  • bindu/utils/config/settings.py
  • bindu/utils/logging.py
  • bindu/utils/notifications.py
  • bindu/utils/retry.py
  • bindu/utils/task_telemetry.py
  • bindu/utils/worker/messages.py
  • bindu/utils/worker/parts.py
  • frontend/.env.example
  • frontend/.gitignore
  • frontend/src/lib/buildPrompt.ts
  • frontend/src/lib/components/ShareConversationModal.svelte
  • frontend/src/lib/components/chat/ChatInput.svelte
  • frontend/src/lib/components/chat/ChatMessage.svelte
  • frontend/src/lib/components/chat/ChatWindow.svelte
  • frontend/src/lib/components/chat/ContextList.svelte
  • frontend/src/lib/components/voice/LiveTranscript.svelte
  • frontend/src/lib/components/voice/VoiceCallButton.svelte
  • frontend/src/lib/components/voice/VoiceCallPanel.svelte
  • frontend/src/lib/constants/mime.ts
  • frontend/src/lib/jobs/refresh-conversation-stats.ts
  • frontend/src/lib/migrations/lock.ts
  • frontend/src/lib/migrations/routines/02-update-assistants-models.ts
  • frontend/src/lib/migrations/routines/10-update-reports-assistantid.ts
  • frontend/src/lib/server/config.ts
  • frontend/src/lib/server/database.ts
  • frontend/src/lib/server/files/downloadFile.ts
  • frontend/src/lib/server/files/uploadFile.ts
  • frontend/src/lib/server/models.ts
  • frontend/src/lib/services/voice-client.ts
  • frontend/src/lib/stores/chat.ts
  • frontend/src/lib/stores/voice.ts
  • frontend/src/lib/types/ConvSidebar.ts
  • frontend/src/lib/types/Model.ts
  • frontend/src/lib/types/Session.ts
  • frontend/src/lib/utils/agentMessageHandler.ts
  • frontend/src/lib/utils/tree/addChildren.spec.ts
  • frontend/src/lib/utils/tree/addSibling.spec.ts
  • frontend/src/lib/utils/tree/buildSubtree.spec.ts
  • frontend/src/lib/utils/tree/convertLegacyConversation.spec.ts
  • frontend/src/routes/+layout.svelte
  • frontend/src/routes/+page.svelte
  • frontend/src/routes/api/v2/conversations/[id]/message/[messageId]/+server.ts
  • frontend/src/routes/api/v2/export/+server.ts
  • frontend/src/routes/conversation/[id]/+page.svelte
  • frontend/src/routes/conversation/[id]/+server.ts
  • frontend/src/routes/settings/(nav)/authentication/+page.svelte
  • frontend/src/routes/settings/(nav)/negotiation/+page.svelte
  • pyproject.toml
  • pytest.ini
  • scripts/test_voice_providers.sh
  • tests/conftest_stubs.py
  • tests/integration/grpc/test_grpc_e2e.py
  • tests/unit/extensions/__init__.py
  • tests/unit/extensions/voice/__init__.py
  • tests/unit/extensions/voice/test_agent_bridge.py
  • tests/unit/extensions/voice/test_service_factory.py
  • tests/unit/extensions/voice/test_session_manager.py
  • tests/unit/extensions/voice/test_voice_endpoints.py
  • tests/unit/extensions/voice/test_voice_extension.py
  • tests/unit/extensions/voice/test_voice_websocket_integration.py
  • tests/unit/penguin/test_bindufy.py
  • tests/unit/penguin/test_config_validator.py
  • tests/unit/server/scheduler/test_memory_scheduler.py
  • tests/unit/test_minimax_example.py
  • tests/unit/utils/test_notifications.py
  • tests/unit/utils/worker/test_messages.py

Comment thread .github/workflows/ci.yml Outdated
Comment thread bindu/penguin/bindufy.py
Comment on lines +497 to +499
    voice_config = validated_config.get("voice")
    if voice_config and isinstance(voice_config, dict):
        from bindu.extensions.voice import VoiceAgentExtension

⚠️ Potential issue | 🟠 Major

Voice config gate can silently skip initialization.

`if voice_config and isinstance(voice_config, dict)` skips valid empty-dict configs and also silently ignores non-dict truthy values instead of failing fast.

Proposed patch
-    voice_config = validated_config.get("voice")
-    if voice_config and isinstance(voice_config, dict):
+    voice_config = validated_config.get("voice")
+    if voice_config is not None:
+        if not isinstance(voice_config, dict):
+            raise ValueError("Invalid voice configuration: 'voice' must be an object")
         from bindu.extensions.voice import VoiceAgentExtension

As per coding guidelines: "Validate all external input and use type hints for input validation in Python files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/penguin/bindufy.py` around lines 497 - 499, the current conditional
around voice_config silently skips valid empty dicts and ignores non-dict values
instead of failing fast. Change the logic to explicitly check for the presence
of the "voice" key in validated_config, assign voice_config =
validated_config["voice"], then validate its type with isinstance(voice_config,
dict) and raise a clear TypeError (or ValueError) if it is not a dict so invalid
types fail fast. Only after successful type validation (which should allow an
empty dict) proceed to import and use VoiceAgentExtension.

Comment on lines +16 to +17
class ConfigError(ValueError):
    """Raised when bindufy configuration is invalid."""

⚠️ Potential issue | 🟠 Major

ConfigError is still not the single failure type for config validation.

validate_and_process() now introduces a dedicated exception, but later branches in the same flow still raise bare ValueError (_validate_field_types(), _validate_auth_config(), _validate_hydra_config()). Callers catching ConfigError will still miss a large class of invalid-config failures.

🔧 Suggested fix
-        processed_config = cls._process_complex_fields(processed_config)
-
-        # Validate field types
-        cls._validate_field_types(processed_config)
+        try:
+            processed_config = cls._process_complex_fields(processed_config)
+            cls._validate_field_types(processed_config)
+        except ConfigError:
+            raise
+        except ValueError as exc:
+            raise ConfigError(f"ConfigError: {exc}") from exc

Also applies to: 72-94

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/penguin/config_validator.py` around lines 16 - 17, the code introduces
ConfigError, but other validation helpers still raise bare ValueError, so
callers catching ConfigError miss many failures. Update the validation flow so
that all config validation failures raise ConfigError: change the exception
types thrown in _validate_field_types(), _validate_auth_config(), and
_validate_hydra_config() (and any other validation helpers called by
validate_and_process()) to raise ConfigError with descriptive messages, and
ensure validate_and_process() continues to raise or re-raise ConfigError so
every invalid-config path uses the same exception class.
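The funneling pattern described here can be sketched as follows (helper and field names are illustrative, not the real validator's):

```python
class ConfigError(ValueError):
    """Single failure type for all config validation errors."""

def _validate_field_types(config: dict) -> None:
    # Illustrative helper: raises ConfigError directly, as requested above.
    if not isinstance(config.get("name"), str):
        raise ConfigError("'name' must be a string")

def validate_and_process(config: dict) -> dict:
    try:
        _validate_field_types(config)
    except ConfigError:
        raise                             # already the right type
    except ValueError as exc:
        # Defensive: wrap any stray ValueError from helpers not yet migrated.
        raise ConfigError(str(exc)) from exc
    return config
```

Callers can then rely on a single `except ConfigError:` to cover every invalid-config path.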

Comment on lines +1 to +977
"""Voice session REST + WebSocket endpoints.

Provides:
    POST /voice/session/start → Start a new voice session
    DELETE /voice/session/{session_id} → End a voice session
    GET /voice/session/{session_id}/status → Get session status
    WS /ws/voice/{session_id} → Bidirectional audio stream
"""

from __future__ import annotations

import json
import asyncio
import importlib
import secrets
import time
from dataclasses import dataclass
from typing import TYPE_CHECKING, Any
from uuid import uuid4

from fastapi import HTTPException
from starlette.requests import Request
from starlette.responses import JSONResponse, Response
from starlette.websockets import WebSocket, WebSocketDisconnect, WebSocketState

from bindu.settings import app_settings
from bindu.utils.logging import get_logger
from bindu.server.endpoints.utils import handle_endpoint_errors

if TYPE_CHECKING:
from bindu.server.applications import BinduApplication

logger = get_logger("bindu.server.endpoints.voice")

_BACKGROUND_TASKS: set[asyncio.Task[None]] = set()
_VOICE_RATE_LIMIT_LOCK = asyncio.Lock()
_VOICE_RATE_LIMIT_IP_BUCKET: dict[str, list[float]] = {}
_VOICE_RATE_LIMIT_REDIS_LOCK = asyncio.Lock()
_VOICE_RATE_LIMIT_REDIS_CLIENT: Any | None = None

try:
    import redis.asyncio as _redis_async  # type: ignore[import-not-found]

    _REDIS_AVAILABLE = True
except Exception:  # pragma: no cover
    _redis_async = None  # type: ignore[assignment]
    _REDIS_AVAILABLE = False


_RATE_LIMIT_LUA = """
-- Sliding-window rate limit using a sorted set.
-- KEYS[1] = zset key
-- ARGV[1] = now (seconds)
-- ARGV[2] = cutoff (seconds)
-- ARGV[3] = member (unique)
-- ARGV[4] = limit (int)
redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, tonumber(ARGV[2]))
redis.call('ZADD', KEYS[1], tonumber(ARGV[1]), ARGV[3])
local count = redis.call('ZCARD', KEYS[1])
redis.call('EXPIRE', KEYS[1], 120)
if count > tonumber(ARGV[4]) then
return 0
end
return 1
"""


async def _get_rate_limit_redis_client() -> Any | None:
    """Lazy init a Redis client for rate limiting (best-effort)."""
    global _VOICE_RATE_LIMIT_REDIS_CLIENT
    if _VOICE_RATE_LIMIT_REDIS_CLIENT is not None:
        return _VOICE_RATE_LIMIT_REDIS_CLIENT

    if not _REDIS_AVAILABLE:
        return None

    redis_url = app_settings.voice.redis_url
    if not redis_url:
        return None

    async with _VOICE_RATE_LIMIT_REDIS_LOCK:
        if _VOICE_RATE_LIMIT_REDIS_CLIENT is not None:
            return _VOICE_RATE_LIMIT_REDIS_CLIENT
        try:
            _VOICE_RATE_LIMIT_REDIS_CLIENT = _redis_async.from_url(  # type: ignore[union-attr]
                redis_url,
                encoding="utf-8",
                decode_responses=True,
            )
            return _VOICE_RATE_LIMIT_REDIS_CLIENT
        except Exception:
            logger.exception("Failed to initialize Redis rate limiter client")
            _VOICE_RATE_LIMIT_REDIS_CLIENT = None
            return None


async def _rate_limit_allow_ip(
    ip: str,
    *,
    limit_per_minute: int,
    now: float | None = None,
) -> bool:
    """Allow an IP through the sliding-window rate limiter."""
    if limit_per_minute <= 0:
        return True
    t = float(time.time() if now is None else now)
    cutoff = t - 60.0

    if app_settings.voice.rate_limit_backend == "redis":
        client = await _get_rate_limit_redis_client()
        if client is not None:
            key = f"voice:rate_limit:ip:{ip}"
            member = f"{t}:{time.time_ns()}"
            try:
                allowed = await client.eval(
                    _RATE_LIMIT_LUA,
                    1,
                    key,
                    t,
                    cutoff,
                    member,
                    int(limit_per_minute),
                )
                return bool(allowed)
            except Exception:
                logger.exception("Redis rate limiter failed; falling back to memory")

    async with _VOICE_RATE_LIMIT_LOCK:
        window = _VOICE_RATE_LIMIT_IP_BUCKET.get(ip, [])
        window = [ts for ts in window if ts >= cutoff]

        if not window:
            _VOICE_RATE_LIMIT_IP_BUCKET.pop(ip, None)

        if len(window) >= limit_per_minute:
            if window:
                _VOICE_RATE_LIMIT_IP_BUCKET[ip] = window
            return False
        window.append(t)
        _VOICE_RATE_LIMIT_IP_BUCKET[ip] = window
        return True


@dataclass
class _VoiceControlState:
    muted: bool = False
    stopped: bool = False
    suppress_audio_until: float = 0.0


class _FilteredWebSocket:
    """WebSocket wrapper that filters inbound frames.

    Used to keep Pipecat's transport focused on audio frames while this endpoint
    consumes and handles JSON control messages (start/mute/unmute/stop/etc).
    """

    def __init__(self, websocket: WebSocket, queue: asyncio.Queue[dict[str, Any]]):
        self._ws = websocket
        self._queue = queue

    def __getattr__(self, name: str) -> Any:
        return getattr(self._ws, name)

    async def receive(self) -> dict[str, Any]:
        msg = await self._queue.get()
        msg_type = msg.get("type", "unknown")
        data_bytes = msg.get("bytes")
        logger.info(
            f"FilteredWebSocket.receive: type={msg_type}, bytes={len(data_bytes) if data_bytes else 0}"
        )
        return msg

    async def receive_text(self) -> str:
        message = await self.receive()
        if message.get("type") == "websocket.disconnect":
            raise WebSocketDisconnect(code=message.get("code", 1000))
        text = message.get("text")
        if text is None:
            raise RuntimeError("Expected text WebSocket message")
        return text

    async def receive_bytes(self) -> bytes:
        message = await self.receive()
        if message.get("type") == "websocket.disconnect":
            logger.info("FilteredWebSocket.receive_bytes: disconnect received")
            raise WebSocketDisconnect(code=message.get("code", 1000))
        data = message.get("bytes")
        logger.debug(
            f"FilteredWebSocket.receive_bytes: got {len(data) if data else 0} bytes"
        )
        if data is None:
            raise RuntimeError("Expected bytes WebSocket message")
        return data


try:
    from pipecat.serializers.base_serializer import (
        FrameSerializer as _PipecatFrameSerializer,
    )
except Exception:  # pragma: no cover

    class _PipecatFrameSerializer:  # type: ignore[too-many-ancestors]
        def __init__(self, *args: Any, **kwargs: Any) -> None:
            pass


class _RawAudioFrameSerializer(_PipecatFrameSerializer):
    """Serializer for raw PCM audio over WebSocket.

    Converts inbound binary frames into Pipecat input audio frames and sends
    outbound audio frames as raw bytes for browser playback.
    """

    def __init__(self, sample_rate: int, num_channels: int):
        super().__init__()
        self._sample_rate = sample_rate
        self._num_channels = num_channels

    async def setup(self, _frame: Any) -> None:
        return

    async def serialize(self, frame: Any) -> str | bytes | None:
        from pipecat.frames.frames import (
            OutputAudioRawFrame,
            OutputTransportMessageFrame,
            OutputTransportMessageUrgentFrame,
        )

        if isinstance(frame, OutputAudioRawFrame):
            return frame.audio

        if isinstance(
            frame, (OutputTransportMessageFrame, OutputTransportMessageUrgentFrame)
        ):
            return json.dumps(frame.message)

        return None

    async def deserialize(self, data: str | bytes) -> Any | None:
        from pipecat.frames.frames import InputAudioRawFrame, InputTransportMessageFrame

        if isinstance(data, (bytes, bytearray)):
            return InputAudioRawFrame(
                audio=bytes(data),
                sample_rate=self._sample_rate,
                num_channels=self._num_channels,
            )

        if isinstance(data, str):
            try:
                payload = json.loads(data)
            except json.JSONDecodeError:
                return None
            return InputTransportMessageFrame(message=payload)

        return None


async def _send_json(
    websocket: Any,
    payload: dict[str, Any],
    send_lock: asyncio.Lock | None = None,
) -> bool:
    """Send a JSON payload over a WebSocket safely.

    If a lock is provided, it is used to serialize concurrent send_text calls.
    """
    try:
        client_state = getattr(websocket, "client_state", None)
        application_state = getattr(websocket, "application_state", None)
        if (
            client_state == WebSocketState.DISCONNECTED
            or application_state == WebSocketState.DISCONNECTED
        ):
            return False
    except Exception:
        # If state inspection fails, still attempt the send below.
        pass

    data = json.dumps(payload)
    try:
        if send_lock is None:
            await websocket.send_text(data)
            return True
        async with send_lock:
            await websocket.send_text(data)
            return True
    except (WebSocketDisconnect, RuntimeError):
        # RuntimeError is raised by Starlette after a close frame is sent.
        return False


def _trim_overlap_text(previous: str, current: str) -> str:
    """Trim repeated word-overlap between adjacent transcript chunks.

    Returns only the delta portion of ``current`` that does not duplicate
    the suffix of ``previous``.
    """
    prev = (previous or "").strip()
    curr = (current or "").strip()
    if not prev:
        return curr
    if prev == curr:
        return ""

    prev_tokens = prev.split()
    curr_tokens = curr.split()
    max_overlap = min(len(prev_tokens), len(curr_tokens))

    for overlap in range(max_overlap, 0, -1):
        if prev_tokens[-overlap:] == curr_tokens[:overlap]:
            return " ".join(curr_tokens[overlap:]).strip()

    return curr


def _extract_bearer_token(value: str | None) -> str | None:
    if not value:
        return None
    parts = value.split()
    if len(parts) == 2 and parts[0].lower() == "bearer":
        token = parts[1].strip()
        return token or None
    return None


def _extract_ws_session_token(websocket: WebSocket) -> str | None:
    """Extract session token from WS headers (subprotocol or Authorization)."""
    subprotocols = websocket.headers.get("sec-websocket-protocol")
    if subprotocols:
        # Starlette exposes the raw comma-separated list.
        for item in subprotocols.split(","):
            token = item.strip()
            if token:
                return token

    return _extract_bearer_token(websocket.headers.get("authorization"))


def _has_ws_subprotocols(websocket: WebSocket) -> bool:
    return bool(websocket.headers.get("sec-websocket-protocol"))


def _classify_voice_pipeline_error(exc: Exception) -> tuple[str, int]:
    """Map internal pipeline exceptions to a user-facing error + close code."""
    module = type(exc).__module__
    name = type(exc).__name__
    message = str(exc)

    if isinstance(exc, asyncio.TimeoutError) or name == "TimeoutError":
        return ("Voice pipeline timed out", 1011)

    if module.startswith("websockets") or "ConnectionClosed" in name:
        return ("Voice provider connection closed", 1011)

    lowered = message.lower()
    if "deepgram" in lowered and ("disconnect" in lowered or "closed" in lowered):
        return ("Deepgram connection closed", 1011)
    if "elevenlabs" in lowered and ("disconnect" in lowered or "closed" in lowered):
        return ("ElevenLabs connection closed", 1011)

    return ("Voice pipeline error", 1011)


def _voice_preflight_error() -> str | None:
    """Return a user-facing error if voice is not runnable in this process."""
    # Ensure optional dependency group is installed.
    try:
        importlib.import_module("pipecat")
    except Exception:
        return "Voice dependencies are not installed. Install with: pip install 'bindu[voice]'"

    # Ensure provider keys exist (provider-dependent).
    if app_settings.voice.stt_provider == "deepgram" and not app_settings.voice.stt_api_key:
        return "VOICE__STT_API_KEY is required when VOICE__STT_PROVIDER=deepgram"

    tts_provider = app_settings.voice.tts_provider
    if tts_provider == "elevenlabs" and not app_settings.voice.tts_api_key:
        return "VOICE__TTS_API_KEY is required when VOICE__TTS_PROVIDER=elevenlabs"
    if tts_provider == "azure":
        if not app_settings.voice.azure_tts_api_key:
            return "VOICE__AZURE_TTS_API_KEY is required when VOICE__TTS_PROVIDER=azure"
        if not app_settings.voice.azure_tts_region:
            return "VOICE__AZURE_TTS_REGION is required when VOICE__TTS_PROVIDER=azure"
    if tts_provider not in {"elevenlabs", "piper", "azure"}:
        return f"Unsupported VOICE__TTS_PROVIDER={tts_provider!r}"

    return None


async def _send_error_and_close(
    websocket: WebSocket,
    message: str,
    *,
    send_lock: asyncio.Lock,
    close_code: int = 1008,
) -> None:
    try:
        await _send_json(websocket, {"type": "error", "message": message}, send_lock)
    finally:
        try:
            await websocket.close(code=close_code, reason=message)
        except Exception:
            pass


async def _voice_control_reader(
    websocket: WebSocket,
    inbound_queue: asyncio.Queue[dict[str, Any]],
    control: _VoiceControlState,
    *,
    vad_enabled: bool,
    send_lock: asyncio.Lock,
    on_user_text: Any | None = None,
) -> None:
    """Read from the real WebSocket and push only audio frames to the queue."""
    max_binary_frame_bytes = 64 * 1024
    max_frames_per_second = 50
    max_frames_in_flight = 10

    window_started_at = time.monotonic()
    window_count = 0

    while True:
        message: dict[str, Any] = await websocket.receive()
        message_type = message.get("type")

        if message_type == "websocket.disconnect":
            await inbound_queue.put(message)
            return

        if message_type != "websocket.receive":
            await inbound_queue.put(message)
            continue

        text = message.get("text")
        if text is not None:
            try:
                payload = json.loads(text)
            except json.JSONDecodeError:
                await _send_error_and_close(
                    websocket,
                    "Malformed JSON control frame",
                    send_lock=send_lock,
                )
                return

            frame_type = payload.get("type")
            if frame_type == "mute":
                control.muted = True
                await _send_json(
                    websocket, {"type": "state", "state": "muted"}, send_lock
                )
                continue
            if frame_type == "unmute":
                control.muted = False
                await _send_json(
                    websocket, {"type": "state", "state": "listening"}, send_lock
                )
                continue
            if frame_type == "stop":
                control.stopped = True
                await _send_json(
                    websocket, {"type": "state", "state": "ended"}, send_lock
                )
                try:
                    await websocket.close()
                finally:
                    return
            if frame_type == "user_text":
                user_text = payload.get("text")
                if isinstance(user_text, str) and user_text.strip() and on_user_text:
                    try:
                        await on_user_text(user_text.strip())
                    except Exception:
                        logger.exception("Failed to handle user_text control frame")
                continue
            if frame_type in {"start", "commit_turn"}:
                # If VAD is disabled, the transport/STT may rely on explicit turn boundary
                # control frames (e.g. commit_turn). Forward these to the transport.
                if not vad_enabled:
                    await inbound_queue.put(message)
                continue

            # Unknown control frame: ignore to preserve forward-compat.
            continue

        data = message.get("bytes")
        if data is None:
            logger.debug("Voice control reader: no bytes in message, skipping")
            continue

        logger.info(f"Voice control reader: received audio frame, {len(data)} bytes")

        now_monotonic = time.monotonic()
        if control.muted or now_monotonic < float(control.suppress_audio_until):
            continue

        if len(data) > max_binary_frame_bytes:
            await _send_error_and_close(
                websocket,
                f"Audio frame too large (max {max_binary_frame_bytes} bytes)",
                send_lock=send_lock,
            )
            return

        if now_monotonic - window_started_at >= 1.0:
            window_started_at = now_monotonic
            window_count = 0
        window_count += 1
        if window_count > max_frames_per_second:
            await _send_error_and_close(
                websocket,
                f"Too many audio frames per second (max {max_frames_per_second})",
                send_lock=send_lock,
            )
            return

        if inbound_queue.qsize() >= max_frames_in_flight:
            await _send_error_and_close(
                websocket,
                f"Too many audio frames in flight (max {max_frames_in_flight})",
                send_lock=send_lock,
            )
            return

        await inbound_queue.put(message)


# ---------------------------------------------------------------------------
# REST Endpoints
# ---------------------------------------------------------------------------


@handle_endpoint_errors("start voice session")
async def voice_session_start(app: BinduApplication, request: Request) -> Response:
    """Start a new voice session.

    Request body (optional JSON):
        { "context_id": "<uuid>" }

    Returns:
        { "session_id": "...", "ws_url": "ws://host/ws/voice/{session_id}" }
    """
    session_manager = getattr(app, "_voice_session_manager", None)
    if session_manager is None:
        return JSONResponse(
            {"error": "Voice extension is not enabled"}, status_code=501
        )

    if not app_settings.voice.enabled:
        return JSONResponse({"error": "Voice is disabled"}, status_code=501)

    preflight_error = _voice_preflight_error()
    if preflight_error:
        return JSONResponse({"error": preflight_error}, status_code=503)

    # Per-IP rate limit (best-effort; request.client may be missing in tests/proxies)
    client_host = request.client.host if request.client else None
    if client_host:
        allowed = await _rate_limit_allow_ip(
            client_host,
            limit_per_minute=int(app_settings.voice.rate_limit_per_ip_per_minute),
        )
        if not allowed:
            return JSONResponse({"error": "Rate limit exceeded"}, status_code=429)

    # Parse optional context_id from body
    context_id = str(uuid4())
    raw_body = await request.body()
    if raw_body:
        try:
            body = json.loads(raw_body)
        except json.JSONDecodeError as exc:
            raise HTTPException(
                status_code=400, detail="Malformed JSON payload"
            ) from exc

        if isinstance(body, dict) and "context_id" in body:
            context_id = str(body["context_id"])

    session_token: str | None = None
    session_token_expires_at: float | None = None
    if app_settings.voice.session_auth_required:
        session_token = secrets.token_urlsafe(32)
        session_token_expires_at = time.time() + max(
            1, int(app_settings.voice.session_token_ttl)
        )

    try:
        session = await session_manager.create_session(
            context_id,
            session_token=session_token,
            session_token_expires_at=session_token_expires_at,
        )
    except RuntimeError as exc:
        return JSONResponse({"error": str(exc)}, status_code=429)

    # Build WebSocket URL from request
    scheme = "wss" if request.url.scheme == "https" else "ws"
    # Use hostname from request, fallback to client host, or raise error if unavailable
    host = request.url.hostname or (request.client.host if request.client else None)
    if not host:
        return JSONResponse(
            {"error": "Unable to determine host for WebSocket URL"},
            status_code=400,
        )
    ws_url = f"{scheme}://{host}"
    if request.url.port:
        ws_url += f":{request.url.port}"
    ws_url += f"/ws/voice/{session.id}"

    return JSONResponse(
        {
            "session_id": session.id,
            "context_id": session.context_id,
            "ws_url": ws_url,
            **({"session_token": session_token} if session_token else {}),
        },
        status_code=201,
    )


@handle_endpoint_errors("end voice session")
async def voice_session_end(app: BinduApplication, request: Request) -> Response:
    """End a voice session.

    Path params:
        session_id: The voice session ID

    Returns:
        { "status": "ended" }
    """
    session_manager = getattr(app, "_voice_session_manager", None)
    if session_manager is None:
        return JSONResponse(
            {"error": "Voice extension is not enabled"}, status_code=501
        )

    session_id = request.path_params["session_id"]
    session = await session_manager.get_session(session_id)
    if session is None:
        return JSONResponse({"error": "Session not found"}, status_code=404)

    await session_manager.end_session(session_id)
    return JSONResponse({"status": "ended"})


@handle_endpoint_errors("voice session status")
async def voice_session_status(app: BinduApplication, request: Request) -> Response:
    """Get voice session status.

    Path params:
        session_id: The voice session ID

    Returns:
        { "session_id": "...", "state": "...", "duration": 12.3, "context_id": "..." }
    """
    session_manager = getattr(app, "_voice_session_manager", None)
    if session_manager is None:
        return JSONResponse(
            {"error": "Voice extension is not enabled"}, status_code=501
        )

    session_id = request.path_params["session_id"]
    session = await session_manager.get_session(session_id)
    if session is None:
        return JSONResponse({"error": "Session not found"}, status_code=404)

    return JSONResponse(
        {
            "session_id": session.id,
            "context_id": session.context_id,
            "state": session.state,
            "duration": round(session.duration_seconds, 1),
            "task_id": session.task_id,
        }
    )


# ---------------------------------------------------------------------------
# WebSocket Handler
# ---------------------------------------------------------------------------


async def voice_websocket(websocket: WebSocket) -> None:
    """Bidirectional voice WebSocket handler using Pipecat pipeline."""
    app: BinduApplication = websocket.app  # type: ignore[assignment]
    session_id = websocket.path_params.get("session_id", "")

    session_manager = getattr(app, "_voice_session_manager", None)
    if session_manager is None:
        await websocket.close(code=1008, reason="Voice extension is not enabled")
        return
    if not app_settings.voice.enabled:
        await websocket.close(code=1008, reason="Voice is disabled")
        return

    preflight_error = _voice_preflight_error()
    if preflight_error:
        await websocket.close(code=1008, reason=preflight_error)
        return

    session = await session_manager.get_session(session_id)
    if session is None:
        await websocket.close(code=1008, reason="Invalid session ID")
        return

    # Per-IP rate limit on websocket connects (best-effort)
    client_host = websocket.client.host if websocket.client else None
    if client_host:
        allowed = await _rate_limit_allow_ip(
            client_host,
            limit_per_minute=int(app_settings.voice.rate_limit_per_ip_per_minute),
        )
        if not allowed:
            await websocket.close(code=1008, reason="Rate limit exceeded")
            return

    if app_settings.voice.session_auth_required:
        expected = getattr(session, "session_token", None)
        expires_at = getattr(session, "session_token_expires_at", None)
        provided = _extract_ws_session_token(websocket)

        # If the client sent Sec-WebSocket-Protocol, the server should select one.
        # We select the token itself as the negotiated subprotocol.
        if provided and _has_ws_subprotocols(websocket):
            await websocket.accept(subprotocol=provided)
        else:
            await websocket.accept()
            if not provided:
                try:
                    provided = (await websocket.receive_text()).strip()
                except Exception:
                    await websocket.close(code=1008, reason="Missing session token")
                    return

        if not expected or provided != expected:
            await websocket.close(code=1008, reason="Invalid session token")
            return
        if isinstance(expires_at, (int, float)) and time.time() > float(expires_at):
            await websocket.close(code=1008, reason="Session token expired")
            return
    else:
        await websocket.accept()

    await session_manager.update_state(session_id, "active")

    voice_ext = getattr(app, "_voice_ext", None)
    manifest = getattr(app, "manifest", None)
    if voice_ext is None or manifest is None or not hasattr(manifest, "run"):
        await websocket.send_text(
            json.dumps({"type": "error", "message": "Agent not configured for voice"})
        )
        await websocket.close(code=1011)
        return

    from bindu.extensions.voice.pipeline_builder import build_voice_pipeline
    from pipecat.transports.websocket.fastapi import (
        FastAPIWebsocketTransport,
        FastAPIWebsocketParams,
    )
    from pipecat.pipeline.pipeline import Pipeline
    from pipecat.pipeline.task import PipelineTask
    from pipecat.pipeline.runner import PipelineRunner

    # Notify UI we are listening
    send_lock = asyncio.Lock()
    await _send_json(websocket, {"type": "state", "state": "listening"}, send_lock)

    async def _on_user_transcript(text: str) -> None:
        await _send_json(
            websocket,
            {"type": "transcript", "role": "user", "text": text, "is_final": True},
            send_lock,
        )

    async def _on_agent_response(text: str) -> None:
        control.suppress_audio_until = max(
            float(control.suppress_audio_until), time.monotonic() + 0.6
        )
        await _send_json(
            websocket,
            {"type": "agent_response", "text": text, "task_id": session.task_id},
            send_lock,
        )

    async def _on_state_change(state: str) -> None:
        if state == "agent-speaking":
            control.suppress_audio_until = max(
                float(control.suppress_audio_until), time.monotonic() + 1.0
            )
        elif state == "listening":
            control.suppress_audio_until = max(
                float(control.suppress_audio_until), time.monotonic() + 0.35
            )
        await _send_json(websocket, {"type": "state", "state": state}, send_lock)

    async def _on_agent_transcript(text: str, is_final: bool) -> None:
        control.suppress_audio_until = max(
            float(control.suppress_audio_until),
            time.monotonic() + (0.6 if is_final else 0.9),
        )
        logger.info(
            f"on_agent_transcript: got text='{text[:50]}...' is_final={is_final}"
        )
        await _send_json(
            websocket,
            {
                "type": "transcript",
                "role": "agent",
                "text": text,
                "is_final": is_final,
            },
            send_lock,
        )

    components = build_voice_pipeline(
        voice_ext=voice_ext,
        manifest_run=manifest.run,
        context_id=session.context_id,
        on_state_change=_on_state_change,
        on_user_transcript=_on_user_transcript,
        on_agent_response=_on_agent_response,
        on_agent_transcript=_on_agent_transcript,
    )

    control = _VoiceControlState()
    inbound_queue: asyncio.Queue[dict[str, Any]] = asyncio.Queue(maxsize=10)
    filtered_ws = _FilteredWebSocket(websocket, inbound_queue)

    async def _handle_user_text(text: str) -> None:
        await _send_json(
            websocket,
            {"type": "transcript", "role": "user", "text": text, "is_final": True},
            send_lock,
        )
        response = await components["bridge"].process_transcription(
            text, emit_frames=True
        )
        if response:
            await _on_agent_response(response)

    reader_task: asyncio.Task[Any] | None = None
    runner_task: asyncio.Task[Any] | None = None

    try:
        transport = FastAPIWebsocketTransport(
            websocket=filtered_ws,  # type: ignore[arg-type]
            params=FastAPIWebsocketParams(
                audio_in_enabled=True,
                audio_out_enabled=True,
                audio_in_sample_rate=app_settings.voice.sample_rate,
                audio_out_sample_rate=app_settings.voice.sample_rate,
                add_wav_header=False,
                serializer=_RawAudioFrameSerializer(
                    sample_rate=app_settings.voice.sample_rate,
                    num_channels=app_settings.voice.audio_channels,
                ),
            ),
        )

        logger.info(
            f"Voice pipeline: transport created, sample_rate={app_settings.voice.sample_rate}"
        )
        logger.info(
            f"Voice pipeline: components - STT={type(components['stt']).__name__}, "
            f"Bridge={type(components['bridge']).__name__}, TTS={type(components['tts']).__name__}"
        )

        pipeline_components = [transport.input()]
        if components.get("vad"):
            pipeline_components.append(components["vad"])
            logger.info("Voice pipeline: VAD enabled and added")

        pipeline_components.extend(
            [
                components["stt"],
                components["bridge"],
                components["tts"],
                transport.output(),
            ]
        )
        logger.info(
            f"Voice pipeline: total components in pipeline: {len(pipeline_components)}"
        )

        pipeline = Pipeline(pipeline_components)
        logger.info("Voice pipeline: Pipeline created successfully")

        task = PipelineTask(pipeline)
        logger.info("Voice pipeline: PipelineTask created, starting runner...")
        runner = PipelineRunner()

        runner_task = asyncio.create_task(runner.run(task))

# Start reading control/audio only after pipeline runner is live so
# user_text cannot emit TTS frames before StartFrame initialization.
reader_task = asyncio.create_task(
_voice_control_reader(
websocket,
inbound_queue,
control,
vad_enabled=app_settings.voice.vad_enabled,
send_lock=send_lock,
on_user_text=_handle_user_text,
)
)

async with asyncio.timeout(float(app_settings.voice.session_timeout)):
await runner_task
except WebSocketDisconnect:
logger.info(f"Voice WebSocket disconnected: {session_id}")
except TimeoutError:
logger.info(f"Voice session timed out: {session_id}")
if websocket.client_state == WebSocketState.CONNECTED:
await _send_json(
websocket,
{"type": "error", "message": "Voice session timed out"},
send_lock,
)
except Exception as e:
logger.exception(f"Error in voice WebSocket: {session_id}: {e}")
if websocket.client_state == WebSocketState.CONNECTED:
user_message, close_code = _classify_voice_pipeline_error(e)
await _send_json(
websocket, {"type": "error", "message": user_message}, send_lock
)
try:
await websocket.close(code=close_code, reason=user_message)
except Exception:
pass
finally:
if runner_task and not runner_task.done():
runner_task.cancel()
try:
await runner_task
except asyncio.CancelledError:
pass
except Exception:
logger.exception("Voice pipeline runner task failed")

if reader_task and not reader_task.done():
reader_task.cancel()
try:
await reader_task
except asyncio.CancelledError:
pass
except Exception:
logger.exception("Voice control reader task failed")

if websocket.client_state == WebSocketState.CONNECTED:
try:
await _send_json(
websocket, {"type": "state", "state": "ended"}, send_lock
)
except Exception:
pass

try:
await session_manager.update_state(session_id, "ending")
except Exception:
pass
try:
await components["bridge"].cleanup_background_tasks()
except Exception:
pass
try:
await session_manager.end_session(session_id)
except Exception:
pass
if websocket.client_state == WebSocketState.CONNECTED:
try:
await websocket.close()
except Exception:
pass

🛠️ Refactor suggestion | 🟠 Major

Split this module before it grows further.

At ~1,000 LOC, this file already mixes rate limiting, serializer glue, control parsing, REST handlers, and WebSocket orchestration. That makes lifecycle bugs like the early-return leak below much harder to contain; please extract at least the rate limiter/control reader/REST handlers into separate modules.

As per coding guidelines, "Keep Python files under ~500 lines of code - extract modules when larger".

🧰 Tools
🪛 Ruff (0.15.9)

[warning] 45-45: BLE001 Do not catch blind exception: Exception
[warning] 201-201: BLE001 Do not catch blind exception: Exception
[error] 277-279: S110 try-except-pass detected, consider logging the exception
[warning] 277-277: BLE001 Do not catch blind exception: Exception
[warning] 371-371: BLE001 Do not catch blind exception: Exception
[error] 404-405: S110 try-except-pass detected, consider logging the exception
[warning] 404-404: BLE001 Do not catch blind exception: Exception
[warning] 470-470: B012 return inside finally blocks cause exceptions to be silenced
[warning] 735-735: BLE001 Do not catch blind exception: Exception
[error] 932-933: S110 try-except-pass detected, consider logging the exception
[warning] 932-932: BLE001 Do not catch blind exception: Exception
[error] 958-959: S110 try-except-pass detected, consider logging the exception
[warning] 958-958: BLE001 Do not catch blind exception: Exception
[error] 963-964: S110 try-except-pass detected, consider logging the exception
[warning] 963-963: BLE001 Do not catch blind exception: Exception
[error] 967-968: S110 try-except-pass detected, consider logging the exception
[warning] 967-967: BLE001 Do not catch blind exception: Exception
[error] 971-972: S110 try-except-pass detected, consider logging the exception
[warning] 971-971: BLE001 Do not catch blind exception: Exception
[error] 976-977: S110 try-except-pass detected, consider logging the exception
[warning] 976-976: BLE001 Do not catch blind exception: Exception
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 1 - 977, This file is
too large and mixes unrelated responsibilities; split it into smaller modules by
extracting the rate-limiter logic (functions and globals around
_get_rate_limit_redis_client, _RATE_LIMIT_LUA, _VOICE_RATE_LIMIT_LOCK,
_VOICE_RATE_LIMIT_IP_BUCKET, _VOICE_RATE_LIMIT_REDIS_* and _rate_limit_allow_ip)
into a voice_rate_limiter module, move WebSocket control parsing/queueing and
helper classes (class _VoiceControlState, class _FilteredWebSocket, function
_voice_control_reader, _extract_ws_session_token, _has_ws_subprotocols) into a
voice_control module, and move REST handlers (voice_session_start,
voice_session_end, voice_session_status) into a voice_rest module; keep
serializer and pipeline orchestration (class _RawAudioFrameSerializer,
voice_websocket and pipeline building) in the remaining voice_ws module. Update
all local imports/usages to import the moved symbols (e.g., import
_rate_limit_allow_ip, _voice_control_reader, _FilteredWebSocket,
_RawAudioFrameSerializer, voice_session_start/end/status) from their new modules
and run tests to ensure no circular imports and that session_manager/
app_settings references are preserved. Ensure any module-level globals are
encapsulated or re-exported and adjust tests/import paths accordingly.
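The S110/BLE001 findings above all share one shape: a blind `except Exception: pass`. A minimal sketch of the remediation the linter is asking for — log the failure, or suppress a *named* exception type where silence is genuinely intended (the `ws` object here is a stand-in, not Bindu's real WebSocket):

```python
import contextlib
import logging

logger = logging.getLogger("voice")

def close_quietly(ws) -> str:
    # Before (S110/BLE001): try: ws.close() / except Exception: pass
    # After: narrow the exception type and log instead of swallowing.
    try:
        ws.close()
    except OSError as exc:
        logger.warning("close failed: %s", exc)
        return "logged"
    return "closed"

def close_best_effort(ws) -> None:
    # When silence really is intended, contextlib.suppress with a named
    # exception type makes the intent explicit and keeps the linter happy.
    with contextlib.suppress(OSError):
        ws.close()
```

The same two idioms cover every flagged site; only the exception type (OSError, WebSocketDisconnect, etc.) varies per call.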

Comment thread bindu/server/endpoints/voice_endpoints.py
Comment on lines +274 to +279
onkeydown={(event) => {
	if (event.key === "Enter" || event.key === " ") {
		event.preventDefault();
		isTapped = !isTapped;
	}
}}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Verified against the repository (GetBindu/Bindu) by inspecting frontend/src/lib/components/chat/ChatMessage.svelte (lines 250–350), frontend/src/lib/components/chat/MarkdownRenderer.svelte, and frontend/src/lib/components/chat/MarkdownBlock.svelte.

Gate the wrapper's keyboard handler to prevent blocking nested control activation.

The wrapper div contains rendered markdown that may include links and buttons. When users press Enter or Space on these nested controls, the event bubbles to this handler, which calls preventDefault() and prevents the nested element from being activated. Return early when event.target !== event.currentTarget so the toggle only handles keyboard events fired directly on the wrapper itself.

Suggested fix
 		onkeydown={(event) => {
+			if (event.target !== event.currentTarget) {
+				return;
+			}
 			if (event.key === "Enter" || event.key === " ") {
 				event.preventDefault();
 				isTapped = !isTapped;
 			}
 		}}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/components/chat/ChatMessage.svelte` around lines 274 - 279,
The wrapper div's onkeydown handler is intercepting Enter/Space for nested
interactive elements; update the handler (the onkeydown on the wrapper that
toggles isTapped) to first check that event.target === event.currentTarget (or
event.target !== event.currentTarget and return early) so it only handles
keyboard events originating on the wrapper itself, and only then call
preventDefault() and toggle isTapped; this ensures nested links/buttons can
receive Enter/Space normally.

Comment thread frontend/src/lib/server/files/uploadFile.ts Outdated
Comment on lines +104 to +105
const startToken = ++startTokenCounter;
const localClient = new VoiceClient();

⚠️ Potential issue | 🟠 Major

endVoiceSession() doesn't cancel an in-flight start.

The token checks at Lines 138-149 can never trip today because only startVoiceSession() increments startTokenCounter, and isStarting blocks a second start. If the user closes the panel while startSession() or connect() is pending, that pending path still installs localClient afterward.

🛠️ Minimal fix
 export async function endVoiceSession(): Promise<void> {
+	startTokenCounter += 1;
+	isStarting = false;
 	const active = client;
 	client = null;

Also applies to: 138-149, 170-189

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/stores/voice.ts` around lines 104 - 105, endVoiceSession()
currently doesn't prevent a pending startVoiceSession() from installing a
localClient because startTokenCounter is only incremented in startVoiceSession;
fix by making startTokenCounter the single source of truth for cancellation:
increment startTokenCounter (or set a new cancel token) inside
endVoiceSession(), capture the current startToken in startVoiceSession() before
any awaits, and after each await (especially after new VoiceClient() and await
localClient.connect()) compare the captured startToken with the global
startTokenCounter; only assign to the module-level localClient and flip
isStarting when the tokens match. Update checks in startVoiceSession (the
sections around startToken, localClient creation and connection) and in
endVoiceSession to use this token comparison so an in-flight start is ignored if
endVoiceSession was called.

Comment thread frontend/src/lib/stores/voice.ts
Comment on lines +129 to 139
fileStream.on("data", (chunk) => {
	if (chunk instanceof Uint8Array) {
		chunks.push(chunk);
		return;
	}
	reject(new Error("Unexpected chunk type from avatar download stream"));
});
fileStream.on("error", (err) =>
	reject(err ?? new Error("Avatar download failed"))
);
fileStream.on("end", () => resolve(Buffer.concat(chunks)));

⚠️ Potential issue | 🟠 Major

Stop stream and cleanup listeners when rejecting avatar download

At Line 134, rejection on unexpected chunk type does not stop the stream or remove listeners. The stream can continue emitting after rejection, causing unnecessary work and brittle behavior.

Suggested fix
 								const fileBuffer = await new Promise<Buffer>((resolve, reject) => {
 									const chunks: Uint8Array[] = [];
-									fileStream.on("data", (chunk) => {
-										if (chunk instanceof Uint8Array) {
-											chunks.push(chunk);
-											return;
-										}
-										reject(new Error("Unexpected chunk type from avatar download stream"));
-									});
-									fileStream.on("error", (err) =>
-										reject(err ?? new Error("Avatar download failed"))
-									);
-									fileStream.on("end", () => resolve(Buffer.concat(chunks)));
+									function cleanup() {
+										fileStream.off("data", onData);
+										fileStream.off("error", onError);
+										fileStream.off("end", onEnd);
+									}
+
+									function onData(chunk: unknown) {
+										if (chunk instanceof Uint8Array) {
+											chunks.push(chunk);
+											return;
+										}
+										cleanup();
+										fileStream.destroy();
+										reject(new Error("Unexpected chunk type from avatar download stream"));
+									}
+
+									function onError(err?: Error | null) {
+										cleanup();
+										reject(err ?? new Error("Avatar download failed"));
+									}
+
+									function onEnd() {
+										cleanup();
+										resolve(Buffer.concat(chunks));
+									}
+
+									fileStream.on("data", onData);
+									fileStream.once("error", onError);
+									fileStream.once("end", onEnd);
 								});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/routes/api/v2/export/+server.ts` around lines 129 - 139, The
rejection path when encountering an unexpected chunk type in the avatar download
stream must stop the stream and remove listeners to avoid continued emissions;
update the handler around fileStream (the "data"/"error"/"end" listeners and the
chunks array) to perform cleanup before rejecting: create a small cleanup step
that removes the attached listeners (or calls fileStream.removeAllListeners) and
destroys the stream (fileStream.destroy(err)) and invoke that cleanup prior to
calling reject(new Error(...)); apply the same cleanup on the "error" and
"end"/"resolve" paths so all outcomes remove listeners and stop the stream.

@Co-vengers Co-vengers force-pushed the feat(voice-agent)/code-fixes branch from bfb9c1e to bb0e6fe on April 15, 2026 17:17
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
bindu/utils/notifications.py (3)

146-152: ⚠️ Potential issue | 🔴 Critical

Finish the resolved_ip refactor before merging.

Line 152 still uses resolved_ip, but validate_config() no longer returns anything. Even if that were restored, Line 226 still calls _post_once() with the old argument list after its signature was reduced. The send_event() path cannot currently succeed.

Also applies to: 154-175, 211-227

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/utils/notifications.py` around lines 146 - 152, The code still passes a
removed/resolved_ip value and calls _post_once with an outdated signature after
validate_config() was changed to return nothing; update send_event and any
callers (send_event, the block using self.validate_config(config) and the
sections that call _post_with_retry and _post_once) to stop expecting or passing
resolved_ip: have validate_config(config) only validate, remove resolved_ip from
the call to _post_with_retry(config["url"], headers, payload, event) and
refactor _post_once(...) call sites to match its new reduced parameter list (use
url, headers, payload, event or whatever the current _post_once signature
requires), and ensure _post_with_retry delegates to the updated _post_once
signature so the send_event path no longer depends on the old resolved_ip
argument.

259-286: ⚠️ Potential issue | 🔴 Critical

Add missing urllib imports and fix HTTPError response handling in _post_once().

Lines 265, 271, and 278 reference request.Request(), request.urlopen(), and error.HTTPError without importing them, causing immediate NameError on any delivery attempt. Additionally, line 282 reads from response which is only defined in the with block; in the except block, use exc.read() instead.

Fix
+from urllib import error, request
 from urllib.parse import urlparse
...
         except error.HTTPError as exc:
             status = exc.code
             body = b""
             try:
-                body = response.read() or b""
+                body = exc.read() or b""
             except OSError:
                 body = b""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/utils/notifications.py` around lines 259 - 286, Import the missing
urllib names and fix the HTTPError handling in _post_once: add imports for
urllib.request as request and urllib.error as error (or equivalent) so
request.Request and request.urlopen and error.HTTPError are available, and in
the except error.HTTPError block call exc.read() (not response.read()) to get
the error body, decode it and raise NotificationDeliveryError(status, message)
as currently done; ensure this all occurs in the _post_once method that raises
NotificationDeliveryError and that the original with request.urlopen(...) as
response block remains unchanged.
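The exc.read() fix is easy to verify in isolation: urllib's HTTPError carries the failed response's body itself, so the handler never needs the `response` name from the success path. A hedged sketch (the helper name is illustrative, not the module's real one):

```python
from io import BytesIO
from urllib import error

def read_error_body(exc: error.HTTPError) -> bytes:
    # In the except block, the body lives on the exception object itself;
    # `response` from the success-path `with` block is unbound here.
    try:
        return exc.read() or b""
    except OSError:
        return b""

# Constructing an HTTPError directly to demonstrate the behavior:
exc = error.HTTPError(
    url="https://example.invalid/hook",
    code=500,
    msg="Internal Server Error",
    hdrs=None,
    fp=BytesIO(b"upstream exploded"),
)
```

HTTPError is file-like (it wraps the error response's stream), which is why reading from `exc` works where reading from the unbound `response` raises NameError.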

141-145: ⚠️ Potential issue | 🔴 Critical

DNS-rebinding SSRF vulnerability: Code doesn't pass resolved IP to urlopen(), allowing independent DNS lookups.

send_event() docstring (lines 141-145) documents a single-resolution guarantee to prevent DNS-rebinding attacks. However, the implementation fails to enforce this:

  1. Line 152 references an undefined resolved_ip variable
  2. validate_config() doesn't return the resolved IP from _resolve_and_validate_destination()
  3. _post_once() signature (line 259) doesn't have a resolved_ip parameter, despite being called with one at line 226
  4. Line 271 passes the original url to urlopen(), not a resolved IP

Since urlopen() performs its own DNS lookup (not reusing any pre-resolved IP), a hostname can validate as public at line 261 and rebind to a private address during the connection at line 271, directly violating the documented security mechanism.

This is compounded by other issues: duplicate _resolve_and_check_ip() definitions (lines 39-70, 73-104) and missing imports (urllib.request and urllib.error are referenced but never imported).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/utils/notifications.py` around lines 141 - 145, The code documents
single-resolution DNS to prevent DNS-rebinding but fails to pass the
pre-resolved IP through the call chain: fix by returning the resolved IP from
_resolve_and_validate_destination() into validate_config() and ensure
send_event() passes that resolved_ip into _post_once(); update
_post_once(resolved_ip, ...) signature and use the resolved IP when opening the
HTTP connection (construct the request to use the resolved IP as the network
target and set the original hostname in the Host header so urlopen() does not
perform a new DNS lookup). Also remove the duplicate _resolve_and_check_ip()
definition and correct imports (ensure urllib.request/urllib.error are imported
and unused imports removed). Ensure references to resolved_ip (previously
undefined) are removed or replaced with the returned value from
validate_config()/ _resolve_and_validate_destination().
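One common way to honor the single-resolution guarantee is to resolve the hostname once, validate the result, then rewrite the URL to the literal IP while sending the original name as the Host header. A minimal sketch assuming IPv4 and plain HTTP — real code must also handle IPv6 bracketing and HTTPS, where SNI/cert validation means URL rewriting alone is not sufficient:

```python
import socket
from urllib.parse import urlsplit, urlunsplit

def pin_url(url: str) -> tuple[str, str]:
    """Resolve the hostname exactly once and rewrite the URL to that IP.

    Returns (pinned_url, host_header); the caller sends host_header as the
    Host: header so the server still sees the original virtual host, and
    no second DNS lookup (the rebinding window) ever happens.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        raise ValueError("URL has no host")
    # Single DNS resolution; the connection reuses this exact IP.
    infos = socket.getaddrinfo(host, parts.port or 80, family=socket.AF_INET)
    ip = infos[0][4][0]
    # (This is where the private/loopback range check belongs, so the
    #  validated IP and the connected IP are the same object.)
    netloc = ip if parts.port is None else f"{ip}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, "")), host
```

The essential property is that the range check and the connection both use `ip`, so a hostname cannot validate as public and then rebind to a private address.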
♻️ Duplicate comments (7)
frontend/src/lib/server/files/uploadFile.ts (1)

18-19: ⚠️ Potential issue | 🔴 Critical

Attach the upload listeners before ending the stream.

upload.end() still runs before the finish/error handlers are registered. If the stream completes quickly, the promise misses the event and hangs until the timeout fires.

Suggested fix
-	upload.write(Buffer.from(buffer));
-	upload.end();
-
 	// only return the filename when upload throws a finish event or a 20s time out occurs
 	return new Promise((resolve, reject) => {
 		const timeoutId = setTimeout(() => {
 			if (typeof upload.off === "function") {
 				upload.off("finish", handleFinish);
@@
 		upload.once("finish", handleFinish);
 		upload.once("error", handleError);
+		upload.write(Buffer.from(buffer));
+		upload.end();
 	});

Also applies to: 49-50

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/server/files/uploadFile.ts` around lines 18 - 19, The upload
stream handlers are being attached after calling upload.end(), which can miss
rapid finish/error events; move the event listener registration
(upload.on('finish', ...), upload.on('error', ...), or Promise resolve/reject
bindings) to before calling upload.write(...) / upload.end() in the upload flow
(look for the upload variable and the surrounding
upload.write(Buffer.from(buffer)); upload.end(); calls and similarly the block
around lines 49-50) so the promise will always observe finish/error events;
ensure you attach listeners first, then write and end the stream.
bindu/server/endpoints/voice_endpoints.py (4)

569-581: ⚠️ Potential issue | 🟠 Major

Reject malformed context_id values before calling create_session().

This currently stringifies arbitrary JSON values. null/objects become bogus context IDs, and blank strings fall through to VoiceSessionManager.create_session(), which raises ValueError for non-empty validation and pushes a client input bug into the generic endpoint error path instead of a clean 400.

🛡️ Minimal validation sketch
-from uuid import uuid4
+from uuid import UUID, uuid4
...
-        if isinstance(body, dict) and "context_id" in body:
-            context_id = str(body["context_id"])
+        if isinstance(body, dict) and "context_id" in body:
+            raw_context_id = body["context_id"]
+            if not isinstance(raw_context_id, str) or not raw_context_id.strip():
+                raise HTTPException(
+                    status_code=400, detail="context_id must be a non-empty UUID string"
+                )
+            try:
+                context_id = str(UUID(raw_context_id))
+            except ValueError as exc:
+                raise HTTPException(
+                    status_code=400, detail="context_id must be a valid UUID"
+                ) from exc

As per coding guidelines, "Validate all external input and use type hints for input validation in Python files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 569 - 581, The
handler currently stringifies any JSON "context_id" (including null, objects,
empty strings) and passes it to VoiceSessionManager.create_session(), causing
ValueError downstream; update the request parsing to validate and reject invalid
context_id values before calling create_session(): after json.loads(raw_body)
ensure body["context_id"] exists and is a non-empty string (and optionally match
the expected UUID format) and raise HTTPException(status_code=400) for
null/empty/non-string/object values so only a validated context_id is forwarded
to create_session().
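The validation the prompt asks for is small. A sketch of the guard — plain ValueError stands in for the endpoint's HTTPException(400) to keep the example self-contained:

```python
from uuid import UUID

def parse_context_id(raw: object) -> str:
    # Reject anything that is not a non-empty UUID string before it can
    # reach VoiceSessionManager.create_session().
    if not isinstance(raw, str) or not raw.strip():
        raise ValueError("context_id must be a non-empty UUID string")
    try:
        return str(UUID(raw.strip()))
    except ValueError:
        raise ValueError("context_id must be a valid UUID") from None
```

Round-tripping through UUID both validates and canonicalizes, so null, objects, blanks, and malformed strings all fail as a clean client error instead of surfacing from the session manager.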

748-827: ⚠️ Potential issue | 🔴 Critical

Start the cleanup scope before the session becomes active.

update_state("active"), the agent configuration check, and build_voice_pipeline() all happen before the only try/finally. Any early return/exception there skips end_session() and can leave the session stuck active until it times out or is manually cleaned up.

♻️ Safer structure
-    await session_manager.update_state(session_id, "active")
-    ...
-    components = build_voice_pipeline(...)
-    ...
-    try:
+    components: dict[str, Any] | None = None
+    try:
+        await session_manager.update_state(session_id, "active")
+        ...
+        components = build_voice_pipeline(...)
         ...
     finally:
-        await components["bridge"].cleanup_background_tasks()
+        if components is not None:
+            await components["bridge"].cleanup_background_tasks()
         await session_manager.end_session(session_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 748 - 827, The
session is being marked active and the voice pipeline is built before entering
the try/finally that calls end_session, so any early return/exception (e.g.,
agent not configured or build_voice_pipeline failing) can leave the session
active; move session_manager.update_state(session_id, "active") and the call to
build_voice_pipeline(...) inside the try block that contains the
finally/end_session cleanup and perform the agent configuration check (voice_ext
and manifest/run) inside that same try so any exception or return will fall
through to the finally which calls end_session; reference
session_manager.update_state, build_voice_pipeline, manifest.run, voice_ext, and
end_session when making this change.
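The restructuring amounts to entering the try block before any state transition, with a None guard for components built mid-try. Sketched with stub objects — the names mirror the review, not the module's real signatures:

```python
import asyncio

async def run_session(session_manager, session_id, build_pipeline):
    components = None
    try:
        await session_manager.update_state(session_id, "active")
        components = build_pipeline()          # may raise
        return "ran"
    finally:
        # Runs on success, early return, or exception, so the session
        # can never be left stuck in "active".
        if components is not None:
            await components["bridge"].cleanup()
        await session_manager.end_session(session_id)
```

Because `components` starts as None inside the try's scope, a build failure still reaches end_session() without tripping over cleanup of a pipeline that was never created.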

1-977: 🛠️ Refactor suggestion | 🟠 Major

Split this module before it grows further.

At ~1,000 LOC, this file is already combining rate limiting, REST handlers, websocket control parsing, serializer glue, and pipeline orchestration. That makes lifecycle bugs like the session leak below much harder to isolate, and it's already over the repo's file-size limit.

As per coding guidelines, "Keep Python files under ~500 lines of code - extract modules when larger".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 1 - 977, This file is
too large and should be split: extract rate-limiting logic
(_get_rate_limit_redis_client, _rate_limit_allow_ip, _RATE_LIMIT_LUA, related
globals) into a new voice_rate_limit module; move REST handlers
(voice_session_start, voice_session_end, voice_session_status) into
voice_rest_endpoints that import the rate limiter; move WebSocket orchestration
(voice_websocket, _voice_control_reader, _FilteredWebSocket, _VoiceControlState,
_RawAudioFrameSerializer, _send_json, _extract_ws_session_token,
_has_ws_subprotocols, _classify_voice_pipeline_error, _voice_preflight_error,
_send_error_and_close and helpers like _trim_overlap_text/_extract_bearer_token)
into voice_ws_handler; keep small shared helpers (e.g., _trim_overlap_text,
_extract_bearer_token) in a voice_utils module; ensure each new module exposes
only what others need and update imports in the original package to reference
the new modules so lifecycle calls (create_session, update_state, end_session)
and pipeline building (build_voice_pipeline) remain intact.

721-737: ⚠️ Potential issue | 🟠 Major

Don't block accepted sockets indefinitely on the fallback token read.

When session_auth_required is on and the client didn't send a token in headers, this accepts the socket and then waits forever on receive_text(). The session timeout at Line 911 has not started yet, so an unauthenticated client can pin accepted connections until something else cleans them up.

As per coding guidelines, "Use bindu.settings.app_settings for all configuration - never hardcode values such as URLs, ports, timeouts, API keys, or feature flags".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 721 - 737, The code
accepts a websocket then blocks indefinitely on websocket.receive_text() when no
header token is provided; to fix, use a configurable timeout from app_settings
(e.g., app_settings.voice.ws_token_read_timeout or similar) and wrap the
receive_text call in asyncio.wait_for so unauthenticated clients can't pin open
connections: update the branch in voice_endpoints.py where session_auth_required
is checked (symbols: session_auth_required, _extract_ws_session_token,
websocket.accept, websocket.receive_text) to fetch the timeout from app_settings
(do not hardcode), wrap the await websocket.receive_text() with
asyncio.wait_for(..., timeout=app_settings.voice.<timeout_field>), handle
asyncio.TimeoutError by closing the socket with code 1008 and a clear reason,
and add the necessary asyncio import if missing.
frontend/src/lib/stores/voice.ts (2)

170-189: ⚠️ Potential issue | 🟠 Major

Invalidate pending starts when ending the session.

startTokenCounter only changes in startVoiceSession(). If the user hangs up while localClient.startSession() or connect() is still awaiting, that stale path can still assign client = localClient afterward.

🛑 Minimal cancellation fix
 export async function endVoiceSession(): Promise<void> {
+	startTokenCounter += 1;
+	isStarting = false;
 	const active = client;
 	client = null;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/stores/voice.ts` around lines 170 - 189, endVoiceSession()
must invalidate any in-flight start attempts so a stale localClient from
startVoiceSession() cannot be assigned to the global client after the user hangs
up; increment or change the shared startTokenCounter (the token used in
startVoiceSession() to gate assignment) inside endVoiceSession() before awaiting
stopSession(), and in startVoiceSession() check that the token from when the
async start began still matches startTokenCounter before assigning client =
localClient or calling localClient.startSession()/connect() results; reference
startTokenCounter, startVoiceSession(), endVoiceSession(), client, localClient,
startSession(), and connect() in your change.

107-111: ⚠️ Potential issue | 🟠 Major

Reset mute/audio leftovers before swapping clients.

endVoiceSession() clears isVoiceMuted and latestAgentAudio, but startVoiceSession() does not. Starting a new call after a muted previous session can leave the UI showing muted while the fresh VoiceClient is live, and the old audio buffer lingers until new audio arrives.

🧹 Minimal reset
 	voiceError.set(null);
 	transcripts.set([]);
 	currentUserTranscript.set("");
 	currentAgentTranscript.set("");
+	isVoiceMuted.set(false);
+	latestAgentAudio.set(null);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/stores/voice.ts` around lines 107 - 111, startVoiceSession()
currently swaps clients without clearing mute/audio leftovers, causing UI to
remain muted and old audio to persist; before or during the client swap in
startVoiceSession(), reset the mute and audio stores the same way
endVoiceSession() does (clear isVoiceMuted and latestAgentAudio, and any related
per-call audio buffers) so the new VoiceClient starts with a clean state —
update startVoiceSession() to call the same reset logic used in
endVoiceSession() (or extract a small helper) to clear isVoiceMuted and
latestAgentAudio when initializing a new session.
🧹 Nitpick comments (5)
.gitignore (1)

226-230: Consider removing redundant patterns.

The root-level patterns en_US-*.onnx and en_US-*.onnx.json will match these files anywhere in the repository, including under examples/voice-agent/. The explicit examples/voice-agent/ patterns on lines 229-230 are redundant and can be removed for cleaner maintenance.

♻️ Proposed cleanup
 # Voice model artifacts downloaded at runtime (Piper)
 en_US-*.onnx
 en_US-*.onnx.json
-examples/voice-agent/en_US-*.onnx
-examples/voice-agent/en_US-*.onnx.json
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore around lines 226 - 230, The .gitignore contains redundant
patterns: remove the explicit examples/voice-agent/en_US-*.onnx and
examples/voice-agent/en_US-*.onnx.json entries because the root-level patterns
en_US-*.onnx and en_US-*.onnx.json already match those files anywhere; keep the
root patterns (en_US-*.onnx and en_US-*.onnx.json) and delete the two
example-specific lines to clean up maintenance.
frontend/src/lib/stores/chat.ts (1)

229-229: Consider adding a type for the parts parameter.

Using any[] loses type safety. Consider defining or importing a Part type that matches the expected structure (e.g., { kind: 'text' | 'file', ... }).

♻️ Suggested type definition
+type MessagePart = { kind: 'text'; text: string } | { kind: 'file'; file: { bytes: string; mimeType: string; name: string } };
+
-export async function sendMessage(parts: any[]) {
+export async function sendMessage(parts: MessagePart[]) {

Or import from a shared types module if one exists.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/stores/chat.ts` at line 229, The sendMessage function
currently accepts parts: any[] which loses type safety; define or import a Part
interface (or type) and change the signature to parts: Part[] (or readonly
Part[] as appropriate) so callers and the function body get proper typings.
Ensure Part covers the expected shape used in the function (e.g., discriminated
union with kind: 'text' | 'file', text?: string, fileId?: string, metadata?:
Record<string, unknown>, etc.), update any local references inside sendMessage
to use the new properties, and import the shared Part type if one exists instead
of redefining it.
bindu/server/applications.py (1)

153-154: Consider adding type annotation for _voice_session_manager.

The field is initialized to None without a type hint. Adding an annotation would improve IDE support and static analysis:

from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from bindu.extensions.voice.session_factory import SessionManagerBackend

# In __init__:
self._voice_session_manager: SessionManagerBackend | None = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/applications.py` around lines 153 - 154, Add a precise type
annotation for the instance attribute `_voice_session_manager` in the class
__init__ so static checkers/IDEs know its type: import TYPE_CHECKING and
conditionally import the SessionManagerBackend from
bindu.extensions.voice.session_factory, then annotate
`self._voice_session_manager` as `SessionManagerBackend | None` (or
`Optional[SessionManagerBackend]`) instead of an untyped None; leave the runtime
assignment as `None`.
tests/unit/extensions/voice/test_voice_websocket_integration.py (1)

114-119: Test session token is appropriately marked for secret scanning.

The session_token="token-abc" is a test fixture value, not a real credential. The Ruff S106 warning is a false positive. The STT/TTS tokens on lines 108-112 already include # pragma: allowlist secret comments.

Consider adding the same pragma for consistency:

session = await manager.create_session(
    "ctx",
    session_token="token-abc",  # pragma: allowlist secret
    session_token_expires_at=1e12,
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/extensions/voice/test_voice_websocket_integration.py` around lines
114 - 119, The test uses a fake session token literal that triggers a Ruff S106
false positive; update the call to VoiceSessionManager.create_session (the line
assigning session via await manager.create_session(...)) to annotate the
session_token argument with the same secret allowlist pragma used for STT/TTS
tokens (add a trailing comment like "# pragma: allowlist secret" after the
session_token argument) so the linter treats it as an allowed test fixture.
bindu/extensions/voice/agent_bridge.py (1)

490-495: History trimming may drop more messages than necessary for odd-length histories.

When overflow is positive, the logic drops turns_to_drop * 2 messages. However, if the history has an odd number of messages (e.g., user message without assistant response yet), this could over-trim. Consider whether single-message granularity would be more precise.

That said, this is a minor edge case since:

  1. History typically alternates user/assistant pairs
  2. The trim only fires when over the limit
  3. Dropping slightly more is safe (just loses context faster)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/extensions/voice/agent_bridge.py` around lines 490 - 495, _trim_history
currently computes turns_to_drop and removes turns_to_drop * 2 messages which
can exceed overflow for odd-length histories; change it to compute the exact
number of items to remove by using items_to_drop = min(turns_to_drop * 2,
overflow) (or simply remove exactly overflow messages if you prefer
single-message granularity) and then delete
self._conversation_history[:items_to_drop]; keep references to _trim_history,
self._conversation_history, _max_history_messages, overflow and turns_to_drop
when locating and updating the logic.
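The single-message-granularity alternative suggested above can be sketched as a plain function. This is illustrative only: the real implementation mutates `self._conversation_history` in place against `self._max_history_messages`, whereas here the history and cap are arguments for clarity.

```python
def trim_history(history: list, max_messages: int) -> list:
    """Drop exactly the overflow, oldest-first, instead of whole turn pairs."""
    overflow = len(history) - max_messages
    if overflow <= 0:
        return history
    # Single-message granularity: never removes more than the overflow,
    # even for odd-length histories (e.g. a user turn with no reply yet).
    return history[overflow:]
```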
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bindu/extensions/did/did_agent_extension.py`:
- Around line 261-266: Wrap the ACL hardening subprocess.run call (the block
using subprocess.run and checking result.returncode) in a try/except that on
failure removes the key files that were created earlier (the key file paths
referenced by path and the corresponding public key file variable) before
re-raising the RuntimeError; specifically catch the failure after
subprocess.run, delete both key files (use the same variables used when writing
the files) in the except block, then re-raise the original RuntimeError so the
caller still sees the error.
- Around line 243-247: The code in did_agent_extension.py currently hard-fails
if os.getenv("USERNAME") is missing; update the ACL principal resolution in the
initialization path (the block that sets username) to try multiple fallbacks and
validate non-empty principal before raising: first try os.getenv("USERNAME"),
then os.getlogin() and getpass.getuser(), and if still empty attempt to derive a
principal from environment (e.g., USER or COMPUTERNAME) or fall back to a
clearly named safe default like "unknown_user" only after logging a warning;
ensure you validate the chosen principal is a non-empty string and raise
RuntimeError from the same location only if all fallbacks produce an
empty/invalid principal, and add a brief explanatory log entry referencing the
ACL hardening step.
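The layered fallback described above might look like the sketch below. The function name is illustrative, and this variant raises when every fallback is empty, whereas the review also allows a logged "unknown_user" default as a last resort.

```python
import getpass
import os

def resolve_acl_principal() -> str:
    """Resolve a non-empty user principal for the ACL hardening step."""
    candidates = [os.getenv("USERNAME")]
    try:
        candidates.append(os.getlogin())
    except OSError:
        pass  # no controlling terminal (e.g. services, containers)
    try:
        candidates.append(getpass.getuser())
    except (KeyError, OSError):
        pass  # no passwd entry for the current uid
    candidates.append(os.getenv("USER"))
    for candidate in candidates:
        if candidate and candidate.strip():
            return candidate
    raise RuntimeError("could not resolve a user principal for ACL hardening")
```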

In `@bindu/extensions/voice/agent_bridge.py`:
- Line 38: The constant DEFAULT_TIMEOUT_FALLBACK_TEXT contains a curly right
single quotation mark; replace it with a straight ASCII apostrophe so the value
reads "Sorry - I'm having trouble responding right now." Update the
DEFAULT_TIMEOUT_FALLBACK_TEXT definition in agent_bridge.py to use the straight
apostrophe (') to avoid rendering/logging issues.

In `@bindu/utils/notifications.py`:
- Around line 39-70: The file contains a shadowed duplicate of the function
_resolve_and_check_ip — the first definition (the one shown) is overwritten by a
second definition later, which makes edits to this copy inert; remove the
redundant/earlier definition so there is only a single canonical
_resolve_and_check_ip implementation (ensure the remaining version correctly
uses socket.getaddrinfo, ipaddress.ip_address, and checks against
_BLOCKED_NETWORKS and preserves the same error messages and behavior).

In `@frontend/src/lib/server/database.ts`:
- Around line 496-512: The on() and once() implementations currently schedule
setTimeout emissions every time a listener is added (in downloadStream), causing
duplicate emits; change the logic so emissions for "error"/"data"/"end" are
scheduled only once per downloadStream lifecycle instead of per registration:
introduce a per-stream flag (e.g., scheduledEmits or scheduled[event]) or move
the scheduling out of on()/once() into the downloadStream
construction/initialization, keep addListener/wrapped removal behavior as-is,
and use the flag to skip scheduling inside both on() and once() (reference the
on, once, downloadStream, addListener, emit, file identifiers).
- Around line 145-159: The updateMany implementation is incorrectly treating
aggregation-pipeline style updates as simple modifier objects (causing literal
"$field" strings and wrong $unset behavior); change updateMany to explicitly
reject array (pipeline) updates instead of applying them: in updateMany (and any
callers like the migration routine), detect Array.isArray(update) and throw a
clear error (or return a rejected Promise) stating pipeline updates are
unsupported; keep existing applyUpdate and matchesFilter logic for non-array
updates and update usages in migrations to use modifier-form updates (or convert
them) rather than passing arrays.

In `@frontend/src/lib/services/voice-client.ts`:
- Around line 244-260: stopSession can race with async audio setup and allow
startAudioStreaming to install new media resources after hangup; add a
guard/teardown token to prevent new installs once stopping. Concretely:
introduce a boolean or cancellation token (e.g., this.isStopping or
this.stopToken) set at the start of stopSession and checked at the start of
startAudioStreaming and any async work that creates mediaStream/nodes; make
cleanupAudioStreaming clear resources and mark the token as stopped; ensure
startAudioStreaming aborts early if the token indicates stopped (also check
this.sessionId null), and apply the same guard pattern to other async audio
setup paths referenced in the file (the async blocks around startAudioStreaming,
mediaStream and worklet setup).

In `@pyproject.toml`:
- Around line 87-97: The gRPC modules (bindu/grpc/client.py,
bindu/grpc/service.py, bindu/grpc/server.py) import grpc at module level which
will raise ImportError for environments without the [grpc] extra; change those
module-level imports to guarded try/except ImportError blocks (or perform lazy
imports inside the functions/classes like GrpcAgentClient, GrpcAgentService,
GrpcAgentServer) and raise a clear, descriptive ImportError instructing the user
to install the "grpc" extra (or fall back gracefully) so importing these modules
without grpc installed does not crash the program unexpectedly.
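The guarded-import pattern requested above is a small amount of code; this sketch uses an illustrative helper name and error message, not the actual Bindu wording.

```python
try:
    import grpc  # provided by the optional [grpc] extra
except ImportError:
    grpc = None

def require_grpc():
    """Fail with an actionable message instead of a bare import-time crash."""
    if grpc is None:
        raise ImportError(
            "gRPC support requires the optional extra: pip install 'bindu[grpc]'"
        )
    return grpc
```

Classes like GrpcAgentClient would call `require_grpc()` in `__init__` (or at first use), so merely importing the module stays safe.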

In `@tests/integration/grpc/test_grpc_e2e.py`:
- Around line 184-186: The task-polling helper uses broad `except Exception`
blocks that hide real errors; replace the first `except Exception` (around the
network retry) with `except httpx.HTTPError` and the second JSON-related `except
Exception` with `except ValueError`, and propagate/record the caught exception
so the eventual timeout error message includes the last exception
instance/details to aid debugging; update the retry loops in the helper so only
those specific exceptions are swallowed and any other exceptions are allowed to
bubble up.
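The narrowed retry loop can be sketched generically. Here `ConnectionError` stands in for `httpx.HTTPError` and `ValueError` for JSON decode failures; the helper name and signature are illustrative, not the test's actual API.

```python
import time

def poll_until(fetch, predicate, timeout_s: float, interval_s: float = 0.05):
    """Poll fetch() until predicate(result) is true, surfacing the last error."""
    deadline = time.monotonic() + timeout_s
    last_exc = None
    while time.monotonic() < deadline:
        try:
            result = fetch()
        except ConnectionError as exc:  # narrow: transient network errors only
            last_exc = exc
        else:
            try:
                if predicate(result):
                    return result
            except ValueError as exc:  # narrow: malformed payloads only
                last_exc = exc
        time.sleep(interval_s)
    raise TimeoutError(f"polling timed out; last error: {last_exc!r}")
```

Any other exception type propagates immediately, and the eventual timeout message carries the last swallowed error to aid debugging.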

---

Outside diff comments:
In `@bindu/utils/notifications.py`:
- Around line 146-152: The code still passes a removed/resolved_ip value and
calls _post_once with an outdated signature after validate_config() was changed
to return nothing; update send_event and any callers (send_event, the block
using self.validate_config(config) and the sections that call _post_with_retry
and _post_once) to stop expecting or passing resolved_ip: have
validate_config(config) only validate, remove resolved_ip from the call to
_post_with_retry(config["url"], headers, payload, event) and refactor
_post_once(...) call sites to match its new reduced parameter list (use url,
headers, payload, event or whatever the current _post_once signature requires),
and ensure _post_with_retry delegates to the updated _post_once signature so the
send_event path no longer depends on the old resolved_ip argument.
- Around line 259-286: Import the missing urllib names and fix the HTTPError
handling in _post_once: add imports for urllib.request as request and
urllib.error as error (or equivalent) so request.Request and request.urlopen and
error.HTTPError are available, and in the except error.HTTPError block call
exc.read() (not response.read()) to get the error body, decode it and raise
NotificationDeliveryError(status, message) as currently done; ensure this all
occurs in the _post_once method that raises NotificationDeliveryError and that
the original with request.urlopen(...) as response block remains unchanged.
- Around line 141-145: The code documents single-resolution DNS to prevent
DNS-rebinding but fails to pass the pre-resolved IP through the call chain: fix
by returning the resolved IP from _resolve_and_validate_destination() into
validate_config() and ensure send_event() passes that resolved_ip into
_post_once(); update _post_once(resolved_ip, ...) signature and use the resolved
IP when opening the HTTP connection (construct the request to use the resolved
IP as the network target and set the original hostname in the Host header so
urlopen() does not perform a new DNS lookup). Also remove the duplicate
_resolve_and_check_ip() definition and correct imports (ensure
urllib.request/urllib.error are imported and unused imports removed). Ensure
references to resolved_ip (previously undefined) are removed or replaced with
the returned value from validate_config()/ _resolve_and_validate_destination().

---

Duplicate comments:
In `@bindu/server/endpoints/voice_endpoints.py`:
- Around line 569-581: The handler currently stringifies any JSON "context_id"
(including null, objects, empty strings) and passes it to
VoiceSessionManager.create_session(), causing ValueError downstream; update the
request parsing to validate and reject invalid context_id values before calling
create_session(): after json.loads(raw_body) ensure body["context_id"] exists
and is a non-empty string (and optionally match the expected UUID format) and
raise HTTPException(status_code=400) for null/empty/non-string/object values so
only a validated context_id is forwarded to create_session().
- Around line 748-827: The session is being marked active and the voice pipeline
is built before entering the try/finally that calls end_session, so any early
return/exception (e.g., agent not configured or build_voice_pipeline failing)
can leave the session active; move session_manager.update_state(session_id,
"active") and the call to build_voice_pipeline(...) inside the try block that
contains the finally/end_session cleanup and perform the agent configuration
check (voice_ext and manifest/run) inside that same try so any exception or
return will fall through to the finally which calls end_session; reference
session_manager.update_state, build_voice_pipeline, manifest.run, voice_ext, and
end_session when making this change.
- Around line 1-977: This file is too large and should be split: extract
rate-limiting logic (_get_rate_limit_redis_client, _rate_limit_allow_ip,
_RATE_LIMIT_LUA, related globals) into a new voice_rate_limit module; move REST
handlers (voice_session_start, voice_session_end, voice_session_status) into
voice_rest_endpoints that import the rate limiter; move WebSocket orchestration
(voice_websocket, _voice_control_reader, _FilteredWebSocket, _VoiceControlState,
_RawAudioFrameSerializer, _send_json, _extract_ws_session_token,
_has_ws_subprotocols, _classify_voice_pipeline_error, _voice_preflight_error,
_send_error_and_close and helpers like _trim_overlap_text/_extract_bearer_token)
into voice_ws_handler; keep small shared helpers (e.g., _trim_overlap_text,
_extract_bearer_token) in a voice_utils module; ensure each new module exposes
only what others need and update imports in the original package to reference
the new modules so lifecycle calls (create_session, update_state, end_session)
and pipeline building (build_voice_pipeline) remain intact.
- Around line 721-737: The code accepts a websocket then blocks indefinitely on
websocket.receive_text() when no header token is provided; to fix, use a
configurable timeout from app_settings (e.g.,
app_settings.voice.ws_token_read_timeout or similar) and wrap the receive_text
call in asyncio.wait_for so unauthenticated clients can't pin open connections:
update the branch in voice_endpoints.py where session_auth_required is checked
(symbols: session_auth_required, _extract_ws_session_token, websocket.accept,
websocket.receive_text) to fetch the timeout from app_settings (do not
hardcode), wrap the await websocket.receive_text() with asyncio.wait_for(...,
timeout=app_settings.voice.<timeout_field>), handle asyncio.TimeoutError by
closing the socket with code 1008 and a clear reason, and add the necessary
asyncio import if missing.
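The context_id validation from the first bullet above can be sketched as follows. `HTTPException` here is an illustrative stand-in mirroring the Starlette/FastAPI shape, and the UUID check is the optional stricter variant the comment mentions.

```python
import json
import uuid

class HTTPException(Exception):
    """Illustrative stand-in for the framework's HTTPException."""
    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code

def parse_context_id(raw_body: bytes) -> str:
    body = json.loads(raw_body)
    context_id = body.get("context_id") if isinstance(body, dict) else None
    # Reject null, empty strings, and non-string values up front.
    if not isinstance(context_id, str) or not context_id:
        raise HTTPException(400, "context_id must be a non-empty string")
    try:
        uuid.UUID(context_id)
    except ValueError as exc:
        raise HTTPException(400, "context_id must be a valid UUID") from exc
    return context_id
```

Only a validated context_id reaches create_session(), so the ValueError can no longer surface downstream.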

In `@frontend/src/lib/server/files/uploadFile.ts`:
- Around line 18-19: The upload stream handlers are being attached after calling
upload.end(), which can miss rapid finish/error events; move the event listener
registration (upload.on('finish', ...), upload.on('error', ...), or Promise
resolve/reject bindings) to before calling upload.write(...) / upload.end() in
the upload flow (look for the upload variable and the surrounding
upload.write(Buffer.from(buffer)); upload.end(); calls and similarly the block
around lines 49-50) so the promise will always observe finish/error events;
ensure you attach listeners first, then write and end the stream.

In `@frontend/src/lib/stores/voice.ts`:
- Around line 170-189: endVoiceSession() must invalidate any in-flight start
attempts so a stale localClient from startVoiceSession() cannot be assigned to
the global client after the user hangs up; increment or change the shared
startTokenCounter (the token used in startVoiceSession() to gate assignment)
inside endVoiceSession() before awaiting stopSession(), and in
startVoiceSession() check that the token from when the async start began still
matches startTokenCounter before assigning client = localClient or calling
localClient.startSession()/connect() results; reference startTokenCounter,
startVoiceSession(), endVoiceSession(), client, localClient, startSession(), and
connect() in your change.
- Around line 107-111: startVoiceSession() currently swaps clients without
clearing mute/audio leftovers, causing UI to remain muted and old audio to
persist; before or during the client swap in startVoiceSession(), reset the mute
and audio stores the same way endVoiceSession() does (clear isVoiceMuted and
latestAgentAudio, and any related per-call audio buffers) so the new VoiceClient
starts with a clean state — update startVoiceSession() to call the same reset
logic used in endVoiceSession() (or extract a small helper) to clear
isVoiceMuted and latestAgentAudio when initializing a new session.

---

Nitpick comments:
In @.gitignore:
- Around line 226-230: The .gitignore contains redundant patterns: remove the
explicit examples/voice-agent/en_US-*.onnx and
examples/voice-agent/en_US-*.onnx.json entries because the root-level patterns
en_US-*.onnx and en_US-*.onnx.json already match those files anywhere; keep the
root patterns (en_US-*.onnx and en_US-*.onnx.json) and delete the two
example-specific lines to clean up maintenance.

In `@bindu/extensions/voice/agent_bridge.py`:
- Around line 490-495: _trim_history currently computes turns_to_drop and
removes turns_to_drop * 2 messages which can exceed overflow for odd-length
histories; change it to compute the exact number of items to remove by using
items_to_drop = min(turns_to_drop * 2, overflow) (or simply remove exactly
overflow messages if you prefer single-message granularity) and then delete
self._conversation_history[:items_to_drop]; keep references to _trim_history,
self._conversation_history, _max_history_messages, overflow and turns_to_drop
when locating and updating the logic.

In `@bindu/server/applications.py`:
- Around line 153-154: Add a precise type annotation for the instance attribute
`_voice_session_manager` in the class __init__ so static checkers/IDEs know its
type: import TYPE_CHECKING and conditionally import the SessionManagerBackend
from bindu.extensions.voice.session_factory, then annotate
`self._voice_session_manager` as `SessionManagerBackend | None` (or
`Optional[SessionManagerBackend]`) instead of an untyped None; leave the runtime
assignment as `None`.

In `@frontend/src/lib/stores/chat.ts`:
- Line 229: The sendMessage function currently accepts parts: any[] which loses
type safety; define or import a Part interface (or type) and change the
signature to parts: Part[] (or readonly Part[] as appropriate) so callers and
the function body get proper typings. Ensure Part covers the expected shape used
in the function (e.g., discriminated union with kind: 'text' | 'file', text?:
string, fileId?: string, metadata?: Record<string, unknown>, etc.), update any
local references inside sendMessage to use the new properties, and import the
shared Part type if one exists instead of redefining it.

In `@tests/unit/extensions/voice/test_voice_websocket_integration.py`:
- Around line 114-119: The test uses a fake session token literal that triggers
a Ruff S106 false positive; update the call to
VoiceSessionManager.create_session (the line assigning session via await
manager.create_session(...)) to annotate the session_token argument with the
same secret allowlist pragma used for STT/TTS tokens (add a trailing comment
like "# pragma: allowlist secret" after the session_token argument) so the
linter treats it as an allowed test fixture.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 89f4a10a-a05b-4444-a3b4-1959e9ba9a82

📥 Commits

Reviewing files that changed from the base of the PR and between bfb9c1e and bb0e6fe.

⛔ Files ignored due to path filters (4)
  • bindu/grpc/generated/agent_handler_pb2.py is excluded by !**/generated/**
  • bindu/grpc/generated/agent_handler_pb2.pyi is excluded by !**/generated/**
  • bindu/grpc/generated/agent_handler_pb2_grpc.py is excluded by !**/generated/**
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (87)
  • .github/workflows/ci.yml
  • .gitignore
  • .pre-commit-config.yaml
  • bindu/common/protocol/types.py
  • bindu/extensions/__init__.py
  • bindu/extensions/did/did_agent_extension.py
  • bindu/extensions/voice/__init__.py
  • bindu/extensions/voice/agent_bridge.py
  • bindu/extensions/voice/audio_config.py
  • bindu/extensions/voice/pipeline_builder.py
  • bindu/extensions/voice/redis_session_manager.py
  • bindu/extensions/voice/service_factory.py
  • bindu/extensions/voice/session_factory.py
  • bindu/extensions/voice/session_manager.py
  • bindu/extensions/voice/voice_agent_extension.py
  • bindu/penguin/bindufy.py
  • bindu/server/applications.py
  • bindu/server/endpoints/utils.py
  • bindu/server/endpoints/voice_endpoints.py
  • bindu/server/metrics.py
  • bindu/server/scheduler/memory_scheduler.py
  • bindu/server/workers/base.py
  • bindu/settings.py
  • bindu/utils/__init__.py
  • bindu/utils/capabilities.py
  • bindu/utils/logging.py
  • bindu/utils/notifications.py
  • bindu/utils/retry.py
  • bindu/utils/worker/messages.py
  • bindu/utils/worker/parts.py
  • frontend/.env.example
  • frontend/.gitignore
  • frontend/src/lib/buildPrompt.ts
  • frontend/src/lib/components/ShareConversationModal.svelte
  • frontend/src/lib/components/chat/ChatInput.svelte
  • frontend/src/lib/components/chat/ChatMessage.svelte
  • frontend/src/lib/components/chat/ChatWindow.svelte
  • frontend/src/lib/components/voice/LiveTranscript.svelte
  • frontend/src/lib/components/voice/VoiceCallButton.svelte
  • frontend/src/lib/components/voice/VoiceCallPanel.svelte
  • frontend/src/lib/constants/mime.ts
  • frontend/src/lib/jobs/refresh-conversation-stats.ts
  • frontend/src/lib/migrations/lock.ts
  • frontend/src/lib/migrations/routines/02-update-assistants-models.ts
  • frontend/src/lib/migrations/routines/10-update-reports-assistantid.ts
  • frontend/src/lib/server/config.ts
  • frontend/src/lib/server/database.ts
  • frontend/src/lib/server/files/downloadFile.ts
  • frontend/src/lib/server/files/uploadFile.ts
  • frontend/src/lib/server/models.ts
  • frontend/src/lib/services/voice-client.ts
  • frontend/src/lib/stores/chat.ts
  • frontend/src/lib/stores/voice.ts
  • frontend/src/lib/types/ConvSidebar.ts
  • frontend/src/lib/types/Model.ts
  • frontend/src/lib/types/Session.ts
  • frontend/src/lib/utils/agentMessageHandler.ts
  • frontend/src/lib/utils/tree/addChildren.spec.ts
  • frontend/src/lib/utils/tree/addSibling.spec.ts
  • frontend/src/lib/utils/tree/buildSubtree.spec.ts
  • frontend/src/lib/utils/tree/convertLegacyConversation.spec.ts
  • frontend/src/routes/+layout.svelte
  • frontend/src/routes/+page.svelte
  • frontend/src/routes/api/v2/conversations/[id]/message/[messageId]/+server.ts
  • frontend/src/routes/api/v2/export/+server.ts
  • frontend/src/routes/conversation/[id]/+page.svelte
  • frontend/src/routes/conversation/[id]/+server.ts
  • frontend/src/routes/settings/(nav)/authentication/+page.svelte
  • frontend/src/routes/settings/(nav)/negotiation/+page.svelte
  • pyproject.toml
  • pytest.ini
  • scripts/test_voice_providers.sh
  • tests/conftest_stubs.py
  • tests/integration/grpc/test_grpc_e2e.py
  • tests/unit/extensions/__init__.py
  • tests/unit/extensions/voice/__init__.py
  • tests/unit/extensions/voice/test_agent_bridge.py
  • tests/unit/extensions/voice/test_service_factory.py
  • tests/unit/extensions/voice/test_session_manager.py
  • tests/unit/extensions/voice/test_voice_endpoints.py
  • tests/unit/extensions/voice/test_voice_extension.py
  • tests/unit/extensions/voice/test_voice_websocket_integration.py
  • tests/unit/penguin/test_bindufy.py
  • tests/unit/penguin/test_config_validator.py
  • tests/unit/server/scheduler/test_memory_scheduler.py
  • tests/unit/utils/test_notifications.py
  • tests/unit/utils/worker/test_messages.py
✅ Files skipped from review due to trivial changes (26)
  • frontend/src/lib/migrations/routines/10-update-reports-assistantid.ts
  • tests/unit/extensions/voice/__init__.py
  • frontend/src/lib/buildPrompt.ts
  • tests/unit/extensions/__init__.py
  • bindu/utils/logging.py
  • pytest.ini
  • frontend/src/routes/api/v2/conversations/[id]/message/[messageId]/+server.ts
  • bindu/server/endpoints/utils.py
  • frontend/src/lib/utils/tree/addChildren.spec.ts
  • frontend/.gitignore
  • frontend/src/routes/settings/(nav)/negotiation/+page.svelte
  • tests/unit/penguin/test_config_validator.py
  • bindu/utils/__init__.py
  • frontend/src/routes/settings/(nav)/authentication/+page.svelte
  • tests/unit/penguin/test_bindufy.py
  • frontend/src/lib/migrations/routines/02-update-assistants-models.ts
  • frontend/src/lib/constants/mime.ts
  • frontend/src/lib/server/config.ts
  • bindu/extensions/voice/__init__.py
  • frontend/src/lib/components/voice/LiveTranscript.svelte
  • tests/unit/extensions/voice/test_session_manager.py
  • frontend/src/lib/components/voice/VoiceCallButton.svelte
  • tests/unit/extensions/voice/test_service_factory.py
  • bindu/extensions/voice/voice_agent_extension.py
  • bindu/extensions/voice/redis_session_manager.py
  • bindu/settings.py
🚧 Files skipped from review as they are similar to previous changes (33)
  • frontend/src/lib/components/ShareConversationModal.svelte
  • bindu/utils/worker/parts.py
  • frontend/src/routes/+page.svelte
  • bindu/utils/retry.py
  • frontend/src/lib/migrations/lock.ts
  • tests/unit/server/scheduler/test_memory_scheduler.py
  • frontend/src/lib/utils/tree/buildSubtree.spec.ts
  • frontend/src/routes/conversation/[id]/+page.svelte
  • frontend/src/lib/types/ConvSidebar.ts
  • frontend/src/lib/utils/tree/addSibling.spec.ts
  • bindu/extensions/__init__.py
  • frontend/src/routes/api/v2/export/+server.ts
  • frontend/src/lib/utils/tree/convertLegacyConversation.spec.ts
  • frontend/src/lib/types/Session.ts
  • frontend/src/lib/types/Model.ts
  • frontend/src/routes/conversation/[id]/+server.ts
  • frontend/src/lib/server/files/downloadFile.ts
  • frontend/src/lib/jobs/refresh-conversation-stats.ts
  • bindu/server/metrics.py
  • tests/unit/utils/worker/test_messages.py
  • bindu/server/workers/base.py
  • .pre-commit-config.yaml
  • tests/conftest_stubs.py
  • frontend/src/lib/server/models.ts
  • tests/unit/extensions/voice/test_voice_extension.py
  • .github/workflows/ci.yml
  • frontend/src/routes/+layout.svelte
  • frontend/src/lib/components/chat/ChatMessage.svelte
  • bindu/server/scheduler/memory_scheduler.py
  • frontend/src/lib/utils/agentMessageHandler.ts
  • tests/unit/extensions/voice/test_agent_bridge.py
  • bindu/extensions/voice/pipeline_builder.py
  • bindu/extensions/voice/audio_config.py

Comment thread bindu/extensions/did/did_agent_extension.py Outdated
Comment thread bindu/extensions/did/did_agent_extension.py
Comment thread bindu/extensions/voice/agent_bridge.py
Comment thread bindu/utils/notifications.py Outdated
Comment thread frontend/src/lib/server/database.ts
Comment thread frontend/src/lib/server/database.ts
Comment thread frontend/src/lib/services/voice-client.ts
Comment thread pyproject.toml
Comment thread tests/integration/grpc/test_grpc_e2e.py Outdated
@chandan-1427
Contributor

hey @Co-vengers, the Unit Tests (3.12) check is failing after 7 seconds; please also look at the CodeRabbit reviews


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 16

Note

Due to the large number of review comments, comments of Critical and Major severity were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
frontend/src/lib/components/chat/ChatWindow.svelte (1)

294-301: ⚠️ Potential issue | 🟡 Minor

Keep the MIME fallback consistent for dropped/pasted files.

file.type can be empty, and sendAgentMessage() drops file parts with blank MIME. This path should match ChatInput and provide a fallback so valid files are not silently omitted.

Proposed fix
 				type: "base64",
 				value,
-				mime: file.type,
+				mime: file.type || "application/octet-stream",
 				name: file.name,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/components/chat/ChatWindow.svelte` around lines 294 - 301,
The derived "sources" creation uses file.type directly which can be empty and
causes sendAgentMessage to drop file parts; update the mapping in the sources
computation (where file2base64 is used) to provide the same MIME fallback as
ChatInput (e.g. use file.type || <ChatInput's fallback> such as
'application/octet-stream' or file.mime) so blank MIME values are replaced
before creating the part; ensure this change is applied alongside file2base64
usage and will produce parts accepted by sendAgentMessage.
♻️ Duplicate comments (3)
bindu/extensions/did/did_agent_extension.py (1)

218-229: ⚠️ Potential issue | 🟠 Major

Rollback key files for all ACL-hardening failures.

Line 221 only catches RuntimeError; if subprocess.run() raises before _harden_windows_key_file_acl() converts the failure, the newly written key files are left behind.

Suggested fix
-            except RuntimeError:
+            except Exception:
                 for key_path in (self.private_key_path, self.public_key_path):
                     try:
                         key_path.unlink(missing_ok=True)
                     except OSError:
                         logger.warning(
                             f"Failed to remove key file during ACL hardening rollback: {key_path}"
                         )
                 raise
Python 3.12 subprocess.run FileNotFoundError OSError behavior when executable cannot be started
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/extensions/did/did_agent_extension.py` around lines 218 - 229, The
rollback currently only runs on RuntimeError, so if subprocess.run or other
calls raise OSError/FileNotFoundError (or any other exception) before
_harden_windows_key_file_acl converts it, the key files remain; change the
except to catch the broader exception class (e.g., except (RuntimeError,
OSError, FileNotFoundError) or simply except Exception) around the calls to
self._harden_windows_key_file_acl(self.private_key_path) and
self._harden_windows_key_file_acl(self.public_key_path) and keep the existing
loop that unlinks self.private_key_path and self.public_key_path (with
missing_ok=True and the same OSError warning) before re-raising the original
exception so all failures trigger the same rollback.
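The rollback behavior being asked for can be sketched as a standalone helper. `harden` here is a stand-in for `_harden_windows_key_file_acl`, and the paths are plain `Path` arguments; this is an illustration of the broadened `except`, not the extension's actual code:

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

def harden_key_files_or_rollback(private_key_path: Path, public_key_path: Path, harden) -> None:
    """Harden ACLs on freshly written key files; on ANY failure, remove both."""
    try:
        # `harden` may raise OSError/FileNotFoundError from subprocess.run,
        # not just the RuntimeError the current code catches.
        harden(private_key_path)
        harden(public_key_path)
    except Exception:
        # Roll back both key files so no unhardened secrets are left behind.
        for key_path in (private_key_path, public_key_path):
            try:
                key_path.unlink(missing_ok=True)
            except OSError:
                logger.warning(
                    "Failed to remove key file during ACL hardening rollback: %s",
                    key_path,
                )
        raise
```

Re-raising after the unlink loop preserves the original failure for the caller, matching the suggested diff.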
bindu/server/endpoints/voice_endpoints.py (1)

50-65: 🛠️ Refactor suggestion | 🟠 Major

Still move voice tuning knobs into app_settings.

Redis TTL, audio frame limits, queue depth, and echo-suppression windows remain hardcoded, so production tuning still requires code changes.

As per coding guidelines, "Use bindu.settings.app_settings for all configuration - never hardcode values such as URLs, ports, timeouts, API keys, or feature flags" and "NEVER create local config constants - use app_settings from bindu.settings". Based on learnings, "Use bindu.settings.app_settings for all configuration - never hardcode values such as URLs, ports, timeouts, API keys, or feature flags".

Also applies to: 418-420, 773-797, 813-813

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 50 - 65, The
_RATE_LIMIT_LUA string (and other hardcoded constants like the Redis TTL value
120 inside it, audio frame limits, queue depth, and echo-suppression window
values referenced elsewhere) must be moved into bindu.settings.app_settings and
referenced from there instead of hardcoding; update the code that builds/uses
_RATE_LIMIT_LUA to read TTL and limit thresholds from app_settings (e.g.,
app_settings.voice.redis_ttl, app_settings.voice.rate_limit,
app_settings.voice.frame_limit, app_settings.voice.queue_depth,
app_settings.voice.echo_suppression_window) and replace other hardcoded
occurrences (the blocks around the other referenced sections) to use these
app_settings keys so production tuning can be done via configuration.
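A minimal sketch of the settings move, using a frozen dataclass as a stand-in for the pydantic model in bindu/settings.py. All field names and defaults below are assumptions for illustration; only the 120-second TTL is taken from the review comment:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VoiceTuning:
    """Hypothetical shape of an app_settings.voice tuning block."""
    redis_ttl: int = 120                  # was hardcoded inside _RATE_LIMIT_LUA
    rate_limit: int = 30                  # assumed default, requests per window
    frame_limit: int = 3200               # assumed default, max audio frame bytes
    queue_depth: int = 64                 # assumed default
    echo_suppression_window: float = 0.5  # assumed default, seconds

def build_rate_limit_script(tuning: VoiceTuning) -> str:
    # TTL and limit come from settings, not literals baked into the Lua source.
    return (
        "local n = redis.call('INCR', KEYS[1]) "
        f"if n == 1 then redis.call('EXPIRE', KEYS[1], {tuning.redis_ttl}) end "
        f"return n <= {tuning.rate_limit}"
    )
```

Building the Lua script from the settings object means production tuning (or a test override) is a config change, not a code change.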
frontend/src/lib/services/voice-client.ts (1)

310-431: ⚠️ Potential issue | 🟠 Major

Clean up mic resources when audio setup throws.

After getUserMedia() succeeds, failures in AudioContext resume, worklet registration, node creation, or graph connection can leave the mic/context alive because there is no outer catch.

🔒 Proposed cleanup guard
-		const stream = await navigator.mediaDevices.getUserMedia({
+		let stream: MediaStream | null = null;
+		let audioContext: AudioContext | null = null;
+		try {
+			stream = await navigator.mediaDevices.getUserMedia({
 			audio: {
 				channelCount: 1,
 				sampleRate: 16000,
 				echoCancellation: true,
 				noiseSuppression: true,
 				autoGainControl: true,
 			},
-		});
+			});
@@
-		const audioContext = new AudioContextCtor({ sampleRate: desiredSampleRate });
+		audioContext = new AudioContextCtor({ sampleRate: desiredSampleRate });
 		await audioContext.resume();
@@
 		this.silentGainNode = silentGainNode;
 		this.isStreamingAudio = true;
+		} catch (err) {
+			stream?.getTracks().forEach((track) => track.stop());
+			if (audioContext) {
+				void audioContext.close();
+			}
+			throw err;
+		}
 	}
🟡 Minor comments (6)
bindu/penguin/config_validator.py-154-157 (1)

154-157: ⚠️ Potential issue | 🟡 Minor

Persist the normalized deployment.url.

Line 154 trims only the local variable, so configs like " http://localhost:3773 " pass validation but validate_and_process() still returns the untrimmed URL. Store the normalized value back into the nested config before building the processed result.

Proposed fix
         deployment_url = deployment_url.strip()
+        deployment_config["url"] = deployment_url
 
         parsed = urlparse(deployment_url)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/penguin/config_validator.py` around lines 154 - 157, The code trims the
local variable deployment_url but does not store the normalized value back into
the config, so validate_and_process() can return the original untrimmed string;
update the nested config key (e.g., config["deployment"]["url"] or equivalent
structure used in validate_and_process()) with deployment_url.strip() before
calling urlparse and before building the processed result so the normalized URL
is persisted; ensure you still use the trimmed deployment_url for
parsing/validation (parsed = urlparse(deployment_url)) and that the returned
processed config contains the trimmed value.
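The fix can be shown end to end with a small helper; the dict shape and error message are illustrative, not the validator's real signature:

```python
from urllib.parse import urlparse

def validate_deployment_url(deployment_config: dict) -> dict:
    deployment_url = deployment_config["url"].strip()
    # Persist the normalized value so the processed result never carries
    # the raw "  http://localhost:3773 " form past validation.
    deployment_config["url"] = deployment_url

    parsed = urlparse(deployment_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Invalid deployment URL: {deployment_url!r}")
    return deployment_config
```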
bindu/utils/worker/messages.py-93-93 (1)

93-93: ⚠️ Potential issue | 🟡 Minor

Remove dead code path for non-existent "data" key — it is not defined in FileWithBytes type.

The fallback file_info.get("bytes") or file_info.get("data", "") at line 93 accepts a "data" key that does not exist in the FileWithBytes type definition (per bindu/common/protocol/types.py). No producer or serializer writes this key; all tests, examples, and the protocol itself use only "bytes". Remove this dead branch, or if legacy compatibility is required, document it with a comment and update the error message to avoid suggesting "data" is a valid alternative.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/utils/worker/messages.py` at line 93, The assignment currently uses a
dead fallback for a non-existent "data" key; update the logic in
bindu/utils/worker/messages.py to read only the documented FileWithBytes field
(use file_info.get("bytes") and remove the file_info.get("data", "") branch),
and also adjust any related error/log messages in the same module that mention
"data" so they only reference "bytes" (or add a clear comment if you
intentionally keep legacy handling). Target the base64_data assignment and any
subsequent error handling that references "data" to ensure consistency with the
FileWithBytes type.
frontend/src/lib/server/database.ts-94-102 (1)

94-102: ⚠️ Potential issue | 🟡 Minor

_id preservation type-casts a possibly non-ObjectId value.

("_id" in source && source._id ? source._id : new ObjectId()) as ObjectId blindly casts whatever the caller passed (string, number, plain object) to ObjectId, and insertedId in the returned result inherits that lie. Downstream code calling ObjectId-specific methods (e.g. .toHexString(), .equals(...)) on the returned insertedId will fail at runtime when a caller seeds a non-ObjectId _id. Either normalize to an ObjectId instance, or widen the return type to ObjectId | string so callers handle both.

🛡️ Proposed normalization
-	async insertOne(
-		doc: Partial<T> | T
-	): Promise<{ insertedId: ObjectId; acknowledged: boolean }> {
-		const source = doc as Record<string, unknown>;
-		const id = ("_id" in source && source._id ? source._id : new ObjectId()) as ObjectId;
-		const docWithId = { ...source, _id: id } as T;
-		this.data.set(id.toString(), docWithId);
-		return { insertedId: id, acknowledged: true };
-	}
+	async insertOne(
+		doc: Partial<T> | T
+	): Promise<{ insertedId: ObjectId; acknowledged: boolean }> {
+		const source = doc as Record<string, unknown>;
+		const existing = "_id" in source ? source._id : undefined;
+		const id =
+			existing && typeof existing === "object" && "toHexString" in existing
+				? (existing as ObjectId)
+				: existing
+					? new ObjectId(String(existing))
+					: new ObjectId();
+		const docWithId = { ...source, _id: id } as T;
+		this.data.set(id.toString(), docWithId);
+		return { insertedId: id, acknowledged: true };
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/server/database.ts` around lines 94 - 102, insertOne
currently casts any incoming _id to ObjectId which can lie and break callers;
change insertOne to normalize the id: if source._id is already an ObjectId use
it, else if it's a string/hex try to construct a new ObjectId from that string
(wrap in try/catch and fall back to new ObjectId()), otherwise generate a fresh
new ObjectId; ensure docWithId._id is that normalized ObjectId, store it in
this.data, and return that real ObjectId as insertedId (or if you prefer widen
the function signature return type to ObjectId | string, pick one consistent
approach and update insertOne's return type accordingly).
bindu/utils/notifications.py-251-259 (1)

251-259: ⚠️ Potential issue | 🟡 Minor

Preserve the wrapped HTTPError cause.

Add from exc to this raise statement so traceback context is explicit and Ruff B904 passes. This is consistent with exception chaining used elsewhere in the same function.

🧹 Proposed fix
-            raise NotificationDeliveryError(status, message or f"HTTP error {status}")
+            raise NotificationDeliveryError(
+                status, message or f"HTTP error {status}"
+            ) from exc
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/utils/notifications.py` around lines 251 - 259, The except block
catching error.HTTPError currently raises NotificationDeliveryError without
chaining the original exception; update the raise to preserve the cause by using
"from exc" so traceback context is explicit (i.e., when handling error.HTTPError
as exc, raise NotificationDeliveryError(status, message or f"HTTP error
{status}") from exc), keeping consistent with exception chaining used elsewhere
and satisfying Ruff B904; locate this in bindu/utils/notifications.py within the
HTTPError handling where exc, status, body, message are computed.
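The chaining pattern in isolation, with `ConnectionError` standing in for `urllib.error.HTTPError` so the sketch stays self-contained:

```python
class NotificationDeliveryError(Exception):
    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status

def deliver(send) -> None:
    try:
        send()
    except ConnectionError as exc:  # stand-in for urllib.error.HTTPError
        status = 502  # illustrative status; the real code reads it from exc
        # `from exc` records __cause__, so the traceback shows the HTTP
        # failure that triggered the wrapper, and Ruff B904 passes.
        raise NotificationDeliveryError(status, f"HTTP error {status}") from exc
```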
tests/unit/extensions/voice/test_voice_websocket_integration.py-112-120 (1)

112-120: ⚠️ Potential issue | 🟡 Minor

Patch provider/auth settings explicitly for deterministic tests.

These tests rely on _voice_preflight_error(), but only patch API keys. If the environment sets a different STT/TTS provider or enables session auth, the tests can take a different branch.

🧪 Proposed deterministic setup
     original_enabled = module.app_settings.voice.enabled
     original_stt = module.app_settings.voice.stt_api_key
     original_tts = module.app_settings.voice.tts_api_key
+    original_stt_provider = module.app_settings.voice.stt_provider
+    original_tts_provider = module.app_settings.voice.tts_provider
@@
         module.app_settings.voice.enabled = True
+        module.app_settings.voice.stt_provider = "deepgram"
+        module.app_settings.voice.tts_provider = "elevenlabs"
@@
         module.app_settings.voice.tts_api_key = original_tts
+        module.app_settings.voice.stt_provider = original_stt_provider
+        module.app_settings.voice.tts_provider = original_tts_provider

Apply the same pattern in the timeout test, and explicitly set/restore session_auth_required = False there.

Also applies to: 201-208

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/extensions/voice/test_voice_websocket_integration.py` around lines
112 - 120, The test relies on _voice_preflight_error() but only patches STT/TTS
API keys; explicitly patch provider/auth settings to make the test deterministic
by setting module.app_settings.voice.session_auth_required = False (or restoring
it to False in the timeout test) and explicitly set any provider-related fields
used by _voice_preflight_error() (e.g., stt_api_key, tts_api_key and provider
selection if present) before the assertion, and restore them after; apply the
same pattern for the other occurrence around lines referenced (the timeout test
and the block at 201-208) so both tests deterministically exercise the same
branch.
frontend/src/lib/stores/voice.ts-66-86 (1)

66-86: ⚠️ Potential issue | 🟡 Minor

Preserve turn boundaries for finalized transcripts.

This merges every consecutive event from the same role. If the user sends two finalized utterances in a row, or the agent emits a new finalized response after another agent event, the UI collapses distinct turns.

🧩 Proposed merge guard
-		if (last && last.role === event.role) {
+		if (last && last.role === event.role && !last.isFinal) {
 			const mergedText = mergeTranscriptText(last.text, event.text);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/stores/voice.ts` around lines 66 - 86, In appendTranscript,
avoid merging a new event into the previous item when that previous turn is
finalized; currently any consecutive event with the same role gets merged.
Change the merge guard so merging only occurs if last exists, last.role ===
event.role, AND last.isFinal === false (i.e., only merge into non-finalized
turns). Use the same merge logic (mergeTranscriptText, building merged with
isFinal and ts) and update currentUserTranscript/currentAgentTranscript when
merging; otherwise push event as a new transcript entry.
🧹 Nitpick comments (9)
bindu/utils/worker/messages.py (2)

57-71: _decode_plain_text fallback order makes latin-1 effectively unreachable.

cp1252 is almost-total: it only fails on five undefined code points (0x81, 0x8D, 0x8F, 0x90, 0x9D). As a result, any byte sequence that isn't valid UTF-8 will almost always be accepted as cp1252, silently misinterpreting real latin-1 files across 0x80–0x9F (e.g., 0x80 becomes € instead of a control char). If the intent is "Windows-first, then Unix", this is fine; if latin-1 is meant as a meaningful branch, swap them or drop it.

-        for encoding in ("utf-8", "cp1252", "latin-1"):
+        for encoding in ("utf-8", "latin-1", "cp1252"):

Alternatively, since latin-1 never raises UnicodeDecodeError, consider dropping the loop and using file_bytes.decode("utf-8", errors="replace") directly when UTF-8 fails — the current replacement fallback on line 71 is already unreachable in practice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/utils/worker/messages.py` around lines 57 - 71, The _decode_plain_text
function currently tries encodings in ("utf-8","cp1252","latin-1") which makes
latin-1 effectively unreachable because cp1252 will decode nearly any bytes;
change the fallback strategy in _decode_plain_text: either swap the order to
("utf-8","latin-1","cp1252") if you want ISO-8859-1 to take precedence, or
simplify by attempting only UTF-8 and on failure returning
file_bytes.decode("utf-8", errors="replace") (i.e., remove the loop) so you
avoid silent cp1252 mis-decoding; update the function body of _decode_plain_text
accordingly and keep the existing logger calls consistent with the chosen
behavior.
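The "drop the loop" alternative is tiny; since latin-1 decodes every byte and can never raise, a two-step UTF-8-then-replace strategy says what it means (sketch only, logging omitted):

```python
def decode_plain_text(file_bytes: bytes) -> str:
    # latin-1/cp1252 fallbacks are effectively unreachable or lossy here;
    # decode UTF-8 strictly, then fall back to replacement characters.
    try:
        return file_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return file_bytes.decode("utf-8", errors="replace")
```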

115-134: Dead branch at lines 117–118 — base64_data is already guaranteed non-empty here.

Line 95 rejects the part (continue) whenever not base64_data, so by the time we reach the try: block base64_data is truthy and raise ValueError("Missing file bytes") is unreachable. Minor cleanup:

             try:
-                # Decode the Base64 payload
-                if not base64_data:
-                    raise ValueError("Missing file bytes")
-
-                padding = (
+                # Estimate decoded size before allocating to reject oversized payloads early
+                padding = (
                     2
                     if base64_data.endswith("==")
                     else 1
                     if base64_data.endswith("=")
                     else 0
                 )

The post-decode len(file_bytes) > MAX_FILE_SIZE check at lines 132–133 is good defense-in-depth against a pathological short-padded payload and should stay.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/utils/worker/messages.py` around lines 115 - 134, Remove the redundant
precondition check inside the try block that raises ValueError("Missing file
bytes") for base64_data (the variable base64_data is already checked earlier
with continue), i.e. delete the if not base64_data: raise ValueError(...) branch
and leave the rest of the logic (estimated_size calc, estimated_size >
MAX_FILE_SIZE check, base64.b64decode and the post-decode len(file_bytes) >
MAX_FILE_SIZE check) intact so defense-in-depth remains.
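The surviving logic after the cleanup looks roughly like this; `MAX_FILE_SIZE` here is an assumed 10 MiB for illustration:

```python
import base64

MAX_FILE_SIZE = 10 * 1024 * 1024  # assumed limit for illustration

def decode_bounded(base64_data: str) -> bytes:
    """Decode a Base64 payload, rejecting oversized inputs before allocating."""
    # Caller already `continue`s on empty base64_data, so no emptiness check here.
    padding = 2 if base64_data.endswith("==") else 1 if base64_data.endswith("=") else 0
    estimated_size = (len(base64_data) * 3) // 4 - padding
    if estimated_size > MAX_FILE_SIZE:
        raise ValueError("File too large")
    file_bytes = base64.b64decode(base64_data)
    if len(file_bytes) > MAX_FILE_SIZE:  # defense-in-depth vs. short-padded payloads
        raise ValueError("File too large")
    return file_bytes
```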
frontend/src/lib/server/database.ts (1)

458-540: Download-stream refactor correctly de-dupes emissions; one residual ordering caveat.

The scheduled.{error,data,end} flags ensure each event is scheduled at most once per stream lifetime, resolving the prior duplicate-delivery bug where each on()/once() registration queued its own emit. once wraps the callback and self-removes via off, and destroy guards emit via the destroyed flag. Nice cleanup.

Minor caveat: because each event is scheduled only on its first registration via independent setTimeout(..., 0) timers, the emit order is determined by listener-registration order, not by stream semantics. A consumer that registers on("end", ...) before on("data", ...) will observe end before data. Typical consumers register data first so this is unlikely to bite, but if you want to be defensive you can schedule both data and end together the first time either is registered (when file exists), preserving data → end order regardless of registration order.

♻️ Optional hardening for emit ordering
-			if (event === "data" && !scheduled.data && file) {
-				scheduled.data = true;
-				setTimeout(() => emit("data", file.data), 0);
-			}
-			if (event === "end" && !scheduled.end && file) {
-				scheduled.end = true;
-				setTimeout(() => emit("end"), 0);
-			}
+			if ((event === "data" || event === "end") && file && !scheduled.data) {
+				scheduled.data = true;
+				scheduled.end = true;
+				setTimeout(() => {
+					emit("data", file.data);
+					emit("end");
+				}, 0);
+			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/server/database.ts` around lines 458 - 540, The scheduled
flags prevent duplicate emits but can deliver "end" before "data" if listeners
register in reverse order; update scheduleInitialEmit inside openDownloadStream
(and ensure downloadStream.on/once call it) so that when either "data" or "end"
is first registered and file exists you schedule both emits together in the
correct order: schedule "data" (set scheduled.data = true and setTimeout to emit
data) and then schedule "end" (set scheduled.end = true and setTimeout to emit
end), rather than scheduling only the single requested event, so consumers
always see data → end regardless of registration order.
frontend/src/lib/services/agent-api.ts (1)

218-220: Redundant nullish fallback.

AgentAPI.getBaseUrl() always returns a non-empty string because the constructor (line 26) guarantees this.baseUrl is initialized to either the passed value or 'http://localhost:3773'. The ?? 'http://localhost:3773' here is dead code and misleads readers into thinking the method can return null/undefined.

♻️ Proposed simplification
-export function getAgentBaseUrl(): string {
-  return agentAPI.getBaseUrl() ?? 'http://localhost:3773';
-}
+export function getAgentBaseUrl(): string {
+  return agentAPI.getBaseUrl();
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/services/agent-api.ts` around lines 218 - 220,
getAgentBaseUrl contains a redundant nullish fallback because
AgentAPI.getBaseUrl() is guaranteed to return a non-empty string; remove the "??
'http://localhost:3773'" and simply return agentAPI.getBaseUrl() directly to
avoid dead code and clarify intent (refer to getAgentBaseUrl and
AgentAPI.getBaseUrl/AgentAPI constructor to locate the change).
bindu/extensions/voice/service_factory.py (1)

80-104: Redundant runtime validation of pydantic-validated setting.

app_settings.voice.tts_fallback_provider is typed as Literal["none", "elevenlabs", "azure"] (settings.py line 1067), so pydantic already guarantees it's a str in that set at load time. The isinstance(..., str) check and the membership re-validation at lines 81-85 are unreachable in practice; they'll also silently mask any future Literal expansion in settings. Consider trimming to just the direct use, and optionally narrow the fallback except Exception at line 100 to (ImportError, ValueError, RuntimeError) for consistency with the primary path.

♻️ Proposed simplification
-    provider = config.tts_provider
-    fallback_provider_raw = app_settings.voice.tts_fallback_provider
-    fallback_provider = (
-        fallback_provider_raw if isinstance(fallback_provider_raw, str) else "none"
-    )
-    if fallback_provider not in {"none", "elevenlabs", "azure"}:
-        fallback_provider = "none"
+    provider = config.tts_provider
+    fallback_provider = app_settings.voice.tts_fallback_provider

     try:
         return _create_tts_service_for_provider(provider, config)
     except (ImportError, ValueError) as primary_error:
-        if fallback_provider not in {"", "none", provider}:
+        if fallback_provider not in {"none", provider}:
             logger.warning(
                 "Primary TTS provider failed; attempting fallback",
                 ...
             )
             try:
                 return _create_tts_service_for_provider(fallback_provider, config)
-            except Exception as fallback_error:
+            except (ImportError, ValueError, RuntimeError) as fallback_error:
                 raise RuntimeError(
                     "TTS setup failed for primary and fallback providers"
                 ) from fallback_error
         raise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/extensions/voice/service_factory.py` around lines 80 - 104, The current
runtime re-validation of app_settings.voice.tts_fallback_provider (variables
fallback_provider_raw and fallback_provider) is redundant because Pydantic
already enforces Literal["none","elevenlabs","azure"]; remove the
isinstance(...) and membership checks and use
app_settings.voice.tts_fallback_provider directly as fallback_provider; keep the
existing try of _create_tts_service_for_provider(provider, config) and, in the
fallback branch that calls _create_tts_service_for_provider(fallback_provider,
config), narrow the broad except Exception that catches fallback_error to a
specific tuple like (ImportError, ValueError, RuntimeError) to match the primary
path and avoid masking other errors; update logger usage to reference provider,
fallback_provider, primary_error as before and preserve raising the
fallback_error as the cause.
bindu/settings.py (2)

1127-1134: Consider logging the session_token_ttl clamp.

Silently clamping session_token_ttl to session_timeout lets the server start with a config that differs from what the operator set. For auth-sensitive values, a warning log so ops can see the mismatch in startup logs is preferable; otherwise a misconfigured env var is invisible until someone reads the source.

Note that using logger inside a pydantic model_validator can be tricky during settings bootstrap (circular import with bindu.utils.logging if it imports from bindu.settings). If that's a concern, emit the warning post-construction at the call site in bindu/penguin/bindufy.py where voice is wired up instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/settings.py` around lines 1127 - 1134, The validator
_validate_session_token_ttl in VoiceSettings currently silently clamps
session_token_ttl to session_timeout; update it to emit a warning including the
original (requested) TTL and the clamped value so operators see the mismatch,
e.g., capture the old value before clamping and call a logger.warning with both
values and context; if importing the project logger in bindu/settings.py risks a
circular import, instead leave the validator as-is but add a post-construction
check in the voice wiring code in bindu/penguin/bindufy.py (where the
VoiceSettings instance is created/used) that compares session_token_ttl and
session_timeout and logs the same warning when clamping is needed.
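The post-construction variant can be a plain function called from the wiring site; the field names mirror VoiceSettings, but the function itself is hypothetical:

```python
import logging

logger = logging.getLogger(__name__)

def clamp_session_token_ttl(session_token_ttl: int, session_timeout: int) -> int:
    """Clamp the token TTL to the session timeout, warning on mismatch."""
    if session_token_ttl > session_timeout:
        # Surface the operator's original value so the clamp is visible
        # in startup logs instead of silently differing from the env var.
        logger.warning(
            "voice.session_token_ttl=%s exceeds session_timeout=%s; clamping",
            session_token_ttl,
            session_timeout,
        )
        return session_timeout
    return session_token_ttl
```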

9-9: Type-hint style inconsistency.

The rest of this file uses PEP 604 str | None style (e.g., lines 648, 658, 693, 697-700), but the new redis_url: Optional[str] at line 1109 imports and uses Optional. For consistency, prefer str | None and drop Optional from the import.

♻️ Proposed change
-from typing import Literal, Optional
+from typing import Literal
-    redis_url: Optional[str] = None  # e.g., "redis://localhost:6379/0"
+    redis_url: str | None = None  # e.g., "redis://localhost:6379/0"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/settings.py` at line 9, The import and type-hint for redis_url use
Optional which is inconsistent with the file's PEP 604 style; update the import
line to remove Optional (keep Literal) and change the annotation for redis_url
from Optional[str] to str | None so the code uses the same `str | None` style
used elsewhere (locate the import `from typing import Literal, Optional` and the
`redis_url: Optional[str]` declaration to modify).
bindu/extensions/voice/redis_session_manager.py (2)

350-371: zip across two set iterations — add strict=True and materialize.

members is the return of smembers (a Python set when decode_responses=True). You iterate it twice: once to queue pipeline.get(key) (line 351-352) and once in zip(members, values) (line 357). CPython guarantees deterministic iteration order for an unchanged set within a single process, so this happens to work today, but it's brittle — any refactor that interleaves a mutation, or a future change to a non-set return type, can silently misalign keys with values.

Materialize to a list and use strict=True (also resolves Ruff B905):

♻️ Proposed fix
-            members = await self._redis_client.smembers(REDIS_ACTIVE_SET_KEY)
-            if not members:
+            members = list(await self._redis_client.smembers(REDIS_ACTIVE_SET_KEY))
+            if not members:
                 return 0

             pipeline = self._redis_client.pipeline()
             for key in members:
                 pipeline.get(key)
             values = await pipeline.execute()

             count = 0
             stale_members: list[str] = []
-            for key, data in zip(members, values):
+            for key, data in zip(members, values, strict=True):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/extensions/voice/redis_session_manager.py` around lines 350 - 371, The
bug is that `members` (returned from `smembers`) is a set and is iterated twice
which can misalign keys and pipeline `values`; fix by materializing `members` to
a list before creating the pipeline and by using zip(members_list, values,
strict=True) when pairing keys to results (update references to `members` in the
pipeline loop and the for loop to use the new `members_list`); this also
satisfies Ruff B905 and prevents brittle iteration-order issues in the function
(e.g., the method that contains `pipeline = self._redis_client.pipeline()` and
the `zip(members, values)` usage).

405-446: Use a pipeline for bulk reads in the cleanup loop.

_expire_timed_out_sessions issues one GET per active-set member serially (line 415). On deployments with many concurrent voice sessions this becomes N round-trips on every 30-second tick, plus another N round-trips for the delete loop. Mirroring the pattern used in get_active_count (pipelined GETs) will keep cleanup scalable, and the deletions can also be pipelined.

Also, the cleanup loop runs every 30s regardless of session_timeout — consider making the interval derive from session_timeout (e.g., min(30, session_timeout // 2)) so short-lived test configs (e.g., session_timeout=10) still get timely expiration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/extensions/voice/redis_session_manager.py` around lines 405 - 446,
_expire_timed_out_sessions currently does N serial GETs and N serial deletes;
change it to use a Redis pipeline like in get_active_count: after fetching
members from REDIS_ACTIVE_SET_KEY, issue a pipelined GET for all keys via
self._redis_client.pipeline() / multi_exec and await a single round-trip, map
responses back to keys, then collect expired keys; for removals, use the same
pipeline to either EVALSHA (when _delete_session_script_sha is set) or DELETE +
SREM for each expired key in a single pipeline call to avoid N round-trips. Also
make the cleanup tick caller derive its interval from self._session_timeout
(e.g., interval = min(30, max(1, self._session_timeout // 2))) instead of a
fixed 30s so short session_timeout values expire promptly—update wherever
_expire_timed_out_sessions is scheduled to use that computed interval. Use
identifiers _expire_timed_out_sessions, REDIS_ACTIVE_SET_KEY,
_delete_session_script_sha, _redis_client, and get_active_count to locate the
relevant code.
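The interval derivation suggested above is small enough to show directly (hypothetical helper name):

```python
def cleanup_interval(session_timeout: int, ceiling: int = 30) -> int:
    # Tick at half the session timeout, but never slower than the ceiling
    # and never faster than once per second.
    return min(ceiling, max(1, session_timeout // 2))
```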
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bindu/extensions/did/did_agent_extension.py`:
- Around line 254-272: Stop using COMPUTERNAME and the "unknown_user" fallback
when resolving username in the ACL hardening code: remove
os.getenv("COMPUTERNAME") from the fallback chain and do not assign
"unknown_user"; instead if username cannot be determined, raise a RuntimeError
to fail closed. Also broaden the exception catch around os.getlogin() to
Exception (so non-OSError exceptions from Python 3.12 don’t abort the chain) and
wrap getpass.getuser() invocation in a try/except (catching Exception) so it
doesn’t raise out and prevents further fallbacks; keep the final RuntimeError
message from the existing RuntimeError(...) and keep the logger.warning only if
you retain any non-secure fallback (preferably remove that warning since we will
now fail).
- Around line 232-244: When writing the private key in the block that opens
self.private_key_path (inside the recreate_keys flow), call os.fchmod(fd, 0o600)
immediately after obtaining the descriptor (fd) and before writing private_pem
to the file so permissions are tightened on the open descriptor and no sensitive
bytes are exposed; keep the existing os.chmod call if you want to ensure
filesystem metadata is correct but the critical change is os.fchmod(fd, 0o600)
before the with os.fdopen(fd, "wb") write, and apply the analogous pattern for
the public key descriptor (use 0o644) if desired.
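A minimal POSIX sketch of the descriptor-first permission tightening (illustrative function; the real code writes `private_pem` inside the recreate_keys flow):

```python
import os


def write_private_key(path: str, private_pem: bytes) -> None:
    # Open with a restrictive mode, then tighten permissions on the open
    # descriptor *before* any sensitive bytes hit the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.fchmod(fd, 0o600)
    except BaseException:
        os.close(fd)
        raise
    with os.fdopen(fd, "wb") as f:
        f.write(private_pem)
```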

In `@bindu/extensions/voice/agent_bridge.py`:
- Around line 33-38: Replace the hardcoded voice defaults (MAX_HISTORY_TURNS,
DEFAULT_FIRST_TOKEN_TIMEOUT_SECONDS, DEFAULT_TOTAL_RESPONSE_TIMEOUT_SECONDS,
DEFAULT_CANCELLATION_GRACE_SECONDS, DEFAULT_THINKING_TEXT,
DEFAULT_TIMEOUT_FALLBACK_TEXT) with values sourced from app_settings.voice:
import app_settings from bindu.settings and use
app_settings.voice.<setting_name> (or getattr(app_settings.voice,
"<setting_name>", <current_constant>)) so the processor falls back to the
centralized configuration when voice_settings is omitted; also update any places
that currently reference the local constants (including the timeouts/strings
used around the voice_settings handling) to read from app_settings.voice instead
of the local constants.
- Around line 197-199: The debug statements in agent_bridge.py currently log raw
transcript text (logger.debug(f"Voice user ({self._context_id}):
{text[:80]}...")) which may contain PII; replace those debug calls (including
the similar call around lines 231-233) to avoid printing the utterance itself
and instead log safe metadata such as self._context_id and the character count
(e.g., "Voice user (context_id): length=NN chars") or a hashed/obfuscated token
if you need to correlate, leaving out the raw text; update both logger.debug
occurrences referencing self._context_id and text to follow this privacy-safe
pattern.
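The privacy-safe pattern can be reduced to a formatting helper (hypothetical name; the real calls use logger.debug):

```python
def transcript_log_line(context_id: str, text: str) -> str:
    # Log only metadata: the context id and the utterance length, never the text.
    return f"Voice user ({context_id}): length={len(text)} chars"
```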

In `@bindu/extensions/voice/redis_session_manager.py`:
- Around line 120-148: The current __aenter__ implementation loads and caches
Lua script SHAs into _create_session_script_sha, _update_session_script_sha and
_delete_session_script_sha using
script_load(_CREATE_SESSION_LUA/_UPDATE_SESSION_LUA/_DELETE_SESSION_LUA);
replace that with redis-py’s client.register_script(...) to obtain AsyncScript
objects (e.g. create_session_script =
self._redis_client.register_script(_CREATE_SESSION_LUA), etc.), remove the
manual SHA caching and script_load calls in __aenter__, and update all places
that call evalsha(...) (the session create/update/delete/expire sites) to use
the AsyncScript.call(...) method so the library will auto-reload scripts on
NoScriptError and handle cache eviction transparently.

In `@bindu/server/endpoints/voice_endpoints.py`:
- Around line 798-800: The current log in the on_agent_transcript handler prints
transcript text (text[:50]) which can contain sensitive data; change the
logger.info call (the one referencing text and is_final) to avoid any transcript
content and instead log privacy-safe metadata such as text length (len(text)),
is_final, and any other context IDs (session_id, agent_id) if available—do not
include the transcript or portions of it in the message. Ensure you update the
logger.info invocation in voice_endpoints.py that mentions text/is_final to
reflect only these metadata fields.
- Around line 977-989: The teardown silently ignores failures—update_state,
bridge cleanup, and end_session can all raise exceptions but are caught without
logging; change each except block to log the error (including session_id and
exception details) instead of passing so failures are visible for debugging.
Specifically around session_manager.update_state(session_id, "ending"),
components["bridge"].cleanup_background_tasks(), and
session_manager.end_session(session_id) replace the bare excepts with logging
calls (e.g., logger.exception or logger.error with exc_info) that include a
descriptive message and the session_id to aid diagnostics.
- Around line 912-928: The current flow awaits only runner_task so reader_task
(from _voice_control_reader) can fail or return without being observed; change
coordination to await both runner_task and reader_task (e.g., use asyncio.gather
or asyncio.wait) and ensure any exception from reader_task is retrieved and
handled—if reader_task errors or returns early, cancel runner_task (runner.run)
and propagate/log the reader exception, and if runner_task errors cancel
reader_task; make these changes around the creation of runner_task and
reader_task so both tasks are awaited and cleaned up before leaving the async
with timeout block.
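The two-task coordination can be sketched generically (illustrative helper, not the actual endpoint code): await both, cancel the survivor when either side finishes, and re-raise the finisher's exception so neither task fails unobserved.

```python
import asyncio


async def run_with_reader(runner_coro, reader_coro) -> None:
    runner_task = asyncio.create_task(runner_coro)
    reader_task = asyncio.create_task(reader_coro)
    done, pending = await asyncio.wait(
        {runner_task, reader_task}, return_when=asyncio.FIRST_COMPLETED
    )
    # Cancel whichever task is still running and wait for the cancellation
    # to settle before re-raising any error from the finished task.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    for task in done:
        task.result()  # re-raises the exception, if any
```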

In `@bindu/utils/notifications.py`:
- Around line 132-139: Update _resolve_and_validate_destination to validate the
URL scheme and tighten port logic: ensure parsed.scheme is present and one of
"http" or "https" (raise ValueError otherwise), and only default port to 80/443
when scheme is http/https; for any other scheme require an explicit port or
reject the URL. This defends calls from _post_once that bypass validate_config
and keeps hostname, scheme, and port handling consistent across validate_config
and _resolve_and_validate_destination.
- Around line 227-244: The URL rewrite currently replaces the hostname with
destination_ip and sets the Host header (variables target_url, parsed,
destination_ip, headers, req, request.urlopen), which breaks TLS hostname
verification and mishandles IPv6 and userinfo; instead, for HTTPS keep the
original hostname in the request URL (so certificate verification and SNI use
it) and use a custom transport/connection that connects to the resolved IP (so
the socket connects to destination_ip while preserving server_hostname for SNI),
for HTTP only construct the netloc and Host header from parsed.hostname and
parsed.port (wrap IPv6 in brackets like [::1] and never include
parsed.username/parsed.password in Host), and remove setting Host to
parsed.netloc; ensure req.add_header uses the sanitized hostname-only Host value
and use the custom transport when calling urlopen with self.timeout.
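A sketch of the combined scheme and port validation (hypothetical helper mirroring the checks described for _resolve_and_validate_destination):

```python
from urllib.parse import urlparse

_DEFAULT_PORTS = {"http": 80, "https": 443}


def resolve_scheme_and_port(url: str) -> tuple[str, int]:
    # Reject anything that is not plain http/https, and only default the
    # port for those two schemes.
    parsed = urlparse(url)
    if parsed.scheme not in _DEFAULT_PORTS:
        raise ValueError("Push notification URL must use http or https.")
    if not parsed.hostname:
        raise ValueError("Push notification URL must include a hostname.")
    port = parsed.port if parsed.port is not None else _DEFAULT_PORTS[parsed.scheme]
    return parsed.scheme, port
```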

In `@bindu/utils/worker/messages.py`:
- Line 21: Replace the hardcoded MAX_FILE_SIZE module constant by adding a new
settings entry (e.g., worker.max_file_size) in the central app_settings schema
with a default of 10 * 1024 * 1024 (10 MiB), then import
bindu.settings.app_settings into bindu/utils/worker/messages.py and replace all
uses of MAX_FILE_SIZE with app_settings.worker.max_file_size (including the two
current call sites where MAX_FILE_SIZE is referenced in this module). Remove the
MAX_FILE_SIZE constant declaration and ensure any type conversions/units match
the original behavior so no runtime changes occur.

In `@frontend/src/lib/components/chat/ChatInput.svelte`:
- Around line 36-39: ChatInput currently clears its bound draft (value and
files) immediately after onsubmit resolves, which can lose user input if the
parent aborts or send fails; change the contract so onsubmit returns a boolean
success (Promise<boolean> | boolean) and update the submit handler (the internal
submit function that calls the onsubmit prop) to only clear the bound
value/files when onsubmit resolves true; alternatively, remove automatic
clearing entirely and let the parent clear the bound props—adjust the onsubmit
type signature and the submit handler logic in ChatInput to reflect the chosen
approach (reference the onsubmit prop and the internal handler that currently
clears value/files, including the similar logic at the region around the lines
referenced 184-190).

In `@frontend/src/lib/components/chat/ChatWindow.svelte`:
- Around line 173-176: The loop is discarding updates from sendAgentMessage() so
message/task/final-answer updates never get applied; change the for-await loop
to process each yielded update (don't use the unused _update variable) and apply
them to the component state by invoking the existing onmessage handler if
present (onmessage(update)) or otherwise updating the messages store/array
(e.g., append user/assistant messages, merge task metadata into the related
message, and set final-answer content when received). Ensure you still fully
consume the async iterator but handle each update case so the user message, task
metadata, and assistant response are rendered (referencing sendAgentMessage,
onmessage, messages, and contextId/fileParts to locate the code).

In `@frontend/src/lib/components/voice/VoiceCallPanel.svelte`:
- Around line 68-82: The current logic resets playbackNextTime back to
ctx.currentTime after scheduling (variables: audioBuffer, source, startAt,
playbackNextTime, ctx), which can move the timeline backwards and cause the next
chunk to start while previous audio is still queued and leak playback into STT;
remove or stop moving playbackNextTime backwards — do not set playbackNextTime =
ctx.currentTime after source.start — and instead compute the queued audio
duration (queuedDuration = playbackNextTime - ctx.currentTime) and hold/mute
voice input for that queuedDuration (not only for speech-synthesis fallback) so
the microphone is blocked for the full time audio remains scheduled.

In `@frontend/src/lib/stores/chat.ts`:
- Around line 21-32: The FilePart.file.bytes field must be a base64 string to
avoid JSON serialization loss: update the FilePart type to narrow file.bytes to
string only and modify sendMessage (the function that calls
agentAPI.sendMessage) to detect non-string binary inputs
(Uint8Array/ArrayBuffer/Blob) and convert them to base64 before forwarding;
reference FilePart / FilePart.file.bytes, sendMessage, and agentAPI.sendMessage
(agent-api.ts uses JSON.stringify) so reviewers can find where to change the
type and add the binary->base64 conversion/validation.

In `@frontend/src/lib/utils/agentMessageHandler.ts`:
- Around line 66-73: The FilePart.file.bytes type and any places that accept
fileParts[].value must be narrowed to string (base64) and ensure binary inputs
are converted before JSON.stringify: update the FilePart interface
(FilePart.file.bytes) to accept only string (base64), and in the code paths that
build/send messages (the logic that iterates fileParts / handles value
conversion, referenced around the handlers at lines where fileParts[].value is
used) detect ArrayBuffer/Uint8Array/Blob and convert them to base64 strings
before attaching to the payload; keep the name/shape of FilePart and preserve
mimeType/name but encode bytes to base64 so JSON.stringify sends correct file
data.

---

Outside diff comments:
In `@frontend/src/lib/components/chat/ChatWindow.svelte`:
- Around line 294-301: The derived "sources" creation uses file.type directly
which can be empty and causes sendAgentMessage to drop file parts; update the
mapping in the sources computation (where file2base64 is used) to provide the
same MIME fallback as ChatInput (e.g. use file.type || <ChatInput's fallback>
such as 'application/octet-stream' or file.mime) so blank MIME values are
replaced before creating the part; ensure this change is applied alongside
file2base64 usage and will produce parts accepted by sendAgentMessage.

---

Minor comments:
In `@bindu/penguin/config_validator.py`:
- Around line 154-157: The code trims the local variable deployment_url but does
not store the normalized value back into the config, so validate_and_process()
can return the original untrimmed string; update the nested config key (e.g.,
config["deployment"]["url"] or equivalent structure used in
validate_and_process()) with deployment_url.strip() before calling urlparse and
before building the processed result so the normalized URL is persisted; ensure
you still use the trimmed deployment_url for parsing/validation (parsed =
urlparse(deployment_url)) and that the returned processed config contains the
trimmed value.
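A sketch of the write-back, assuming the nested config["deployment"]["url"] shape the comment mentions (the real key path may differ):

```python
from urllib.parse import urlparse


def normalize_deployment_url(config: dict) -> dict:
    # Trim the URL, validate it, and write the trimmed value back so the
    # processed config never carries stray whitespace.
    deployment_url = config["deployment"]["url"].strip()
    parsed = urlparse(deployment_url)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"Invalid deployment URL: {deployment_url!r}")
    config["deployment"]["url"] = deployment_url
    return config
```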

In `@bindu/utils/notifications.py`:
- Around line 251-259: The except block catching error.HTTPError currently
raises NotificationDeliveryError without chaining the original exception; update
the raise to preserve the cause by using "from exc" so traceback context is
explicit (i.e., when handling error.HTTPError as exc, raise
NotificationDeliveryError(status, message or f"HTTP error {status}") from exc),
keeping consistent with exception chaining used elsewhere and satisfying Ruff
B904; locate this in bindu/utils/notifications.py within the HTTPError handling
where exc, status, body, message are computed.

In `@bindu/utils/worker/messages.py`:
- Line 93: The assignment currently uses a dead fallback for a non-existent
"data" key; update the logic in bindu/utils/worker/messages.py to read only the
documented FileWithBytes field (use file_info.get("bytes") and remove the
file_info.get("data", "") branch), and also adjust any related error/log
messages in the same module that mention "data" so they only reference "bytes"
(or add a clear comment if you intentionally keep legacy handling). Target the
base64_data assignment and any subsequent error handling that references "data"
to ensure consistency with the FileWithBytes type.

In `@frontend/src/lib/server/database.ts`:
- Around line 94-102: insertOne currently casts any incoming _id to ObjectId
which can lie and break callers; change insertOne to normalize the id: if
source._id is already an ObjectId use it, else if it's a string/hex try to
construct a new ObjectId from that string (wrap in try/catch and fall back to
new ObjectId()), otherwise generate a fresh new ObjectId; ensure docWithId._id
is that normalized ObjectId, store it in this.data, and return that real
ObjectId as insertedId (or if you prefer widen the function signature return
type to ObjectId | string, pick one consistent approach and update insertOne's
return type accordingly).

In `@frontend/src/lib/stores/voice.ts`:
- Around line 66-86: In appendTranscript, avoid merging a new event into the
previous item when that previous turn is finalized; currently any consecutive
event with the same role gets merged. Change the merge guard so merging only
occurs if last exists, last.role === event.role, AND last.isFinal === false
(i.e., only merge into non-finalized turns). Use the same merge logic
(mergeTranscriptText, building merged with isFinal and ts) and update
currentUserTranscript/currentAgentTranscript when merging; otherwise push event
as a new transcript entry.

In `@tests/unit/extensions/voice/test_voice_websocket_integration.py`:
- Around line 112-120: The test relies on _voice_preflight_error() but only
patches STT/TTS API keys; explicitly patch provider/auth settings to make the
test deterministic by setting module.app_settings.voice.session_auth_required =
False (or restoring it to False in the timeout test) and explicitly set any
provider-related fields used by _voice_preflight_error() (e.g., stt_api_key,
tts_api_key and provider selection if present) before the assertion, and restore
them after; apply the same pattern for the other occurrence around lines
referenced (the timeout test and the block at 201-208) so both tests
deterministically exercise the same branch.
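The deterministic-patching pattern can be sketched with a stand-in settings object (field names mirror the review; the real test patches module.app_settings):

```python
from types import SimpleNamespace
from unittest import mock

# Stand-in for the module-level app_settings the test would patch.
app_settings = SimpleNamespace(
    voice=SimpleNamespace(session_auth_required=True, stt_api_key=None, tts_api_key=None)
)


def preflight_ok() -> bool:
    # Simplified stand-in for _voice_preflight_error(): succeed only when
    # auth is off and both provider keys are present.
    v = app_settings.voice
    return not v.session_auth_required and bool(v.stt_api_key) and bool(v.tts_api_key)


def test_preflight_branch() -> None:
    # Patch every field the branch depends on; mock restores them on exit.
    with mock.patch.object(app_settings.voice, "session_auth_required", False), \
         mock.patch.object(app_settings.voice, "stt_api_key", "stt-key"), \
         mock.patch.object(app_settings.voice, "tts_api_key", "tts-key"):
        assert preflight_ok()
    assert app_settings.voice.session_auth_required is True  # restored
```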

---

Duplicate comments:
In `@bindu/extensions/did/did_agent_extension.py`:
- Around line 218-229: The rollback currently only runs on RuntimeError, so if
subprocess.run or other calls raise OSError/FileNotFoundError (or any other
exception) before _harden_windows_key_file_acl converts it, the key files
remain; change the except to catch the broader exception class (e.g., except
(RuntimeError, OSError, FileNotFoundError) or simply except Exception) around
the calls to self._harden_windows_key_file_acl(self.private_key_path) and
self._harden_windows_key_file_acl(self.public_key_path) and keep the existing
loop that unlinks self.private_key_path and self.public_key_path (with
missing_ok=True and the same OSError warning) before re-raising the original
exception so all failures trigger the same rollback.
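The broadened rollback can be sketched as follows (illustrative helper; `harden` stands in for _harden_windows_key_file_acl):

```python
from pathlib import Path


def harden_or_rollback(paths: list[Path], harden) -> None:
    # Catch *any* failure from the hardening step, remove both key files,
    # then re-raise so no key material outlives a failed ACL fix.
    try:
        for path in paths:
            harden(path)
    except Exception:
        for path in paths:
            path.unlink(missing_ok=True)
        raise
```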

In `@bindu/server/endpoints/voice_endpoints.py`:
- Around line 50-65: The _RATE_LIMIT_LUA string (and other hardcoded constants
like the Redis TTL value 120 inside it, audio frame limits, queue depth, and
echo-suppression window values referenced elsewhere) must be moved into
bindu.settings.app_settings and referenced from there instead of hardcoding;
update the code that builds/uses _RATE_LIMIT_LUA to read TTL and limit
thresholds from app_settings (e.g., app_settings.voice.redis_ttl,
app_settings.voice.rate_limit, app_settings.voice.frame_limit,
app_settings.voice.queue_depth, app_settings.voice.echo_suppression_window) and
replace other hardcoded occurrences (the blocks around the other referenced
sections) to use these app_settings keys so production tuning can be done via
configuration.

---

Nitpick comments:
In `@bindu/extensions/voice/redis_session_manager.py`:
- Around line 350-371: The bug is that `members` (returned from `smembers`) is a
set and is iterated twice which can misalign keys and pipeline `values`; fix by
materializing `members` to a list before creating the pipeline and by using
zip(members_list, values, strict=True) when pairing keys to results (update
references to `members` in the pipeline loop and the for loop to use the new
`members_list`); this also satisfies Ruff B905 and prevents brittle
iteration-order issues in the function (e.g., the method that contains `pipeline
= self._redis_client.pipeline()` and the `zip(members, values)` usage).
- Around line 405-446: _expire_timed_out_sessions currently does N serial GETs
and N serial deletes; change it to use a Redis pipeline like in
get_active_count: after fetching members from REDIS_ACTIVE_SET_KEY, issue a
pipelined GET for all keys via self._redis_client.pipeline() / multi_exec and
await a single round-trip, map responses back to keys, then collect expired
keys; for removals, use the same pipeline to either EVALSHA (when
_delete_session_script_sha is set) or DELETE + SREM for each expired key in a
single pipeline call to avoid N round-trips. Also make the cleanup tick caller
derive its interval from self._session_timeout (e.g., interval = min(30, max(1,
self._session_timeout // 2))) instead of a fixed 30s so short session_timeout
values expire promptly—update wherever _expire_timed_out_sessions is scheduled
to use that computed interval. Use identifiers _expire_timed_out_sessions,
REDIS_ACTIVE_SET_KEY, _delete_session_script_sha, _redis_client, and
get_active_count to locate the relevant code.

In `@bindu/extensions/voice/service_factory.py`:
- Around line 80-104: The current runtime re-validation of
app_settings.voice.tts_fallback_provider (variables fallback_provider_raw and
fallback_provider) is redundant because Pydantic already enforces
Literal["none","elevenlabs","azure"]; remove the isinstance(...) and membership
checks and use app_settings.voice.tts_fallback_provider directly as
fallback_provider; keep the existing try of
_create_tts_service_for_provider(provider, config) and, in the fallback branch
that calls _create_tts_service_for_provider(fallback_provider, config), narrow
the broad except Exception that catches fallback_error to a specific tuple like
(ImportError, ValueError, RuntimeError) to match the primary path and avoid
masking other errors; update logger usage to reference provider,
fallback_provider, primary_error as before and preserve raising the
fallback_error as the cause.
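A sketch of the narrowed fallback path (illustrative `create` callable standing in for _create_tts_service_for_provider):

```python
def create_tts_with_fallback(provider: str, fallback_provider: str, create):
    # Try the primary provider; on a known creation failure, try the fallback
    # with the same narrowed exception tuple, chaining the primary error.
    try:
        return create(provider)
    except (ImportError, ValueError, RuntimeError) as primary_error:
        if fallback_provider == "none":
            raise
        try:
            return create(fallback_provider)
        except (ImportError, ValueError, RuntimeError) as fallback_error:
            raise fallback_error from primary_error
```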

In `@bindu/settings.py`:
- Around line 1127-1134: The validator _validate_session_token_ttl in
VoiceSettings currently silently clamps session_token_ttl to session_timeout;
update it to emit a warning including the original (requested) TTL and the
clamped value so operators see the mismatch, e.g., capture the old value before
clamping and call a logger.warning with both values and context; if importing
the project logger in bindu/settings.py risks a circular import, instead leave
the validator as-is but add a post-construction check in the voice wiring code
in bindu/penguin/bindufy.py (where the VoiceSettings instance is created/used)
that compares session_token_ttl and session_timeout and logs the same warning
when clamping is needed.
- Line 9: The import and type-hint for redis_url use Optional which is
inconsistent with the file's PEP 604 style; update the import line to remove
Optional (keep Literal) and change the annotation for redis_url from
Optional[str] to str | None so the code uses the same `str | None` style used
elsewhere (locate the import `from typing import Literal, Optional` and the
`redis_url: Optional[str]` declaration to modify).

In `@bindu/utils/worker/messages.py`:
- Around line 57-71: The _decode_plain_text function currently tries encodings
in ("utf-8","cp1252","latin-1") which makes latin-1 effectively unreachable
because cp1252 will decode nearly any bytes; change the fallback strategy in
_decode_plain_text: either swap the order to ("utf-8","latin-1","cp1252") if you
want ISO-8859-1 to take precedence, or simplify by attempting only UTF-8 and on
failure returning file_bytes.decode("utf-8", errors="replace") (i.e., remove the
loop) so you avoid silent cp1252 mis-decoding; update the function body of
_decode_plain_text accordingly and keep the existing logger calls consistent
with the chosen behavior.
- Around line 115-134: Remove the redundant precondition check inside the try
block that raises ValueError("Missing file bytes") for base64_data (the variable
base64_data is already checked earlier with continue), i.e. delete the if not
base64_data: raise ValueError(...) branch and leave the rest of the logic
(estimated_size calc, estimated_size > MAX_FILE_SIZE check, base64.b64decode and
the post-decode len(file_bytes) > MAX_FILE_SIZE check) intact so
defense-in-depth remains.
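The simplified single-encoding strategy can be sketched as (hypothetical function mirroring _decode_plain_text):

```python
def decode_plain_text(file_bytes: bytes) -> str:
    # Try strict UTF-8 first; on failure, substitute undecodable bytes rather
    # than silently mis-decoding them as cp1252.
    try:
        return file_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return file_bytes.decode("utf-8", errors="replace")
```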

In `@frontend/src/lib/server/database.ts`:
- Around line 458-540: The scheduled flags prevent duplicate emits but can
deliver "end" before "data" if listeners register in reverse order; update
scheduleInitialEmit inside openDownloadStream (and ensure downloadStream.on/once
call it) so that when either "data" or "end" is first registered and file exists
you schedule both emits together in the correct order: schedule "data" (set
scheduled.data = true and setTimeout to emit data) and then schedule "end" (set
scheduled.end = true and setTimeout to emit end), rather than scheduling only
the single requested event, so consumers always see data → end regardless of
registration order.

In `@frontend/src/lib/services/agent-api.ts`:
- Around line 218-220: getAgentBaseUrl contains a redundant nullish fallback
because AgentAPI.getBaseUrl() is guaranteed to return a non-empty string; remove
the "?? 'http://localhost:3773'" and simply return agentAPI.getBaseUrl()
directly to avoid dead code and clarify intent (refer to getAgentBaseUrl and
AgentAPI.getBaseUrl/AgentAPI constructor to locate the change).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f71b8a22-7e31-4d5e-be60-7fb777d46e4a

📥 Commits

Reviewing files that changed from the base of the PR and between bb0e6fe and 9c47a4d.

📒 Files selected for processing (40)
  • .gitignore
  • bindu/extensions/__init__.py
  • bindu/extensions/did/did_agent_extension.py
  • bindu/extensions/voice/agent_bridge.py
  • bindu/extensions/voice/pipeline_builder.py
  • bindu/extensions/voice/redis_session_manager.py
  • bindu/extensions/voice/service_factory.py
  • bindu/extensions/voice/session_factory.py
  • bindu/grpc/client.py
  • bindu/grpc/server.py
  • bindu/grpc/service.py
  • bindu/penguin/config_validator.py
  • bindu/server/applications.py
  • bindu/server/endpoints/voice_endpoints.py
  • bindu/settings.py
  • bindu/utils/notifications.py
  • bindu/utils/worker/messages.py
  • frontend/.env.example
  • frontend/src/lib/components/chat/ChatInput.svelte
  • frontend/src/lib/components/chat/ChatWindow.svelte
  • frontend/src/lib/components/voice/LiveTranscript.svelte
  • frontend/src/lib/components/voice/VoiceCallPanel.svelte
  • frontend/src/lib/migrations/lock.ts
  • frontend/src/lib/migrations/routines/10-update-reports-assistantid.ts
  • frontend/src/lib/server/config.ts
  • frontend/src/lib/server/database.ts
  • frontend/src/lib/server/files/uploadFile.ts
  • frontend/src/lib/services/agent-api.ts
  • frontend/src/lib/services/voice-client.ts
  • frontend/src/lib/stores/chat.ts
  • frontend/src/lib/stores/voice.ts
  • frontend/src/lib/utils/agentMessageHandler.ts
  • frontend/src/lib/utils/paymentHandler.ts
  • pytest.ini
  • tests/integration/grpc/test_grpc_e2e.py
  • tests/unit/extensions/voice/test_session_manager.py
  • tests/unit/extensions/voice/test_voice_websocket_integration.py
  • tests/unit/penguin/test_bindufy.py
  • tests/unit/server/scheduler/test_memory_scheduler.py
  • tests/unit/test_minimax_example.py
✅ Files skipped from review due to trivial changes (4)
  • frontend/src/lib/migrations/lock.ts
  • bindu/extensions/__init__.py
  • frontend/src/lib/components/voice/LiveTranscript.svelte
  • tests/unit/server/scheduler/test_memory_scheduler.py
🚧 Files skipped from review as they are similar to previous changes (7)
  • pytest.ini
  • frontend/src/lib/migrations/routines/10-update-reports-assistantid.ts
  • frontend/src/lib/server/config.ts
  • .gitignore
  • bindu/extensions/voice/pipeline_builder.py
  • bindu/server/applications.py
  • tests/unit/extensions/voice/test_session_manager.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

♻️ Duplicate comments (8)
bindu/utils/notifications.py (2)

131-138: ⚠️ Potential issue | 🟠 Major

Validate the URL scheme inside the destination helper too.

_post_once() can call _resolve_and_validate_destination() directly when resolved_ip is omitted, so non-HTTP schemes can bypass validate_config() and still reach urlopen(). Keep the scheme gate in this helper as the single defensive boundary.

Proposed fix
 @staticmethod
 def _resolve_and_validate_destination(url: str) -> str:
     parsed = urlparse(url)
+    if parsed.scheme not in {"http", "https"}:
+        raise ValueError("Push notification URL must use http or https scheme.")
+
     hostname = parsed.hostname
     if not hostname:
         raise ValueError("Push notification URL must include a valid hostname.")
#!/bin/bash
# Verify the helper contains its own scheme validation.
python - <<'PY'
from pathlib import Path

source = Path("bindu/utils/notifications.py").read_text()
start = source.index("    def _resolve_and_validate_destination")
end = source.index("\n    @create_retry_decorator", start)
body = source[start:end]

print(body)
if "parsed.scheme not in" not in body and "parsed.scheme" not in body:
    raise SystemExit("Missing scheme validation in _resolve_and_validate_destination")
PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/utils/notifications.py` around lines 131 - 138, The helper
_resolve_and_validate_destination currently only checks hostname/port but must
also validate URL scheme; add a guard in _resolve_and_validate_destination that
ensures parsed.scheme is one of the allowed HTTP schemes (e.g., "http" or
"https") and raise ValueError for other schemes so callers like _post_once
(which may call this helper directly) cannot bypass validate_config(); reference
parsed.scheme and parsed.hostname in the same function and return the validated
URL string after scheme check.

226-243: ⚠️ Potential issue | 🟠 Major

Avoid rewriting HTTPS URLs to the resolved IP.

For https:// webhooks, urllib verifies TLS against the hostname in target_url; the Host header does not preserve SNI/certificate validation. This breaks normal HTTPS webhooks and also mishandles IPv6 netlocs and userinfo in parsed.netloc. Use a transport that connects to destination_ip while preserving the original hostname for HTTPS, and only build sanitized Host values from parsed.hostname/parsed.port.

#!/bin/bash
# Demonstrate the current URL rewrite changes the HTTPS URL host used by urllib.
python - <<'PY'
from urllib.parse import urlparse

parsed = urlparse("https://example.com:8443/webhook")
destination_ip = "203.0.113.10"
target_url = parsed._replace(netloc=f"{destination_ip}:{parsed.port}").geturl()

print("rewritten_url =", target_url)
print("urllib_url_hostname =", urlparse(target_url).hostname)
print("original_hostname =", parsed.hostname)

ipv6_parsed = urlparse("http://example.com:8080/webhook")
ipv6_destination = "2001:db8::10"
bad_netloc = f"{ipv6_destination}:{ipv6_parsed.port}"
print("unbracketed_ipv6_netloc =", bad_netloc)
PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/utils/notifications.py` around lines 226 - 243, The code rewrites
target_url to the resolved IP and sets Host from parsed.netloc which breaks TLS
SNI/cert validation and mishandles IPv6/userinfo; change the logic so that for
parsed.scheme == "https" you do NOT replace the hostname in target_url (keep the
original parsed.geturl()), and instead open the connection to destination_ip
using a transport that preserves server_hostname/SNI (e.g., use
http.client.HTTPSConnection(destination_ip, port=parsed.port, context=ssl_ctx,
server_hostname=parsed.hostname) or a urllib opener/handler that does the same)
while setting the Host header from parsed.hostname and parsed.port (not
parsed.netloc), ensuring IPv6 addresses are bracketed and any userinfo is
excluded; for plain http you may continue using destination_ip in the netloc but
still build Host from parsed.hostname/parsed.port.
bindu/server/endpoints/voice_endpoints.py (6)

801-803: ⚠️ Potential issue | 🟠 Major

Do not log agent transcript text.

This logs spoken agent output content, which can include sensitive user data echoed back by the agent. Log metadata only.

🛡️ Privacy-safe log
-        logger.info(
-            f"on_agent_transcript: got text='{text[:50]}...' is_final={is_final}"
-        )
+        logger.info(
+            "on_agent_transcript: got %d chars, is_final=%s",
+            len(text),
+            is_final,
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 801 - 803, The
logger.info call in on_agent_transcript currently logs the agent's spoken text
(logger.info(f"on_agent_transcript: got text='{text[:50]}...'
is_final={is_final}")), which can expose sensitive content; change it to avoid
logging any transcript text and log only metadata (e.g., is_final, text length,
agent_id/session_id) instead—update the logger.info invocation in
on_agent_transcript (and any similar uses of logger.info that reference text or
text[:50]) to omit the text variable and include only non-sensitive fields like
is_final and len(text) or relevant IDs.

974-999: ⚠️ Potential issue | 🟠 Major

Log teardown failures instead of swallowing them.

Several teardown paths still use except Exception: pass, so failed state updates, bridge cleanup, session removal, or close calls leave no diagnostics.

🧹 Proposed teardown logging
         if websocket.client_state == WebSocketState.CONNECTED:
             try:
                 await _send_json(
                     websocket, {"type": "state", "state": "ended"}, send_lock
                 )
             except Exception:
-                pass
+                logger.exception("Failed to send ended state for voice session %s", session_id)
 
         try:
             await session_manager.update_state(session_id, "ending")
         except Exception:
-            pass
+            logger.exception("Failed to mark voice session %s as ending", session_id)
         try:
             if components is not None:
                 await components["bridge"].cleanup_background_tasks()
         except Exception:
-            pass
+            logger.exception("Failed to cleanup voice bridge for session %s", session_id)
         try:
             await session_manager.end_session(session_id)
         except Exception:
-            pass
+            logger.exception("Failed to end voice session %s", session_id)
         if websocket.client_state == WebSocketState.CONNECTED:
             try:
                 await websocket.close()
             except Exception:
-                pass
+                logger.exception("Failed to close voice websocket for session %s", session_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 974 - 999, Replace
the bare "except Exception: pass" blocks in the teardown with logged error
handling so failures surface: wrap the calls to _send_json(websocket, ...),
session_manager.update_state(session_id, "ending"),
components["bridge"].cleanup_background_tasks(),
session_manager.end_session(session_id), and websocket.close() in try/except and
log the exception (including stack/exception details) instead of silently
passing — e.g., use logging.exception or your module logger to emit a
descriptive message for each failing operation (refer to websocket.client_state,
_send_json, session_manager.update_state,
components["bridge"].cleanup_background_tasks, session_manager.end_session,
websocket.close).

917-933: ⚠️ Potential issue | 🟠 Major

Wait for the control reader as well as the pipeline runner.

The session still awaits only runner_task; if _voice_control_reader() returns or fails first, stop/malformed-frame handling may not terminate the runner until timeout, and reader exceptions can be delayed until teardown.

🔧 Proposed task coordination
-        async with asyncio.timeout(float(app_settings.voice.session_timeout)):
-            await runner_task
+        async with asyncio.timeout(float(app_settings.voice.session_timeout)):
+            done, pending = await asyncio.wait(
+                {runner_task, reader_task},
+                return_when=asyncio.FIRST_COMPLETED,
+            )
+            for pending_task in pending:
+                pending_task.cancel()
+            await asyncio.gather(*pending, return_exceptions=True)
+            for completed_task in done:
+                completed_task.result()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 917 - 933, The code
currently only awaits runner_task; also await reader_task and ensure if either
task fails or returns first we cancel the other and propagate exceptions: after
creating runner_task and reader_task, use asyncio.wait({runner_task,
reader_task}, return_when=asyncio.FIRST_EXCEPTION) inside the asyncio.timeout,
then cancel any pending tasks (call .cancel()), await them (e.g., await
asyncio.gather(*pending, return_exceptions=True)) and re-raise any exception
from completed tasks so failures in _voice_control_reader or runner.run are
surfaced and the other task is cleaned up.
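The coordination the diff aims for can be exercised in isolation; this sketch uses stand-in coroutines (`_runner`/`_reader` are illustrative, not the endpoint's real tasks):

```python
import asyncio


async def run_until_first_completes(*coros):
    """Await several coroutines, cancel the stragglers once one finishes,
    and return the completed results (propagating any exception)."""
    tasks = {asyncio.ensure_future(c) for c in coros}
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Awaiting the cancelled tasks avoids "exception was never retrieved".
    await asyncio.gather(*pending, return_exceptions=True)
    # result() re-raises if a completed task failed.
    return [task.result() for task in done]


async def _runner():
    await asyncio.sleep(10)  # stands in for the long-lived pipeline runner
    return "runner"


async def _reader():
    await asyncio.sleep(0.01)  # stands in for the control reader exiting first
    return "reader"


results = asyncio.run(run_until_first_completes(_runner(), _reader()))
```

Here the reader finishing first cancels the runner immediately instead of letting it run until `session_timeout`.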

50-65: ⚠️ Potential issue | 🟠 Major

Move the remaining voice tuning constants into app_settings.

The Redis TTL, frame-size/fps/in-flight caps, and echo-suppression windows are still hardcoded. These are deployment knobs and should come from app_settings.voice instead of local constants.

As per coding guidelines (applies to bindu/**/*.py), "Use bindu.settings.app_settings for all configuration - never hardcode values such as URLs, ports, timeouts, API keys, or feature flags" and "NEVER create local config constants - use app_settings from bindu.settings".

Also applies to: 421-423, 776-799

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 50 - 65, Constants
for voice tuning (Redis TTL, frame size/fps/in-flight caps, echo-suppression
windows) are hardcoded in voice_endpoints.py (e.g., the _RATE_LIMIT_LUA block
and other local constants); move these into bindu.settings.app_settings.voice
and read them at runtime instead of local literals. Import app_settings from
bindu.settings, add/consume fields like app_settings.voice.redis_ttl,
.frame_size, .fps, .max_in_flight, .echo_suppression_window (or similar
descriptive names), replace the local constants and usages (including the
_RATE_LIMIT_LUA TTL/limit args and any frame/fps/in-flight variables) to
reference app_settings.voice values, and ensure defaults/validation are handled
where app_settings is loaded.
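As a sketch of what moving these knobs into settings could look like under the `VOICE__*` env convention; `VoiceTuning` and its field names are assumptions for illustration, not the project's actual schema:

```python
import os
from dataclasses import dataclass, field


def _env_int(name: str, default: int) -> int:
    """Read VOICE__<NAME> from the environment, falling back to a default."""
    raw = os.environ.get(f"VOICE__{name}")
    return int(raw) if raw is not None else default


@dataclass(frozen=True)
class VoiceTuning:
    # Illustrative stand-in for app_settings.voice; real field names,
    # defaults, and validation belong in the settings layer.
    redis_ttl: int = field(default_factory=lambda: _env_int("REDIS_TTL", 60))
    max_frame_bytes: int = field(
        default_factory=lambda: _env_int("MAX_FRAME_BYTES", 32768)
    )
    max_fps: int = field(default_factory=lambda: _env_int("MAX_FPS", 50))
    max_in_flight: int = field(default_factory=lambda: _env_int("MAX_IN_FLIGHT", 8))


defaults = VoiceTuning()             # all defaults
os.environ["VOICE__MAX_FPS"] = "25"  # simulate a deployment override
tuned = VoiceTuning()
```

Endpoint code then reads `app_settings.voice.max_fps` and friends at runtime, so operators can tune per deployment without touching the module.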

583-590: ⚠️ Potential issue | 🟠 Major

Validate context_id as a UUID before creating the session.

This still accepts any non-empty string even though the endpoint documents { "context_id": "<uuid>" }; malformed IDs will enter the voice/task layer.

🛡️ Proposed validation
-from uuid import uuid4
+from uuid import UUID, uuid4
...
         if isinstance(body, dict) and "context_id" in body:
             raw_context_id = body["context_id"]
             if not isinstance(raw_context_id, str) or not raw_context_id.strip():
                 raise HTTPException(
                     status_code=400,
                     detail="context_id must be a non-empty string",
                 )
-            context_id = raw_context_id.strip()
+            try:
+                context_id = str(UUID(raw_context_id.strip()))
+            except ValueError as exc:
+                raise HTTPException(
+                    status_code=400,
+                    detail="context_id must be a valid UUID",
+                ) from exc

As per coding guidelines, "Validate all external input and use type hints for input validation in Python files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 583 - 590, The
current check only ensures raw_context_id is a non-empty string but does not
verify it's a UUID; update the validation where raw_context_id/context_id is
handled (the block using raw_context_id and context_id in voice_endpoints.py) to
parse and validate it as a UUID (e.g., uuid.UUID(raw_context_id); note that
passing version=4 coerces the version bits rather than validating them) inside
a try/except and raise
HTTPException(status_code=400, detail="context_id must be a valid UUID") on
failure; add the necessary import (import uuid) and ensure context_id is set to
the canonical string form (str(parsed_uuid).lower()) after successful parsing.

1-999: 🛠️ Refactor suggestion | 🟠 Major

Split this endpoint module before adding more voice logic.

This is still ~1,000 LOC and mixes rate limiting, REST handlers, WebSocket auth/control parsing, pipeline orchestration, serialization, and teardown. That makes lifecycle bugs harder to reason about and keeps violating the repository size rule.

As per coding guidelines, "Keep Python files under ~500 lines of code - extract modules when larger".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 1 - 999, The file is
too large and should be split into focused modules: move rate-limit logic
(_RATE_LIMIT_LUA, _get_rate_limit_redis_client, _rate_limit_allow_ip,
_VOICE_RATE_LIMIT_LOCK, _VOICE_RATE_LIMIT_IP_BUCKET,
_VOICE_RATE_LIMIT_REDIS_LOCK, _VOICE_RATE_LIMIT_REDIS_CLIENT) into a new
voice_rate_limiter module; move helpers and small utils (_send_json,
_trim_overlap_text, _extract_bearer_token, _extract_ws_session_token,
_has_ws_subprotocols, _classify_voice_pipeline_error, _voice_preflight_error,
_send_error_and_close) into voice_utils; move WebSocket control/transport
wrappers and serializing classes (_VoiceControlState, _FilteredWebSocket,
_RawAudioFrameSerializer, _voice_control_reader) into voice_ws_helpers or
voice_transport; and keep REST handlers (voice_session_start, voice_session_end,
voice_session_status) in a smaller rest module while placing the main handler
voice_websocket and pipeline orchestration in voice_ws_handler; update imports
to reference the new modules (e.g., build_voice_pipeline usage remains in
voice_websocket) and re-export or adjust package imports so existing callers
(and handle_endpoint_errors decorator) import names unchanged. Ensure tests and
imports are updated and that cleanup/teardown calls
(components["bridge"].cleanup_background_tasks, session_manager methods) remain
accessible from the refactored locations.
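One layout that satisfies the ~500-line rule while keeping import paths stable might look like this; directory and module names are suggestions, not the repository's final ones:

```
bindu/server/endpoints/voice/
    __init__.py        # re-exports keep old import paths working
    rate_limiter.py    # _RATE_LIMIT_LUA, _rate_limit_allow_ip, redis client state
    utils.py           # _send_json, _extract_bearer_token, error classification
    transport.py       # _FilteredWebSocket, _RawAudioFrameSerializer, control reader
    rest.py            # voice_session_start / voice_session_end / voice_session_status
    ws_handler.py      # voice_websocket + pipeline orchestration and teardown
```

The `__init__.py` re-exports are what let existing callers (and the handle_endpoint_errors decorator) keep importing unchanged names.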
🧹 Nitpick comments (1)
bindu/utils/notifications.py (1)

250-258: Chain the wrapped HTTPError to clarify exception origins.

The B904 warning is valid here. Preserving exc in the exception chain makes delivery failures distinguishable from failures while reading or decoding the error body.

Proposed fix
-            raise NotificationDeliveryError(status, message or f"HTTP error {status}")
+            raise NotificationDeliveryError(
+                status, message or f"HTTP error {status}"
+            ) from exc
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/utils/notifications.py` around lines 250 - 258, The except block
catching error.HTTPError currently raises NotificationDeliveryError without
chaining the original exc, losing the original traceback; update the raise in
that block to include the original exception as the cause (i.e., raise
NotificationDeliveryError(status, message or f"HTTP error {status}") from exc)
so the HTTPError is preserved in the exception chain and debugging remains
clear—look for the except error.HTTPError as exc handler and the
NotificationDeliveryError construction to apply this change.
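The behavior `from exc` buys can be shown with a self-contained stand-in (this `NotificationDeliveryError` is a placeholder, not bindu's class; the HTTPError construction is real urllib API):

```python
from urllib.error import HTTPError


class NotificationDeliveryError(Exception):
    """Placeholder for bindu's exception type (illustrative)."""

    def __init__(self, status, message):
        super().__init__(message)
        self.status = status


def deliver():
    try:
        # Simulate urllib raising on a failed webhook POST.
        raise HTTPError("https://example.com/webhook", 502, "Bad Gateway", None, None)
    except HTTPError as exc:
        status = exc.code
        # "from exc" records the HTTPError as __cause__, so tracebacks show
        # both the wrapped delivery failure and its origin.
        raise NotificationDeliveryError(status, f"HTTP error {status}") from exc


try:
    deliver()
except NotificationDeliveryError as err:
    chained = err
```

Without `from exc`, the implicit `__context__` chain can be masked (e.g., by `raise ... from None`) and B904 keeps flagging the site.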
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bindu/server/endpoints/voice_endpoints.py`:
- Around line 858-866: components is declared optional so referencing
components["bridge"] inside the nested _handle_user_text causes a type error;
after calling build_voice_pipeline() capture the non-optional bridge into a
local variable (e.g., bridge = components["bridge"] or the returned bridge
value) and close over that variable in _handle_user_text instead of using
components["bridge"]; update the callback to call
bridge.process_transcription(...) (and keep using _send_json and send_lock as
before) so the nested function only references the non-optional bridge.
- Around line 197-208: Replace the broad try/except import of
pipecat.serializers.base_serializer with a TYPE_CHECKING-aware pattern: import
FrameSerializer for typing only inside an if TYPE_CHECKING block and catch
ImportError (not Exception) for runtime; define a runtime no-op fallback class
named _FrameSerializerBase (instead of _PipecatFrameSerializer) and then have
_RawAudioFrameSerializer inherit from _FrameSerializerBase so type checkers see
the real FrameSerializer while runtime uses the fallback when pipecat is
unavailable.

In `@tests/unit/extensions/voice/test_voice_websocket_integration.py`:
- Around line 122-125: The test passes a hardcoded session token to
manager.create_session which triggers Ruff S106; suppress the linter for this
test-only literal by appending the correct pragma to that argument (e.g., change
the session_token line in the create_session call to include "# noqa: S106" or
the project’s configured Ruff noqa token) so Ruff ignores the hardcoded secret
in the test; target the session_token="token-abc" keyword in the
manager.create_session invocation.
- Around line 194-200: The timeout test can be affected by environment-driven
auth (VOICE__SESSION_AUTH_REQUIRED) causing the handler to close for missing
token instead of timing out; update the test to explicitly disable session auth
by setting module.app_settings.voice.session_auth_required = False (save the
original value alongside
original_timeout/original_enabled/original_stt/original_tts) before forcing
session_timeout and enabled, and restore the original session_auth_required in
the finally block to keep test isolation (use the same module.app_settings.voice
reference used in the diff).
- Around line 22-76: The test fixture mock_pipecat_modules currently only
registers leaf modules and causes _voice_preflight_error (called by
voice_websocket) to fail when importlib.import_module("pipecat") is attempted;
update mock_pipecat_modules to also create and register stub modules for the
top-level package and intermediate packages (pipecat, pipecat.transports,
pipecat.transports.websocket, and pipecat.pipeline) in sys.modules and set
appropriate attributes so importlib.import_module and attribute lookups succeed
before registering the existing leaf modules like
pipecat.transports.websocket.fastapi and pipecat.pipeline.pipeline.
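The stubbing this comment asks for hinges on registering every parent package, not just the leaves; a sketch of the pattern (`register_stub_tree` is an illustrative helper):

```python
import importlib
import sys
from types import ModuleType


def register_stub_tree(dotted_names):
    """Register stub modules for every package along each dotted path so both
    importlib.import_module("a.b.c") and attribute access a.b.c succeed."""
    for dotted in dotted_names:
        parts = dotted.split(".")
        for i in range(1, len(parts) + 1):
            name = ".".join(parts[:i])
            if name not in sys.modules:
                module = ModuleType(name)
                sys.modules[name] = module
                if i > 1:
                    # Attach the child to its parent so pkg.sub lookups work.
                    setattr(sys.modules[".".join(parts[:i - 1])], parts[i - 1], module)


register_stub_tree([
    "pipecat.transports.websocket.fastapi",
    "pipecat.pipeline.pipeline",
])

pkg = importlib.import_module("pipecat")       # top-level import now succeeds
leaf = importlib.import_module("pipecat.transports.websocket.fastapi")
```

In the fixture, the stub classes the tests already define would then be set as attributes on the registered leaf modules.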

---

Duplicate comments:
In `@bindu/server/endpoints/voice_endpoints.py`:
- Around line 801-803: The logger.info call in on_agent_transcript currently
logs the agent's spoken text (logger.info(f"on_agent_transcript: got
text='{text[:50]}...' is_final={is_final}")), which can expose sensitive
content; change it to avoid logging any transcript text and log only metadata
(e.g., is_final, text length, agent_id/session_id) instead—update the
logger.info invocation in on_agent_transcript (and any similar uses of
logger.info that reference text or text[:50]) to omit the text variable and
include only non-sensitive fields like is_final and len(text) or relevant IDs.
- Around line 974-999: Replace the bare "except Exception: pass" blocks in the
teardown with logged error handling so failures surface: wrap the calls to
_send_json(websocket, ...), session_manager.update_state(session_id, "ending"),
components["bridge"].cleanup_background_tasks(),
session_manager.end_session(session_id), and websocket.close() in try/except and
log the exception (including stack/exception details) instead of silently
passing — e.g., use logging.exception or your module logger to emit a
descriptive message for each failing operation (refer to websocket.client_state,
_send_json, session_manager.update_state,
components["bridge"].cleanup_background_tasks, session_manager.end_session,
websocket.close).
- Around line 917-933: The code currently only awaits runner_task; also await
reader_task and ensure if either task fails or returns first we cancel the other
and propagate exceptions: after creating runner_task and reader_task, use
asyncio.wait({runner_task, reader_task}, return_when=asyncio.FIRST_EXCEPTION)
inside the asyncio.timeout, then cancel any pending tasks (call .cancel()),
await them (e.g., await asyncio.gather(*pending, return_exceptions=True)) and
re-raise any exception from completed tasks so failures in _voice_control_reader
or runner.run are surfaced and the other task is cleaned up.
- Around line 50-65: Constants for voice tuning (Redis TTL, frame
size/fps/in-flight caps, echo-suppression windows) are hardcoded in
voice_endpoints.py (e.g., the _RATE_LIMIT_LUA block and other local constants);
move these into bindu.settings.app_settings.voice and read them at runtime
instead of local literals. Import app_settings from bindu.settings, add/consume
fields like app_settings.voice.redis_ttl, .frame_size, .fps, .max_in_flight,
.echo_suppression_window (or similar descriptive names), replace the local
constants and usages (including the _RATE_LIMIT_LUA TTL/limit args and any
frame/fps/in-flight variables) to reference app_settings.voice values, and
ensure defaults/validation are handled where app_settings is loaded.
- Around line 583-590: The current check only ensures raw_context_id is a
non-empty string but does not verify it's a UUID; update the validation where
raw_context_id/context_id is handled (the block using raw_context_id and
context_id in voice_endpoints.py) to parse and validate it as a UUID (e.g.,
uuid.UUID(raw_context_id); note that version=4 coerces rather than validates
the version bits) inside a try/except
and raise HTTPException(status_code=400, detail="context_id must be a valid
UUID") on failure; add the necessary import (import uuid) and ensure context_id
is set to the canonical string form (str(parsed_uuid).lower()) after successful
parsing.
- Around line 1-999: The file is too large and should be split into focused
modules: move rate-limit logic (_RATE_LIMIT_LUA, _get_rate_limit_redis_client,
_rate_limit_allow_ip, _VOICE_RATE_LIMIT_LOCK, _VOICE_RATE_LIMIT_IP_BUCKET,
_VOICE_RATE_LIMIT_REDIS_LOCK, _VOICE_RATE_LIMIT_REDIS_CLIENT) into a new
voice_rate_limiter module; move helpers and small utils (_send_json,
_trim_overlap_text, _extract_bearer_token, _extract_ws_session_token,
_has_ws_subprotocols, _classify_voice_pipeline_error, _voice_preflight_error,
_send_error_and_close) into voice_utils; move WebSocket control/transport
wrappers and serializing classes (_VoiceControlState, _FilteredWebSocket,
_RawAudioFrameSerializer, _voice_control_reader) into voice_ws_helpers or
voice_transport; and keep REST handlers (voice_session_start, voice_session_end,
voice_session_status) in a smaller rest module while placing the main handler
voice_websocket and pipeline orchestration in voice_ws_handler; update imports
to reference the new modules (e.g., build_voice_pipeline usage remains in
voice_websocket) and re-export or adjust package imports so existing callers
(and handle_endpoint_errors decorator) import names unchanged. Ensure tests and
imports are updated and that cleanup/teardown calls
(components["bridge"].cleanup_background_tasks, session_manager methods) remain
accessible from the refactored locations.

In `@bindu/utils/notifications.py`:
- Around line 131-138: The helper _resolve_and_validate_destination currently
only checks hostname/port but must also validate URL scheme; add a guard in
_resolve_and_validate_destination that ensures parsed.scheme is one of the
allowed HTTP schemes (e.g., "http" or "https") and raise ValueError for other
schemes so callers like _post_once (which may call this helper directly) cannot
bypass validate_config(); reference parsed.scheme and parsed.hostname in the
same function and return the validated URL string after scheme check.
- Around line 226-243: The code rewrites target_url to the resolved IP and sets
Host from parsed.netloc which breaks TLS SNI/cert validation and mishandles
IPv6/userinfo; change the logic so that for parsed.scheme == "https" you do NOT
replace the hostname in target_url (keep the original parsed.geturl()), and
instead open the connection to destination_ip using a transport that preserves
server_hostname/SNI (e.g., subclass http.client.HTTPSConnection and override
connect() to wrap the socket with ssl_ctx.wrap_socket(sock,
server_hostname=parsed.hostname), since HTTPSConnection itself has no
server_hostname parameter, or a urllib
opener/handler that does the same) while setting the Host header from
parsed.hostname and parsed.port (not parsed.netloc), ensuring IPv6 addresses are
bracketed and any userinfo is excluded; for plain http you may continue using
destination_ip in the netloc but still build Host from
parsed.hostname/parsed.port.

---

Nitpick comments:
In `@bindu/utils/notifications.py`:
- Around line 250-258: The except block catching error.HTTPError currently
raises NotificationDeliveryError without chaining the original exc, losing the
original traceback; update the raise in that block to include the original
exception as the cause (i.e., raise NotificationDeliveryError(status, message or
f"HTTP error {status}") from exc) so the HTTPError is preserved in the exception
chain and debugging remains clear—look for the except error.HTTPError as exc
handler and the NotificationDeliveryError construction to apply this change.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1e218909-b26a-44e9-93a8-65f301a33411

📥 Commits

Reviewing files that changed from the base of the PR and between 9c47a4d and 57703bb.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • .gitignore
  • bindu/grpc/service.py
  • bindu/server/endpoints/voice_endpoints.py
  • bindu/utils/notifications.py
  • tests/unit/extensions/voice/test_voice_websocket_integration.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • bindu/grpc/service.py
  • .gitignore


@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (5)
bindu/server/endpoints/voice_endpoints.py (5)

981-993: ⚠️ Potential issue | 🟠 Major

Log teardown failures before suppressing them.

If update_state(), bridge cleanup, or end_session() fails, the session can remain active without diagnostics. This repeats an existing review concern.

Teardown logging
         try:
             await session_manager.update_state(session_id, "ending")
-        except Exception:
-            pass
+        except Exception:
+            logger.exception("Failed to mark voice session as ending: %s", session_id)
         try:
             if components is not None:
                 await components["bridge"].cleanup_background_tasks()
-        except Exception:
-            pass
+        except Exception:
+            logger.exception("Failed to cleanup voice bridge tasks: %s", session_id)
         try:
             await session_manager.end_session(session_id)
-        except Exception:
-            pass
+        except Exception:
+            logger.exception("Failed to end voice session: %s", session_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 981 - 993, The
teardown currently swallows exceptions silently; change the three try/except
blocks around session_manager.update_state(session_id, "ending"),
components["bridge"].cleanup_background_tasks(), and
session_manager.end_session(session_id) to catch Exception as e and log the
error (including exception info/stack) before continuing so failures are
recorded for diagnostics; use the existing logger (or processLogger) to emit
descriptive messages that include the session_id and the caught exception for
each failing call.

916-932: ⚠️ Potential issue | 🟠 Major

Wait for the reader task as well as the runner.

_voice_control_reader() can return after malformed JSON, stop, or frame-limit violations while runner_task keeps running until session_timeout; completed reader exceptions also go unretrieved. This repeats an existing review concern.

Coordinate both tasks
-        async with asyncio.timeout(float(app_settings.voice.session_timeout)):
-            await runner_task
+        async with asyncio.timeout(float(app_settings.voice.session_timeout)):
+            done, pending = await asyncio.wait(
+                {runner_task, reader_task},
+                return_when=asyncio.FIRST_COMPLETED,
+            )
+            for pending_task in pending:
+                pending_task.cancel()
+            await asyncio.gather(*pending, return_exceptions=True)
+            for completed in done:
+                completed.result()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 916 - 932, The code
starts runner_task but does not wait for reader_task, leaving exceptions
unobserved and the runner may outlive the reader; update the section that
creates runner_task and reader_task so both tasks are awaited/coordinated:
create both runner_task (asyncio.create_task(runner.run(task))) and reader_task
(asyncio.create_task(_voice_control_reader(...))) and then use asyncio.wait or
asyncio.gather with proper timeout (the existing async with
asyncio.timeout(float(app_settings.voice.session_timeout))) to await both; on
timeout or completion ensure you cancel the remaining task(s), await them to
retrieve exceptions, and handle/log exceptions from _voice_control_reader and
runner.run to avoid unhandled task exceptions.

796-803: ⚠️ Potential issue | 🟠 Major

Do not log transcript content.

Line 802 logs the first 50 characters of agent speech; transcripts can contain sensitive user data echoed by the model. This repeats an existing review concern.

Privacy-safe log
-        logger.info(
-            f"on_agent_transcript: got text='{text[:50]}...' is_final={is_final}"
-        )
+        logger.info(
+            f"on_agent_transcript: got {len(text)} chars, is_final={is_final}"
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 796 - 803, The
handler _on_agent_transcript logs transcript content via
logger.info(f"on_agent_transcript: got text='{text[:50]}...'
is_final={is_final}") which may leak sensitive user data; remove or redact the
transcript text (use only metadata) and keep the suppress_audio_until update
intact. Update the logger call in _on_agent_transcript to log only non-sensitive
fields (e.g., is_final and that a transcript was received) or a fixed redacted
placeholder instead of text, leaving control.suppress_audio_until logic
unchanged.

583-590: ⚠️ Potential issue | 🟠 Major

Validate context_id as a UUID, not just a non-empty string.

The docstring declares { "context_id": "<uuid>" }, but any non-empty string is accepted and persisted into the session layer. This repeats an existing review concern. As per coding guidelines, "Validate all external input and use type hints for input validation in Python files".

Validation sketch
-from uuid import uuid4
+from uuid import UUID, uuid4
...
         if isinstance(body, dict) and "context_id" in body:
             raw_context_id = body["context_id"]
             if not isinstance(raw_context_id, str) or not raw_context_id.strip():
                 raise HTTPException(
                     status_code=400,
                     detail="context_id must be a non-empty string",
                 )
-            context_id = raw_context_id.strip()
+            try:
+                context_id = str(UUID(raw_context_id.strip()))
+            except ValueError as exc:
+                raise HTTPException(
+                    status_code=400,
+                    detail="context_id must be a valid UUID",
+                ) from exc
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 583 - 590, The
current check accepts any non-empty string for raw_context_id; change it to
validate that raw_context_id is a valid UUID: after stripping raw_context_id
(the variable in this block), attempt to parse it with uuid.UUID(...) and on
failure raise HTTPException(status_code=400, detail="context_id must be a valid
UUID"); if parsing succeeds assign the canonical string (or the UUID object,
depending on surrounding code expectations) back to context_id. Update the
imports to include the uuid module and keep the existing HTTPException usage and
variable names (raw_context_id, context_id) so the rest of the handler continues
to work.

600-617: ⚠️ Potential issue | 🟠 Major

Validate the WebSocket host before creating the session.

If host resolution fails, this returns 400 after create_session() succeeds, leaving an active session behind until cleanup/quota expiry. This repeats an existing review concern.

Ordering fix
-    try:
-        session = await session_manager.create_session(
-            context_id,
-            session_token=session_token,
-            session_token_expires_at=session_token_expires_at,
-        )
-    except RuntimeError as exc:
-        return JSONResponse({"error": str(exc)}, status_code=429)
-
     # Build WebSocket URL from request
     scheme = "wss" if request.url.scheme == "https" else "ws"
     # Use hostname from request, fallback to client host, or raise error if unavailable
     host = request.url.hostname or (request.client.host if request.client else None)
     if not host:
         return JSONResponse(
             {"error": "Unable to determine host for WebSocket URL"},
             status_code=400,
         )
+
+    try:
+        session = await session_manager.create_session(
+            context_id,
+            session_token=session_token,
+            session_token_expires_at=session_token_expires_at,
+        )
+    except RuntimeError as exc:
+        return JSONResponse({"error": str(exc)}, status_code=429)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 600 - 617, Compute
and validate the WebSocket host (derive scheme and resolve host from
request.url.hostname or request.client.host) before calling
session_manager.create_session so you can return the 400 error early if host is
missing; move the host/scheme resolution and the "if not host" check above the
try/except that calls session_manager.create_session (referencing create_session
and session_manager) to avoid creating a session when the WebSocket host cannot
be determined.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@bindu/server/endpoints/voice_endpoints.py`:
- Around lines 981-993: the teardown currently swallows exceptions silently.
Change the three try/except blocks around
session_manager.update_state(session_id, "ending"),
components["bridge"].cleanup_background_tasks(), and
session_manager.end_session(session_id) to catch Exception as e and log the
error (including exception info/stack) before continuing, so failures are
recorded for diagnostics; use the existing logger (or processLogger) to emit
descriptive messages that include the session_id and the caught exception for
each failing call.
- Around lines 916-932: the code starts runner_task but never waits for
reader_task, leaving exceptions unobserved, and the runner may outlive the
reader. Update the section that creates runner_task and reader_task so both
tasks are awaited and coordinated: create runner_task
(asyncio.create_task(runner.run(task))) and reader_task
(asyncio.create_task(_voice_control_reader(...))), then use asyncio.wait or
asyncio.gather inside the existing async with
asyncio.timeout(float(app_settings.voice.session_timeout)) to await both. On
timeout or completion, cancel the remaining task(s) and await them to retrieve
exceptions, handling/logging exceptions from _voice_control_reader and
runner.run to avoid unhandled task exceptions.
- Around lines 796-803: the handler _on_agent_transcript logs transcript
content via logger.info(f"on_agent_transcript: got text='{text[:50]}...'
is_final={is_final}"), which may leak sensitive user data. Remove or redact the
transcript text (log only metadata) while keeping the suppress_audio_until
update intact: update the logger call in _on_agent_transcript to log only
non-sensitive fields (e.g., is_final and the fact that a transcript was
received) or a fixed redacted placeholder instead of text, leaving the
control.suppress_audio_until logic unchanged.
- Around lines 583-590: the current check accepts any non-empty string for
raw_context_id; change it to validate that raw_context_id is a valid UUID.
After stripping raw_context_id, attempt to parse it with uuid.UUID(...); on
failure raise HTTPException(status_code=400, detail="context_id must be a valid
UUID"), and if parsing succeeds assign the canonical string (or the UUID
object, depending on surrounding code expectations) back to context_id. Update
the imports to include the uuid module and keep the existing HTTPException
usage and variable names (raw_context_id, context_id) so the rest of the
handler continues to work.
- Around lines 600-617: compute and validate the WebSocket host (derive the
scheme and resolve the host from request.url.hostname or request.client.host)
before calling session_manager.create_session, so the 400 error can be
returned early when the host is missing. Move the host/scheme resolution and
the "if not host" check above the try/except that calls
session_manager.create_session, so that no session is created when the
WebSocket host cannot be determined.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4e5aa629-099b-4560-8f8e-877ff71d4c53

📥 Commits

Reviewing files that changed from the base of the PR and between 57703bb and 2b5b1e6.

📒 Files selected for processing (1)
  • bindu/server/endpoints/voice_endpoints.py
