
Feature/voice agent#386

Open
Co-vengers wants to merge 141 commits into GetBindu:main from Co-vengers:feature/voice-agent

Conversation

Contributor

@Co-vengers Co-vengers commented Mar 22, 2026

Voice Agent Extension — Progress & Documentation PR

Overview

This PR introduces the initial implementation of the Voice Agent Extension for Bindu, enabling real-time voice conversations between users and agents. The extension integrates backend, frontend, and testing components, following the architecture and plan outlined in docs/VOICE_AGENT_PLAN.md.


What’s Included

Backend

  • New voice extension module: bindu/extensions/voice/ with:
    • __init__.py, voice_agent_extension.py, service_factory.py, pipeline_builder.py, session_manager.py, agent_bridge.py, audio_config.py
  • Endpoints: bindu/server/endpoints/voice_endpoints.py (REST + WebSocket)
  • Settings: bindu/settings.py updated with VoiceSettings
  • App integration: bindu/server/applications.py updated for conditional voice route registration and session manager
  • Capabilities: bindu/utils/capabilities.py updated for voice extension helpers
  • Penguin integration: bindu/penguin/bindufy.py updated to accept voice config and add the extension

Frontend

  • Voice UI and client:
    • frontend/src/lib/services/voice-client.ts: WebSocket client, audio capture/playback
    • frontend/src/lib/stores/voice.ts: Svelte stores for voice state and transcripts
    • frontend/src/lib/components/voice/VoiceCallPanel.svelte, VoiceCallButton.svelte, LiveTranscript.svelte: UI components for voice session
  • Integration: Existing chat and agent message handler files updated for voice support

Tests

  • Unit tests for all major backend components:
    • tests/unit/extensions/voice/test_voice_extension.py
    • tests/unit/extensions/voice/test_session_manager.py
    • tests/unit/extensions/voice/test_service_factory.py
    • tests/unit/extensions/voice/test_agent_bridge.py
    • tests/unit/extensions/voice/test_voice_endpoints.py

Examples & Docs

  • Example agent: examples/voice-agent/main.py, .env.example, and README.md
  • Plan: docs/VOICE_AGENT_PLAN.md (implementation plan)

Current Progress

  • All major backend, frontend, and test files are present and staged.
  • Integration into the main app and settings is in progress.
  • Endpoints and frontend integration are actively being refined.
  • Unit tests for the extension and its components are included.
  • Example agent and configuration are provided.
  • Documentation plan is present; full user-facing docs (docs/VOICE.md) are planned.

How to Test

  1. Install dependencies:
    • Ensure pipecat-ai[deepgram,elevenlabs,silero] and websockets are installed (see pyproject.toml voice group).
  2. Set environment variables:
    • VOICE__STT_API_KEY, VOICE__TTS_API_KEY (see .env.example)
  3. Run backend tests:
    • uv run pytest tests/unit/extensions/voice/ -v
  4. Run frontend:
    • Start the Svelte frontend and verify the voice call UI appears for voice-enabled agents.
  5. Manual E2E:
    • Start a voice session from the UI, speak, and verify agent responses and transcripts.
  6. Check task persistence:
    • After a session, verify conversation history via GET /tasks/get.

Next Steps & Improvements

  • Complete and verify all items in the implementation plan checklist (see docs/VOICE_AGENT_PLAN.md)
  • Finalize and publish user documentation (docs/VOICE.md)
  • Polish frontend UI/UX and error handling
  • Expand test coverage (integration, E2E, edge cases)
  • Lint and format: uv run pre-commit run --all-files
  • Optimize session cleanup and resource management
  • Add more example agents and configuration scenarios
  • Prepare for future extensions (telephony, WebRTC, multi-language, etc.)

References


Contributors:


For questions or feedback, please comment on this PR.

Summary by CodeRabbit

  • New Features

    • Real-time voice conversations: WebSocket audio streaming with speech-to-text → agent → text-to-speech, REST session start/end/status, live transcripts, and agent audio playback
    • UI & client: voice-call button, voice call panel, live transcript component, and frontend voice client/integration
  • Documentation

    • Voice agent setup guide, example voice-agent project, and assorted README formatting/newline fixes
  • Tests

    • New unit tests for voice components, session management, service factories, and endpoints

@chandan-1427
Contributor

Hey, thanks for working on adding voice support — really appreciate the effort here.

I went through the implementation and there are a few areas we’ll need to address before merging:

  • Multi-worker compatibility: The current session handling relies on a local store, which won’t work reliably with Uvicorn’s multi-worker setup. We’ll need to move this to a centralized solution (e.g., Redis) to avoid state inconsistencies.

  • Transport & latency: The current flow is based on HTTP requests. For voice interactions, we should aim for a real-time streaming approach (like WebSockets or SSE) to reduce latency and improve responsiveness.

  • Base branch alignment: It looks like this was built on an older version of Bindu. There are conflicts with recent changes, so rebasing onto the latest main would help before proceeding.

Looking forward to the update!

@Co-vengers Co-vengers force-pushed the feature/voice-agent branch from 6b8acb4 to 20e3419 Compare March 24, 2026 14:52
@Co-vengers
Contributor Author


Thanks for the review. I'll look into the improvements as suggested.

@Co-vengers
Contributor Author

Hey @chandan-1427
I have implemented Redis-backed multi-worker voice session management plus real-time WebSocket streaming for low-latency interactions, with config/lifecycle wiring and pre-commit cleanup.

@chandan-1427
Contributor


Hey, sorry for the late response.
Thanks for the massive amount of work you've put into this PR! The foundation here is fantastic - the backend architectural changes, Redis session integration, and Pipecat bridging are really well designed.

I've gone through the codebase and run it locally. Before we can merge this, I found a few critical bugs that will cause crashes under load or in a distributed environment, plus a major gap in the frontend implementation.
Here is what needs to be updated:

Frontend / Implementation Gaps

Missing Real-Time Audio Streaming (VoiceCallPanel.svelte): The backend is beautifully set up to handle real-time binary PCM streams over WebSockets, but the frontend is currently faking it by using the old push-to-talk VoiceRecorder to hit the /api/transcribe REST endpoint. (This is also currently causing a 500 Internal Error locally due to a SvelteKit cookie bug). Fix: Please implement navigator.mediaDevices.getUserMedia to stream binary audio chunks directly to the WebSocket instead of using REST.

WebSocket Protocol Mismatch (voice-client.ts): The client blindly connects to the absolute wsUrl returned by the backend. If the backend is behind a reverse proxy, it will return localhost:3773, which breaks the browser connection. Fix: Dynamically construct the WebSocket URL using agentAPI.baseUrl.replace(/^http/, 'ws').
Backend / Stability Bugs

Distributed Clock Desync (session_manager.py): VoiceSession uses time.monotonic() for start_time. This is unique to a specific container's boot time, so if Worker A saves a session to Redis and Worker B reads it, the duration calculation breaks completely. Fix: Use time.time() (epoch time) so it safely serializes across distributed workers.
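
A minimal sketch of the epoch-time fix, assuming a start_time field on the session (names here are illustrative, not the actual VoiceSession attributes):

```python
import time

# time.monotonic() is relative to each container's boot, so a value
# written by Worker A is meaningless when Worker B reads it from Redis.
# time.time() returns epoch seconds, which serialize safely across
# distributed workers.
start_time = time.time()  # store this in Redis, not time.monotonic()

# ... later, possibly in a different worker process ...
duration = time.time() - start_time
```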

WebSocket Concurrent Write Crashes (voice_endpoints.py): _try_send_json fires messages asynchronously. If STT and TTS trigger simultaneously, Starlette WebSockets will throw a fatal ASGI RuntimeError for concurrent writes. Fix: Introduce an asyncio.Lock() to ensure outbound WebSocket messages are queued sequentially.
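
The lock-based fix might look like the following; SafeSender and its send_json are hypothetical stand-ins for the real _try_send_json helper:

```python
import asyncio


class SafeSender:
    """Serialize outbound WebSocket writes with a lock so simultaneous
    STT and TTS callbacks cannot perform concurrent ASGI sends."""

    def __init__(self, websocket) -> None:
        self._websocket = websocket
        self._lock = asyncio.Lock()

    async def send_json(self, payload: dict) -> None:
        async with self._lock:  # writers queue up sequentially here
            await self._websocket.send_json(payload)
```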

Context Window Exhaustion (agent_bridge.py): _conversation_history grows unbounded. In a 5-minute voice session, this will quickly blow out the LLM's max token limit and crash the agent. Fix: Implement a sliding window limit (e.g., keep the last 20 turns) or a summarization strategy.
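
A sliding-window history can be sketched with collections.deque; the 20-turn limit is the reviewer's example figure, not a tuned value:

```python
from collections import deque

MAX_TURNS = 20  # assumed limit; tune against the model's context window

# deque(maxlen=...) silently evicts the oldest entries, so the history
# stays bounded no matter how long the voice session runs
conversation_history: deque = deque(maxlen=MAX_TURNS)

for i in range(100):
    conversation_history.append({"role": "user", "content": f"turn {i}"})
```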

Python Garbage Collection Task Killer (agent_bridge.py): In _safe_callback, you call asyncio.create_task(result) without keeping a strong reference. Python's GC will randomly destroy these tasks mid-execution, causing silent failures. Fix: Store tasks in a self._background_tasks = set() and remove them when done.
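
The strong-reference pattern, sketched with a hypothetical spawn helper around asyncio.create_task:

```python
import asyncio


class BridgeTasks:
    """Hold strong references to fire-and-forget tasks so the garbage
    collector cannot reclaim them mid-execution."""

    def __init__(self) -> None:
        self._background_tasks: set = set()

    def spawn(self, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._background_tasks.add(task)  # strong reference keeps the task alive
        task.add_done_callback(self._background_tasks.discard)  # drop when done
        return task
```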

Backpressure Removal (memory_scheduler.py): Replacing await send() with send_nowait() removes queue backpressure. A minor traffic spike will now instantly throw 500 errors instead of pausing the incoming request while a worker frees up. Fix: Revert to await self._write_stream.send(task_op). If you want to prevent indefinite hangs, wrap it in an anyio.fail_after(5.0) block.
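
The review suggests anyio.fail_after(5.0); the same awaited-send-with-timeout pattern can be sketched with the stdlib equivalent (the queue and names are illustrative, not the scheduler's actual stream API):

```python
import asyncio

ENQUEUE_TIMEOUT = 5.0  # mirrors the anyio.fail_after(5.0) suggestion


async def enqueue(queue: asyncio.Queue, task_op) -> None:
    # Awaiting put() restores backpressure: when the queue is full, the
    # caller pauses until a worker frees a slot instead of erroring out.
    # The timeout only guards against an indefinite hang.
    await asyncio.wait_for(queue.put(task_op), timeout=ENQUEUE_TIMEOUT)
```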

I haven't checked the tests just yet, but let's get these core architectural blockers sorted out first.
Really impressive work getting this architecture off the ground. Let me know when you've pushed these updates and we'll take another look!

Once again thanks for your time and work, really appreciate it! Looking forward to the update


coderabbitai Bot commented Mar 31, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a voice agent extension with backend voice modules (audio config, STT/TTS factories, agent bridge, session managers), server REST/WebSocket voice endpoints, frontend voice client/components/stores, settings and dependency updates, protocol/type changes, tests, documentation, and small CI/readme formatting fixes.

Changes

Cohort / File(s) Summary
Voice Extension Core
bindu/extensions/voice/__init__.py, bindu/extensions/voice/voice_agent_extension.py, bindu/extensions/voice/audio_config.py
New voice extension package: extension class, audio constants, validators, package docs/exports.
Agent Bridge & Pipeline
bindu/extensions/voice/agent_bridge.py, bindu/extensions/voice/pipeline_builder.py, bindu/extensions/voice/service_factory.py
AgentBridgeProcessor implementation and pipeline builder; STT/TTS factory functions with provider checks and import-time errors.
Session Managers & Factory
bindu/extensions/voice/session_manager.py, bindu/extensions/voice/redis_session_manager.py, bindu/extensions/voice/session_factory.py
In-memory and Redis-backed session managers, TTL/cleanup loop, atomic Redis create script, async factory/closer. Review concurrency, Redis error handling, and Lua script usage.
Server Integration & Endpoints
bindu/server/endpoints/voice_endpoints.py, bindu/server/applications.py
App capability detection, lifespan wiring for session manager, REST session start/end/status, and WebSocket /ws/voice/{session_id} (audio streaming, STT/TTS, control messages). High surface area—review auth, state transitions, error codes, and async locking.
Frontend Voice Client & Stores
frontend/src/lib/services/voice-client.ts, frontend/src/lib/stores/voice.ts
Browser VoiceClient (HTTP + WebSocket + audio capture/streaming) and Svelte stores for session/state/transcripts/audio. Inspect browser API usage, AudioContext handling, and cleanup.
Frontend UI Components
frontend/src/lib/components/voice/LiveTranscript.svelte, frontend/src/lib/components/voice/VoiceCallButton.svelte, frontend/src/lib/components/voice/VoiceCallPanel.svelte
New Svelte components for live transcripts, call button, and voice panel with controls and playback.
Protocol & Types
bindu/common/protocol/types.py, frontend/src/lib/utils/agentMessageHandler.ts, frontend/src/lib/stores/chat.ts
FilePart/DataPart → TypedDicts with optional metadata; frontend agent message parts widened to include file parts; chat store sendMessage accepts parts array. Confirm backward compatibility of message shapes.
Settings, Dependencies & Examples
bindu/settings.py, pyproject.toml, examples/voice-agent/*, docs/VOICE_AGENT_PLAN.md
Adds VoiceSettings, new core deps and voice extra, voice example and docs. Review dependency version constraints and optional extras.
Tests & Test Stubs
tests/unit/extensions/voice/*, tests/conftest_stubs.py
Extensive new unit tests for bridge, factories, session managers, endpoints, helpers; expanded OpenTelemetry test stubs.
Frontend Integration & Misc.
frontend/src/lib/components/chat/*, frontend/src/lib/server/database.ts, frontend/src/lib/server/models.ts, frontend/.env.example, frontend/.gitignore
Chat input/window updated for file parts and voice flows; in-memory DB/mock adjustments; model metadata additions; env and gitignore tweaks.
Utilities, Messages & Minor Fixes
bindu/utils/capabilities.py, bindu/utils/__init__.py, bindu/utils/worker/messages.py, bindu/utils/worker/parts.py, bindu/utils/logging.py, bindu/utils/retry.py, bindu/server/scheduler/memory_scheduler.py, bindu/server/workers/base.py, bindu/server/metrics.py, bindu/penguin/bindufy.py
Capability extractor/re-export; file message parsing changes; typing/logging tweaks; enqueue timeout; span reconstruction helper; metrics cast; bindufy wiring for voice. Pay attention to message parsing and tracing restore logic.
Trivial Formatting / CI
README.*.md, .github/workflows/ci.yml, various example files
Trailing newline/readme fixes and CI pytest invocation changed to run pytest without explicit tests/unit/ path.

Sequence Diagram(s)

sequenceDiagram
    participant Frontend as Frontend (Client)
    participant REST as REST (Server)
    participant SessionMgr as Session Manager
    participant WS as WebSocket Handler
    participant STT as STT (Deepgram)
    participant Bridge as Agent Bridge
    participant Agent as Bindu Agent
    participant TTS as TTS (ElevenLabs)

    Frontend->>REST: POST /voice/session/start
    REST->>SessionMgr: create_session(context_id)
    SessionMgr-->>REST: session_id, ws_url
    REST-->>Frontend: 201 {session_id, ws_url}

    Frontend->>WS: open ws://.../ws/voice/{session_id}
    WS->>SessionMgr: get_session(session_id)
    SessionMgr-->>WS: VoiceSession
    WS->>Bridge: create AgentBridgeProcessor

    Frontend->>WS: binary PCM audio frames
    WS->>STT: transcribe frames
    STT-->>WS: transcript text
    WS->>Bridge: process_transcription(text)
    Bridge->>Agent: manifest_run(conversation_history)
    Agent-->>Bridge: response (text/dict/stream)
    Bridge-->>WS: response text
    WS->>TTS: synthesize audio
    TTS-->>WS: audio bytes
    WS-->>Frontend: audio frames and transcript updates

    Frontend->>WS: control messages (commit_turn / user_text / stop)
    WS->>SessionMgr: update_state / end_session
    WS-->>Frontend: close

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested reviewers

  • Paraschamoli

Poem

🐰 I twitched my whiskers, tuned my ear,

From mic to manifest the messages steer,
Transcripts hop and answers bloom,
TTS hums soft across the room,
I nibble bugs — voice features cheer!



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
.github/workflows/ci.yml (1)

43-48: ⚠️ Potential issue | 🟠 Major

Unit test job is no longer scoped to unit tests

Line 43 now runs all discovered tests, which can pull integration/e2e suites into the unit-tests job and conflict with the job’s hermetic intent (and duplicate work already done in e2e-grpc-tests).

Proposed fix
-          uv run pytest \
+          uv run pytest tests/unit/ \
             --cov=bindu \
             --cov-report=term-missing \
             --cov-report=xml:coverage.xml \
             --cov-fail-under=60 \
             -v
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/ci.yml around lines 43 - 48, The CI step currently invokes
"uv run pytest" which discovers and runs all tests (risking inclusion of
integration/e2e suites); narrow this job to only unit tests by changing the
pytest invocation in the job to target the unit test marker or directory (for
example run pytest only on the unit tests directory like tests/unit or use
pytest -m "unit" or -k to exclude integration/e2e), keeping the existing
coverage flags (--cov=bindu, --cov-report, --cov-fail-under=60) intact so
coverage checks still apply to the unit-only run.
bindu/server/applications.py (1)

412-447: ⚠️ Potential issue | 🟠 Major

Make the voice session manager lifecycle exception-safe.

This cleanup only runs if startup reaches yield. If TaskManager(...) or its async with fails after the voice manager starts, the lifespan exits early and the cleanup loop/backend connection are left running. Put the voice-manager startup and teardown behind a try/finally so partial-startup failures close it too.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/applications.py` around lines 412 - 447, The voice session
manager is started but only torn down if the lifespan reaches yield; wrap the
creation and use of app._voice_session_manager in a try/finally so it is always
cleaned up on error. Specifically, after calling create_session_manager(...) and
await app._voice_session_manager.start_cleanup_loop(), ensure a finally block
checks if app._voice_session_manager is set and awaits
close_session_manager(app._voice_session_manager) (or await
app._voice_session_manager.stop_cleanup_task() where appropriate), and logs
shutdown; keep TaskManager logic inside the try so any exception triggers the
finally cleanup of the voice manager.
🟠 Major comments (21)
examples/typescript-langchain-agent/quiz-agent.ts-93-100 (1)

93-100: ⚠️ Potential issue | 🟠 Major

Select the latest valid user message, not just the last message.

Line 93 currently assumes the last array element is user input. If the tail message is assistant/system/tool, the model gets the wrong prompt.

Proposed fix
-      const last = messages[messages.length - 1];
-      let userInput = "";
-      if (last && typeof last.content === "string" && last.content.trim().length > 0) {
-        userInput = last.content;
-      } else {
-        // Optionally: return error or fallback
-        return "Error: Invalid or empty user input.";
-      }
+      const lastUserMessage = [...messages]
+        .reverse()
+        .find((m) => m?.role === "user" && typeof m.content === "string" && m.content.trim().length > 0);
+
+      if (!lastUserMessage || typeof lastUserMessage.content !== "string") {
+        return "Error: Invalid or empty user input.";
+      }
+      const userInput = lastUserMessage.content;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/typescript-langchain-agent/quiz-agent.ts` around lines 93 - 100, The
code currently picks messages[messages.length - 1] (variable last) which may be
system/assistant/tool; instead scan messages from the end to find the most
recent message with role === "user" and non-empty, and assign its content to
userInput (or return an error/fallback if none found). Replace the last-based
logic in the block that sets userInput (referencing the variables last and
userInput) with a reverse loop or Array.prototype.slice().reverse().find(...)
that filters on message.role === "user" and message.content.trim().length > 0,
then use that message's content.
examples/typescript-langchain-agent/quiz-agent.ts-109-109 (1)

109-109: ⚠️ Potential issue | 🟠 Major

Add explicit timeout and retry configuration to the ChatOpenAI initialization.

Line 20-27 initializes the LLM without timeout or retry guards. The ChatOpenAI constructor accepts timeout (milliseconds) and maxRetries options that should be explicitly configured to prevent indefinite hangs on failed or slow requests to OpenRouter:

Example fix
const llm = new ChatOpenAI({
  model: "openai/gpt-oss-120b",
  temperature: 0.3,
  timeout: 30000, // 30 seconds
  maxRetries: 2,
  configuration: {
    baseURL: "https://openrouter.ai/api/v1",
    apiKey: process.env.OPENROUTER_API_KEY,
  },
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/typescript-langchain-agent/quiz-agent.ts` at line 109, The
ChatOpenAI instance (the llm variable used later by llm.invoke) is created
without explicit timeout or retry settings, which can cause indefinite hangs;
update the ChatOpenAI constructor call to include timeout (milliseconds) and
maxRetries options (for example timeout: 30000 and maxRetries: 2) and keep the
existing configuration/apiKey fields so requests to OpenRouter bail out and
retry predictably.
tests/unit/server/scheduler/test_memory_scheduler.py-110-131 (1)

110-131: ⚠️ Potential issue | 🟠 Major

This test does not validate capacity/backpressure behavior yet.

test_send_operation_waits_for_capacity currently only verifies that send() is called. It does not assert waiting behavior or timeout behavior from anyio.fail_after(...). Please add assertions for blocked-send and timeout/unblock paths so this guards the actual regression risk.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/server/scheduler/test_memory_scheduler.py` around lines 110 - 131,
The test currently only checks that scheduler._write_stream.send was called;
instead simulate backpressure by making the fake_send await an external
synchronization primitive (e.g., anyio.Event or trio.Event) so the send call
blocks, then assert that scheduler.run_task does not complete until you set the
event (verifying it waits for capacity), and add a separate sub-test where you
wrap run_task in anyio.fail_after with a short timeout to assert it times
out/unblocks as expected; reference the test method
test_send_operation_waits_for_capacity, the InMemoryScheduler instance,
scheduler._write_stream.send (fake_send), and anyio.fail_after to locate where
to add the blocking/unblocking and timeout assertions.
bindu/server/scheduler/memory_scheduler.py-31-33 (1)

31-33: ⚠️ Potential issue | 🟠 Major

Move queue buffer size and enqueue timeout to app_settings.

_TASK_QUEUE_BUFFER_SIZE = 100 and anyio.fail_after(5.0) are hardcoded runtime config. Please source both values from bindu.settings.app_settings so they can be tuned per environment.

As per coding guidelines, "Use bindu.settings.app_settings for all configuration - never hardcode values such as URLs, ports, timeouts, API keys, or feature flags" and "NEVER create local config constants - use app_settings from bindu.settings."

Also applies to: 80-81

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/scheduler/memory_scheduler.py` around lines 31 - 33, Replace the
hardcoded _TASK_QUEUE_BUFFER_SIZE and the anyio.fail_after(5.0) timeout with
values sourced from bindu.settings.app_settings: import app_settings from
bindu.settings, use app_settings.task_queue_buffer_size (or similarly named
setting) instead of _TASK_QUEUE_BUFFER_SIZE and replace anyio.fail_after(5.0)
with anyio.fail_after(app_settings.task_enqueue_timeout); update both the queue
creation and the enqueue timeout usages (including the occurrences around lines
with anyio.fail_after and the buffer size) and add or document the corresponding
settings in app_settings if they do not yet exist.
frontend/src/lib/server/database.ts-206-211 (1)

206-211: ⚠️ Potential issue | 🟠 Major

aggregate() still makes the stats job a no-op in memory mode.

This overload now accepts the second argument used by computeStats(), but Lines 210-211 still discard the pipeline and return raw collection documents. That means the stats refresh path never applies its pipeline work when the in-memory backend is active, so conversationStats stays stale.

🛠️ Safe fallback if full pipeline support is not ready yet
  aggregate<R = T>(
  	_pipeline: Record<string, unknown>[],
  	_options?: Record<string, unknown>
  ): InMemoryCursor<R> {
-		// Simplified aggregation - just return all docs
-		return new InMemoryCursor(Array.from(this.data.values()) as unknown as R[]);
+		throw new Error("InMemoryCollection.aggregate() does not execute pipelines yet");
  }

At minimum, fail fast here; otherwise this path silently reports success while doing nothing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/server/database.ts` around lines 206 - 211, The in-memory
aggregate method (aggregate in frontend/src/lib/server/database.ts) currently
ignores the pipeline/options and returns raw docs, causing
computeStats/conversationStats to be a no-op; update aggregate to either execute
the provided _pipeline/_options against the in-memory data or, if full pipeline
support isn’t implemented yet, fail fast by throwing a clear
UnsupportedOperationError (or similar) that includes the pipeline and options so
callers like computeStats know the operation failed rather than silently
returning raw documents. Locate the aggregate<R = T>(_pipeline, _options?)
implementation and replace the unconditional return with either a minimal
pipeline application over Array.from(this.data.values()) or an explicit throw
that surfaces the unsupported pipeline and options to the caller.
frontend/src/lib/server/database.ts-147-159 (1)

147-159: ⚠️ Potential issue | 🟠 Major

Pipeline-array updates are still applied as plain update docs.

Lines 152-158 feed each array stage through applyUpdate(), but applyUpdate() only shallow-merges $set values and iterates keys from $unset. With the new report migration, that turns contentId: "$assistantId" into a literal string write, and $unset: "assistantId" will not remove the assistantId field.

🛠️ Minimal direction
-				const pipelineUpdate = Array.isArray(update)
-					? update.reduce(
-							(acc, stage) => this.applyUpdate(acc, stage),
-							doc
-					  )
-					: this.applyUpdate(doc, update);
-				const updated = pipelineUpdate;
+				const updated = Array.isArray(update)
+					? update.reduce(
+							(acc, stage) => this.applyPipelineStage(acc, stage),
+							doc
+					  )
+					: this.applyUpdate(doc, update);

applyPipelineStage() should at least resolve "$field" lookups inside $set and treat string/array $unset values as field names.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/server/database.ts` around lines 147 - 159, The
pipeline-array branch is incorrectly treating array stages as plain update docs
via applyUpdate, which causes "$field" lookups to be written literally and
$unset strings/arrays to be ignored; change the reducer to call a new or
existing applyPipelineStage (or extend applyUpdate) that for each stage resolves
"$field" lookup strings inside $set (replace values like "$assistantId" with the
corresponding value from the current document), and treats $unset entries that
are strings or arrays as field names to delete from the document; update the
reduction call in the update method (the Array.isArray(update) path) to use that
pipeline-aware handler so pipeline stages mutate fields and unset keys
correctly.
examples/voice-agent/README.md-11-14 (1)

11-14: ⚠️ Potential issue | 🟠 Major

Consolidate dependency installation to use uv for all setup steps.

The setup uses uv sync to create a managed environment, but then runs pip install "bindu[voice]" outside of that environment. This bare pip command can target a different Python interpreter or install the published package instead of the local development version. Later execution with uv run expects all dependencies to be in uv's managed environment, making the current approach unreliable.

Suggested change
-uv sync --dev --extra agents
-pip install "bindu[voice]"
+uv sync --dev --extra agents --extra voice
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/voice-agent/README.md` around lines 11 - 14, Replace the bare pip
install with a uv-run installation so dependencies are installed into uv's
managed environment: change the sequence that runs uv sync --dev --extra agents
and pip install "bindu[voice]" to use uv run -- pip install "bindu[voice]" (or
uv run -- python -m pip install "bindu[voice]") so the install of the local
development package happens inside the environment created by uv sync; ensure
subsequent instructions that reference uv run assume dependencies were installed
via uv run.
frontend/src/lib/components/chat/ChatInput.svelte-36-36 (1)

36-36: ⚠️ Potential issue | 🟠 Major

Await onsubmit before clearing the composer.

The onsubmit callback is typed as synchronous but implementations are async. Calling it without await clears value and files immediately, before the async send completes—losing user input on send failure.

Additionally, the file input element value is reset only in the error path (line 182), not after successful submission. This prevents users from re-selecting the same file.

💡 Suggested changes
-		onsubmit?: (message: string, fileParts: Array<{ name: string; mime: string; value: string }>) => void;
+		onsubmit?: (
+			message: string,
+			fileParts: Array<{ name: string; mime: string; value: string }>
+		) => void | Promise<void>;
 			try {
 				const fileParts = await getFileParts();
 				if (typeof onsubmit === "function") {
-					onsubmit(value, fileParts);
+					await onsubmit(value, fileParts);
 				}
 				// Clear files and input after successful submit
 				files = [];
 				value = "";
+				if (fileInputEl) fileInputEl.value = "";
 			} catch (err) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/components/chat/ChatInput.svelte` at line 36, The onsubmit
callback is treated as sync but used with async implementations, causing value
and files to be cleared before the send completes and losing input on failure;
change the ChatInput.svelte handler to await the onsubmit call (await
onsubmit(...)) and only clear value and files after the awaited call completes
(and in the catch path restore as needed), and always reset the file input
element (the DOM file input reference) after successful submission as well as on
error so users can re-select the same file; update the type of onsubmit to
return Promise<void> to reflect async usage.
examples/document-analyzer/document_analyzer.py-117-123 (1)

117-123: ⚠️ Potential issue | 🟠 Major

Don't send the uploaded document under both User Prompt and Document Content.

In the chat-format path, content is appended to prompt_parts and also copied into extracted_docs when it contains the document marker. That duplicates the full document in the model input and can also leak prompt text into Document Content when both live in the same message.

Suggested fix
-        if isinstance(content, str) and content.strip():
-            prompt_parts.append(content)
-            if "--- Document Uploaded" in content:
-                extracted_docs.append(content)
+        if isinstance(content, str) and content.strip():
+            docs = DOCUMENT_BLOCK_RE.findall(content)
+            if docs:
+                extracted_docs.extend(docs)
+                content = DOCUMENT_BLOCK_RE.sub("", content).strip()
+            if content:
+                prompt_parts.append(content)
             continue

Add a compiled DOCUMENT_BLOCK_RE near the top of the module so you can separate prompt text from embedded document blocks.
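
To make the separation concrete, here is a minimal sketch of such a pattern and helper. The delimiters are assumptions — in particular the `--- End Document ---` end marker is invented here and must be aligned with the real markers emitted on upload:

```python
import re

# Hypothetical delimiters: the "--- End Document ---" end marker is an
# assumption; align the pattern with the real markers used on upload.
DOCUMENT_BLOCK_RE = re.compile(
    r"--- Document Uploaded.*?--- End Document ---",
    re.DOTALL,
)

def split_prompt_and_docs(content: str) -> tuple[str, list[str]]:
    """Separate embedded document blocks from the surrounding prompt text."""
    docs = DOCUMENT_BLOCK_RE.findall(content)
    prompt = DOCUMENT_BLOCK_RE.sub("", content).strip()
    return prompt, docs
```

With a helper like this, the loop appends only the remaining prompt text to prompt_parts and only the matched blocks to extracted_docs, so neither side duplicates the other.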

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/document-analyzer/document_analyzer.py` around lines 117 - 123, The
chat-format handler appends the entire message content to prompt_parts and also
copies it into extracted_docs when it contains the document marker, causing
duplication and possible leakage of prompt text into document content; introduce
a compiled DOCUMENT_BLOCK_RE at the top of the module and use it in the loop to
split msg["content"] into document blocks and remaining prompt text, append only
the non-document remainder to prompt_parts and append only the matched document
blocks to extracted_docs (do not copy the full content variable into
extracted_docs), updating the logic around variables content, prompt_parts, and
extracted_docs accordingly.
bindu/extensions/voice/audio_config.py-14-17 (1)

14-17: ⚠️ Potential issue | 🟠 Major

FRAME_SIZE should be derived from the active sample rate, not always 16kHz.

VoiceAgentExtension allows non-default rates, but this constant is frozen from DEFAULT_SAMPLE_RATE. Any caller that uses it will chunk 24kHz/48kHz sessions with 16kHz-sized frames, which breaks streaming timing and VAD alignment.

Suggested fix
-FRAME_SIZE: int = (
-    DEFAULT_SAMPLE_RATE * FRAME_DURATION_MS // 1000 * BYTES_PER_SAMPLE
-)  # 640 bytes
+def get_frame_size(
+    sample_rate: int,
+    channels: int = DEFAULT_CHANNELS,
+    bytes_per_sample: int = BYTES_PER_SAMPLE,
+) -> int:
+    return sample_rate * FRAME_DURATION_MS // 1000 * channels * bytes_per_sample
+
+FRAME_SIZE: int = get_frame_size(DEFAULT_SAMPLE_RATE)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/extensions/voice/audio_config.py` around lines 14 - 17, The FRAME_SIZE
constant is incorrectly calculated from DEFAULT_SAMPLE_RATE, causing wrong
chunking for non-16k sessions; change the design so FRAME_SIZE is computed from
the active sample rate (e.g., add a helper function like
compute_frame_size(sample_rate: int) or make FRAME_SIZE a property on the audio
config/VoiceAgentExtension instance) using FRAME_DURATION_MS and
BYTES_PER_SAMPLE instead of DEFAULT_SAMPLE_RATE, and update all callers to use
that computed value (refer to FRAME_SIZE, FRAME_DURATION_MS,
DEFAULT_SAMPLE_RATE, BYTES_PER_SAMPLE and VoiceAgentExtension to find where to
replace the constant with the new computed value).
bindu/extensions/voice/redis_session_manager.py-64-71 (1)

64-71: ⚠️ Potential issue | 🟠 Major

Don't log Redis URLs verbatim.

Connection strings often carry passwords, and both the info log and the raised ConnectionError currently include the full URL.
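
A minimal redaction helper, sketched with stdlib urllib.parse (the function name is illustrative, not code from the PR):

```python
from urllib.parse import urlparse, urlunparse

def redact_url(url: str) -> str:
    """Replace any password in a connection URL with '***' for safe logging."""
    parsed = urlparse(url)
    if parsed.password:
        netloc = f"{parsed.username or ''}:***@{parsed.hostname}"
        if parsed.port:
            netloc += f":{parsed.port}"
        parsed = parsed._replace(netloc=netloc)
    return urlunparse(parsed)

# redact_url("redis://user:s3cret@cache.internal:6379/0")
# -> "redis://user:***@cache.internal:6379/0"
```

Both the logger.info call and the ConnectionError message should use the redacted value.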

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/extensions/voice/redis_session_manager.py` around lines 64 - 71, The
code logs and raises errors containing the full Redis URL (including secrets);
update the block around the ping call (where self._redis_client.ping() is
invoked and logger.info / ConnectionError are used) to redact sensitive parts of
self.redis_url before output: parse self.redis_url (e.g.,
urllib.parse.urlparse), remove or replace the password/userinfo, and reconstruct
a safe string (or log only hostname:port) and use that redacted value in
logger.info and the raised ConnectionError message instead of the raw URL.
bindu/extensions/voice/agent_bridge.py-89-103 (1)

89-103: ⚠️ Potential issue | 🟠 Major

Avoid logging raw transcript content.

These debug lines persist user utterances and agent replies into application logs, which turns every voice session into log-retained conversational data.
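
One way to keep useful diagnostics without persisting content is to log only direction, context id, length, and a short digest. This helper is an illustrative sketch, not code from the PR:

```python
import hashlib

def turn_log_record(direction: str, context_id: str, text: str) -> str:
    """Build a metadata-only log line: never includes the raw transcript."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]
    return (
        f"voice turn direction={direction} context_id={context_id} "
        f"chars={len(text)} sha256_8={digest}"
    )

# logger.debug(turn_log_record("user", context_id, text))
```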

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/extensions/voice/agent_bridge.py` around lines 89 - 103, The debug
calls in agent_bridge.py currently log raw user transcript and agent replies
(the logger.debug calls referencing self._context_id, text and response_text)
which must be removed or redacted; replace them with non-sensitive metadata-only
logs (e.g., indicate message direction, context id, and length or a hash) and
ensure you do not print or persist the full text before or after calling
_invoke_agent; keep the conversation storage behavior
(_conversation_history.append and _trim_history) intact but stop emitting raw
content to logs.
bindu/extensions/voice/agent_bridge.py-159-163 (1)

159-163: ⚠️ Potential issue | 🟠 Major

Trim whole turns, not individual messages.

When the history is full and a new user utterance arrives, this can delete just the oldest user message and leave the next manifest.run(...) starting with an orphaned assistant response.

🧠 Suggested fix
     def _trim_history(self) -> None:
         """Keep only the most recent conversation turns."""
         overflow = len(self._conversation_history) - self._max_history_messages
         if overflow > 0:
-            del self._conversation_history[:overflow]
+            turns_to_drop = max(1, (overflow + 1) // 2)
+            del self._conversation_history[: turns_to_drop * 2]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/extensions/voice/agent_bridge.py` around lines 159 - 163,
_trim_history currently deletes individual messages which can leave an
assistant message orphaned; change it to trim whole conversation turns instead.
In the _trim_history method use _conversation_history and _max_history_messages
to remove messages in pairs (user+assistant) from the start until
len(_conversation_history) <= _max_history_messages, or if messages are
role-tagged, repeatedly pop from the start until the first remaining entry is a
user turn and the total length is within _max_history_messages; ensure the
truncation logic prevents leaving an assistant response at the start (and update
any related assumptions in manifest.run if it expects history to start with a
user).
frontend/src/lib/components/chat/ChatWindow.svelte-166-178 (1)

166-178: ⚠️ Potential issue | 🟠 Major

Don't clear the composer on failed sends.

The finally block wipes draft and files even when sendAgentMessage(...) throws, so the user loses the message and attachments they need to retry.

💬 Suggested fix
         try {
             for await (const _update of sendAgentMessage(message, contextId, undefined, undefined, undefined, undefined, fileParts)) {
                 // Process updates if needed
             }
+            draft = "";
+            files = [];
         } catch (err) {
             console.error("Error sending agent message:", err);
-            // Optionally surface error to user
-            alert(`Error: ${err instanceof Error ? err.message : String(err)}`);
-        } finally {
-            // Always clear draft and files
-            draft = "";
-            files = [];
+            $error = err instanceof Error ? err.message : String(err);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/components/chat/ChatWindow.svelte` around lines 166 - 178,
The finally block currently clears draft and files even when
sendAgentMessage(message, contextId, ...) fails; change this so draft and files
are only cleared on successful send: remove the clearing from the finally block
and instead clear draft and files after the sendAgentMessage iteration completes
without throwing (e.g., after the for-await loop or inside a success branch),
leaving the catch block to surface the error without wiping the composer so
users can retry.
bindu/extensions/voice/voice_agent_extension.py-28-40 (1)

28-40: 🛠️ Refactor suggestion | 🟠 Major

Use app_settings.voice as the single source of defaults.

These hardcoded constructor defaults duplicate values already defined in VoiceSettings, so the extension metadata can drift from runtime config whenever only part of the voice config is supplied.

As per coding guidelines "Use bindu.settings.app_settings for all configuration - never hardcode values such as URLs, ports, timeouts, API keys, or feature flags" and "NEVER create local config constants - use app_settings from bindu.settings".
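
A sketch of the None-fallback pattern, using a stand-in settings object — the field names shown mirror the finding, but the default values here are placeholders, not the real app_settings.voice:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class _VoiceDefaults:
    # Stand-in for app_settings.voice; values are illustrative only.
    stt_provider: str = "deepgram"
    sample_rate: int = 16000

voice_defaults = _VoiceDefaults()

class VoiceAgentExtension:
    def __init__(
        self,
        stt_provider: Optional[str] = None,
        sample_rate: Optional[int] = None,
    ) -> None:
        # Fall back to settings only when the caller passed nothing,
        # keeping the settings object the single source of defaults.
        self.stt_provider = (
            stt_provider if stt_provider is not None else voice_defaults.stt_provider
        )
        self.sample_rate = (
            sample_rate if sample_rate is not None else voice_defaults.sample_rate
        )
```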

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/extensions/voice/voice_agent_extension.py` around lines 28 - 40, The
__init__ of VoiceAgentExtension currently hardcodes defaults (stt_provider,
stt_model, stt_language, tts_provider, tts_voice_id, tts_model, sample_rate,
allow_interruptions, vad_enabled, description); change the signature so these
parameters are optional (no hardcoded defaults) and inside
VoiceAgentExtension.__init__ read fallback values from app_settings.voice (e.g.,
app_settings.voice.stt_provider, .stt_model, .stt_language, .tts_provider,
.tts_voice_id, .tts_model, .sample_rate, .allow_interruptions, .vad_enabled,
.description) assigning them when the incoming arg is None, so the extension
uses app_settings.voice as the single source of truth and avoids duplicated
local defaults.
bindu/settings.py-1038-1070 (1)

1038-1070: ⚠️ Potential issue | 🟠 Major

Validate the env-backed voice settings at startup.

These values come straight from environment/config, but invalid inputs like sample_rate <= 0, audio_channels < 1, vad_threshold > 1, or non-positive session limits will currently slip through until the voice pipeline is already starting.

🛡️ Suggested validation
 class VoiceSettings(BaseSettings):
@@
-    sample_rate: int = 16000
-    audio_channels: int = 1
+    sample_rate: int = Field(default=16000, gt=0)
+    audio_channels: int = Field(default=1, gt=0)
@@
-    vad_threshold: float = 0.5
+    vad_threshold: float = Field(default=0.5, ge=0.0, le=1.0)
@@
-    session_timeout: int = 300  # seconds (5 min)
-    max_concurrent_sessions: int = 10
+    session_timeout: int = Field(default=300, gt=0)  # seconds (5 min)
+    max_concurrent_sessions: int = Field(default=10, gt=0)
@@
-    redis_session_ttl: int = 300  # seconds, TTL for session keys in Redis
+    redis_session_ttl: int = Field(default=300, gt=0)  # seconds

As per coding guidelines "Validate all external input and use type hints for input validation in Python files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/settings.py` around lines 1038 - 1070, Add explicit startup validation
for the voice/session settings defined in settings.py: implement a validator
function (e.g., validate_voice_settings or Pydantic field validators on the
Settings model) that checks sample_rate > 0, audio_channels >= 1, audio_encoding
is one of the expected encodings, 0.0 <= vad_threshold <= 1.0, session_timeout >
0, max_concurrent_sessions > 0, session_backend is in {"memory","redis"} and if
session_backend == "redis" then redis_url is non-empty and well-formed; raise
clear ValueError/ConfigError with descriptive messages on invalid values and
ensure this validator runs at startup (call it from the module init or
Settings.__post_init__/model validation) so misconfiguration fails fast.
tests/unit/extensions/voice/test_voice_endpoints.py-216-226 (1)

216-226: ⚠️ Potential issue | 🟠 Major

This lock regression test doesn't prove serialization yet.

await_count == 2 passes whether the sends were serialized or concurrent, so the ASGI concurrent-send bug can still slip through.

🧪 Stronger serialization check
     websocket = AsyncMock()
     send_lock = asyncio.Lock()
+    call_order: list[str] = []
+    first_send_entered = asyncio.Event()
+    release_first_send = asyncio.Event()
+
+    async def send_text(payload: str) -> None:
+        call_order.append(payload)
+        if len(call_order) == 1:
+            first_send_entered.set()
+            await release_first_send.wait()
+
+    websocket.send_text.side_effect = send_text
 
-    await asyncio.gather(
-        _send_json(websocket, {"type": "one"}, send_lock),
-        _send_json(websocket, {"type": "two"}, send_lock),
-    )
+    first = asyncio.create_task(_send_json(websocket, {"type": "one"}, send_lock))
+    await first_send_entered.wait()
+    second = asyncio.create_task(_send_json(websocket, {"type": "two"}, send_lock))
+    await asyncio.sleep(0)
+    assert len(call_order) == 1
+    release_first_send.set()
+    await asyncio.gather(first, second)
 
     assert websocket.send_text.await_count == 2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/extensions/voice/test_voice_endpoints.py` around lines 216 - 226,
The test test_send_json_uses_lock_when_provided currently only asserts
websocket.send_text.await_count == 2, which doesn't prove serialization; modify
the test to instrument the AsyncMock send_text to record start and end
timestamps or use asyncio.Events to force one send to delay and assert the
second send does not start until the first completes when calling _send_json
with send_lock. Specifically, wrap websocket.send_text (used by _send_json) with
a coroutine that logs start/end (or awaits an Event) so you can assert ordering
(end of first <= start of second) or that the second awaited the Event released
by the first, proving serialization under send_lock.
frontend/src/lib/stores/voice.ts-33-57 (1)

33-57: ⚠️ Potential issue | 🟠 Major

Ignore late events from stale VoiceClient instances.

These callbacks always mutate the singleton stores, even after restart/shutdown. Because stopSession() errors are swallowed, a half-closed client can still push transcripts, audio, or error state into the next session. Gate each callback on the currently active instance, or clear the old handlers before stopping it.

Minimal guard
-  client = new VoiceClient();
-  client.onTranscript = appendTranscript;
-  client.onStateChange = (state) => {
-    voiceState.set(state);
-  };
-  client.onAgentAudio = (audioData) => {
-    latestAgentAudio.set(audioData);
-  };
-  client.onError = (message) => {
-    voiceError.set(message);
-    voiceState.set('error');
-  };
+  const nextClient = new VoiceClient();
+  client = nextClient;
+  nextClient.onTranscript = (event) => {
+    if (client !== nextClient) return;
+    appendTranscript(event);
+  };
+  nextClient.onStateChange = (state) => {
+    if (client !== nextClient) return;
+    voiceState.set(state);
+  };
+  nextClient.onAgentAudio = (audioData) => {
+    if (client !== nextClient) return;
+    latestAgentAudio.set(audioData);
+  };
+  nextClient.onError = (message) => {
+    if (client !== nextClient) return;
+    voiceError.set(message);
+    voiceState.set('error');
+  };

Also applies to: 78-89

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/stores/voice.ts` around lines 33 - 57, The callbacks from
stale VoiceClient instances are still mutating singleton stores after restart;
fix by ensuring handlers are only applied for the active instance and cleared on
the old one: before awaiting existingClient.stopSession(), remove or noop its
handlers (e.g., set existingClient.onTranscript = existingClient.onStateChange =
existingClient.onAgentAudio = existingClient.onError = null) and when creating a
new VoiceClient capture a local identifier (const thisClient = new
VoiceClient(); client = thisClient) and wrap each handler
(onTranscript/onStateChange/onAgentAudio/onError) to first check client ===
thisClient (or that the handler is still the active one) before calling
voiceState.set / latestAgentAudio.set / voiceError.set to prevent stale
instances from mutating stores.
bindu/server/endpoints/voice_endpoints.py-55-62 (1)

55-62: ⚠️ Potential issue | 🟠 Major

Differentiate an empty body from an invalid start payload.

This broad except Exception: pass turns malformed JSON into a fresh session and forwards any context_id string unchecked. Empty body is fine, but malformed JSON or an invalid context_id should return 400 before create_session(). As per coding guidelines, "Validate all external input and use type hints for input validation in Python files".
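
A sketch of the parsing logic as a pure helper (names are illustrative; the handler would map a non-None error string to an HTTP 400 response):

```python
import json
import uuid
from typing import Optional, Tuple

def parse_start_body(raw: bytes) -> Tuple[Optional[uuid.UUID], Optional[str]]:
    """Return (context_id, error). An empty body is acceptable; malformed
    JSON or an invalid context_id yields an error string for a 400 reply."""
    if not raw.strip():
        return None, None
    try:
        body = json.loads(raw)
    except json.JSONDecodeError:
        return None, "malformed JSON body"
    context_id = body.get("context_id") if isinstance(body, dict) else None
    if context_id is None:
        return None, None
    try:
        return uuid.UUID(str(context_id)), None
    except ValueError:
        return None, "invalid context_id"
```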

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 55 - 62, The handler
currently swallows all exceptions when parsing request.json() and accepts any
context_id string; change this to explicitly treat an empty body as acceptable
but return HTTP 400 for malformed JSON or invalid context_id: call await
request.json() inside a try that only catches JSONDecodeError (or
starlette.requests.BadRequest) and return 400 on parse errors, then if the
parsed body is a dict and contains "context_id" validate it with uuid.UUID(...)
(or an explicit UUID validator) and return 400 if invalid; only after successful
validation proceed to set context_id and call create_session(). Ensure the
handler signature and validation use type hints for the request/body where
applicable.
bindu/server/endpoints/voice_endpoints.py-69-81 (1)

69-81: ⚠️ Potential issue | 🟠 Major

Don’t return a proxy-fragile absolute ws_url.

Building the socket URL from request.url still loses deployment details like root_path and proxy rewriting, so clients behind a reverse proxy can connect to the wrong endpoint. Returning a relative websocket path (or just session_id) lets the client derive the final ws: URL from its configured API base.
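
A sketch of both halves of that contract (names are illustrative): the server returns a relative path, and the client joins it with its own configured API base:

```python
from urllib.parse import urlsplit

def ws_path(session_id: str) -> str:
    """Server side: return only a relative WebSocket path."""
    return f"/ws/voice/{session_id}"

def derive_ws_url(api_base: str, path: str) -> str:
    """Client side: swap http(s) for ws(s) on the configured API base."""
    parts = urlsplit(api_base)
    scheme = "wss" if parts.scheme == "https" else "ws"
    return f"{scheme}://{parts.netloc}{path}"

# derive_ws_url("https://agent.example.com", ws_path("abc123"))
# -> "wss://agent.example.com/ws/voice/abc123"
```

Because the client owns the origin, reverse-proxy rewrites and root_path prefixes stay intact.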

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 69 - 81, The endpoint
currently constructs an absolute WebSocket URL using request.url
(scheme/hostname/port) which breaks behind proxies and ignores root_path;
instead return a proxy-safe value such as the relative WebSocket path or just
the session identifier so the client can derive the final ws:// URL from its
configured API base. Modify the code that builds ws_url (remove usage of
request.url.scheme, request.url.hostname, request.url.port and host fallback
logic) and return either f"/ws/voice/{session.id}" or simply session.id from the
handler so callers can compose the full WebSocket URL correctly; reference the
existing session.id and the current ws_url construction code to locate and
replace the logic.
bindu/server/endpoints/voice_endpoints.py-548-557 (1)

548-557: ⚠️ Potential issue | 🟠 Major

Track and cancel pending send tasks in the finally block.

loop.create_task() creates fire-and-forget tasks. If _send_json tries to send after the WebSocket closes (in the finally block), exceptions are silently dropped since they occur in detached background tasks. Store created tasks in a set and cancel them during cleanup:

pending_sends = set()

# In _try_send_json:
task = loop.create_task(_send_json(websocket, data, send_lock))
pending_sends.add(task)
task.add_done_callback(pending_sends.discard)

# In finally block:
for task in pending_sends:
    task.cancel()
await asyncio.gather(*pending_sends, return_exceptions=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 548 - 557, The
current _try_send_json creates fire-and-forget tasks with
loop.create_task(_send_json(...)) which can leak and swallow exceptions;
introduce a module-level or connection-scoped set named pending_sends to track
created asyncio.Tasks (add the task and attach
task.add_done_callback(pending_sends.discard) inside _try_send_json) and in the
corresponding cleanup/finally block cancel all tasks from pending_sends and
await them (using asyncio.gather with return_exceptions=True) so pending
_send_json tasks are cancelled and awaited before teardown.
🟡 Minor comments (8)
examples/typescript-langchain-agent/quiz-agent.ts-115-117 (1)

115-117: ⚠️ Potential issue | 🟡 Minor

Don’t return raw provider error messages to end users.

Returning error.message can leak internal/provider details and makes failures look like normal successful text responses.

Proposed fix
-    } catch (error: any) {
-      return `Error: ${error.message}`;
+    } catch (error: unknown) {
+      // Log internally with your logger if available.
+      return "Error: Failed to generate quiz. Please try again.";
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/typescript-langchain-agent/quiz-agent.ts` around lines 115 - 117,
The catch block currently returns raw provider error text from "catch (error:
any)" (in quiz-agent.ts) which can leak internals; instead, log the full error
to your internal logger (or console.error) referencing the catch block, then
return a generic, user-facing message such as "An unexpected error occurred;
please try again later." Alternatively rethrow a sanitized Error for centralized
handling—do not include error.message in the response body.
examples/typescript-langchain-agent/quiz-agent.ts-20-26 (1)

20-26: ⚠️ Potential issue | 🟡 Minor

Move apiKey to top-level for consistency with LangChain best practices.

apiKey should be placed at the top level of the ChatOpenAI constructor options, not nested under configuration. While both approaches work in @langchain/openai v0.4.0, the official LangChain documentation recommends the top-level pattern.

Proposed adjustment
 const llm = new ChatOpenAI({
   model: "openai/gpt-oss-120b", // same as your Python version
   temperature: 0.3,
+  apiKey: process.env.OPENROUTER_API_KEY,
   configuration: {
     baseURL: "https://openrouter.ai/api/v1",
-    apiKey: process.env.OPENROUTER_API_KEY,
   },
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/typescript-langchain-agent/quiz-agent.ts` around lines 20 - 26, The
ChatOpenAI instantiation places apiKey inside the nested configuration object;
move apiKey to the top-level options to follow LangChain best practices — update
the ChatOpenAI call (symbol: ChatOpenAI) so apiKey is a direct property
alongside model and temperature (remove it from configuration), leaving
configuration for other HTTP options like baseURL only.
examples/voice-agent/.env.example-6-14 (1)

6-14: ⚠️ Potential issue | 🟡 Minor

Reorder optional env keys to satisfy dotenv-linter.
dotenv-linter reports UnorderedKey warnings in this block; reordering avoids noisy lint output.

🔧 Suggested reorder
 # Optional tuning
-VOICE__STT_MODEL=nova-3
-VOICE__STT_LANGUAGE=en
-VOICE__TTS_VOICE_ID=21m00Tcm4TlvDq8ikWAM
-VOICE__TTS_MODEL=eleven_turbo_v2_5
-VOICE__SAMPLE_RATE=16000
-VOICE__AUDIO_CHANNELS=1
-VOICE__AUDIO_ENCODING=linear16
-VOICE__ALLOW_INTERRUPTIONS=true
-VOICE__SESSION_TIMEOUT=300
+VOICE__ALLOW_INTERRUPTIONS=true
+VOICE__AUDIO_CHANNELS=1
+VOICE__AUDIO_ENCODING=linear16
+VOICE__SAMPLE_RATE=16000
+VOICE__SESSION_TIMEOUT=300
+VOICE__STT_LANGUAGE=en
+VOICE__STT_MODEL=nova-3
+VOICE__TTS_MODEL=eleven_turbo_v2_5
+VOICE__TTS_VOICE_ID=21m00Tcm4TlvDq8ikWAM
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/voice-agent/.env.example` around lines 6 - 14, Reorder the block of
environment variable keys so they are sorted to satisfy dotenv-linter (remove
UnorderedKey warnings); specifically sort the keys such as
VOICE__AUDIO_CHANNELS, VOICE__AUDIO_ENCODING, VOICE__ALLOW_INTERRUPTIONS,
VOICE__SAMPLE_RATE, VOICE__SESSION_TIMEOUT, VOICE__STT_LANGUAGE,
VOICE__STT_MODEL, VOICE__TTS_MODEL, and VOICE__TTS_VOICE_ID into alphabetical
order (or the linter-expected order) while keeping their values unchanged so the
variables (VOICE__STT_MODEL, VOICE__STT_LANGUAGE, VOICE__TTS_VOICE_ID,
VOICE__TTS_MODEL, VOICE__SAMPLE_RATE, VOICE__AUDIO_CHANNELS,
VOICE__AUDIO_ENCODING, VOICE__ALLOW_INTERRUPTIONS, VOICE__SESSION_TIMEOUT) are
ordered consistently.
frontend/src/lib/components/chat/ChatInput.svelte-174-176 (1)

174-176: ⚠️ Potential issue | 🟡 Minor

Reset the native file input after a successful submit.

Only the Svelte state is cleared here. The hidden <input type="file"> keeps its previous value, so selecting the same file again will not fire change in most browsers. The error path already correctly resets it, but the success path should too for consistency.

💡 Suggested change
 				// Clear files and input after successful submit
+				if (fileInputEl) fileInputEl.value = "";
 				files = [];
 				value = "";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/lib/components/chat/ChatInput.svelte` around lines 174 - 176,
The success path clears Svelte state (files = [] and value = "") but doesn't
reset the native file input element, so selecting the same file won't fire a
change event; update the success branch of the submit handler in
ChatInput.svelte to also reset the DOM file input (the same element the error
path resets) by setting its value to an empty string (e.g., fileInput.value = ""
or fileInputEl.value = "") or calling the same reset logic used in the error
path so the native input is cleared consistently with files and value.
bindu/utils/worker/messages.py-83-89 (1)

83-89: ⚠️ Potential issue | 🟡 Minor

Use strict base64 decoding here.

base64.b64decode() is permissive by default and will ignore non-base64 characters, so malformed uploads can still reach the PDF/DOCX/text parsers as corrupted bytes instead of hitting the failure path. Decode with validation after trimming whitespace.

Suggested fix
-                file_bytes = base64.b64decode(base64_data)
+                file_bytes = base64.b64decode(base64_data.strip(), validate=True)

As per coding guidelines, "Validate all external input and use type hints for input validation in Python files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/utils/worker/messages.py` around lines 83 - 89, The base64 decoding is
too permissive: trim whitespace from base64_data, then decode using strict
validation (use base64.b64decode with validate=True) and catch binascii.Error
(or ValueError) to raise/log a clear "invalid base64" error before passing
file_bytes to PDF/DOCX/text parsers; update the handling around base64_data and
file_bytes in this try block (and add appropriate type hints for the function
parameter that supplies base64_data) so malformed uploads fail fast instead of
producing corrupted bytes for downstream parsing.
docs/VOICE_AGENT_PLAN.md-119-127 (1)

119-127: ⚠️ Potential issue | 🟡 Minor

Update the test path in the file matrix.

The plan currently points contributors to tests/unit/server/endpoints/test_voice_endpoints.py, but this PR adds tests/unit/extensions/voice/test_voice_endpoints.py.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/VOICE_AGENT_PLAN.md` around lines 119 - 127, Update the tests file
matrix entry so it references the new test location: replace the path string
`tests/unit/server/endpoints/test_voice_endpoints.py` with
`tests/unit/extensions/voice/test_voice_endpoints.py` in the table under "Tests
(5 files)" (the table row currently listing the REST + WebSocket endpoint
tests).
tests/unit/extensions/voice/test_voice_endpoints.py-73-76 (1)

73-76: ⚠️ Potential issue | 🟡 Minor

Avoid pinning the WebSocket URL to an absolute ws:// origin.

This assertion bakes the proxy-unfriendly contract into the test suite. If the client derives the origin locally, or the endpoint switches to a relative path / wss://, this test will fail for the right fix.
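
A more tolerant assertion could accept either shape; this predicate is an illustrative sketch:

```python
from urllib.parse import urlsplit

def is_acceptable_ws_url(ws_url: str) -> bool:
    """Accept a relative path or an absolute ws:// / wss:// URL."""
    if ws_url.startswith("/"):
        return True
    return urlsplit(ws_url).scheme in ("ws", "wss")
```

The test would then assert is_acceptable_ws_url(body["ws_url"]) instead of a hard startswith("ws://").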

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/extensions/voice/test_voice_endpoints.py` around lines 73 - 76,
The test currently asserts body["ws_url"].startswith("ws://") which hard-codes
an absolute ws:// origin; change the assertion in
tests/unit/extensions/voice/test_voice_endpoints.py to accept either a websocket
URL with scheme "ws" or "wss" or a relative path. For example, parse
body["ws_url"] (or inspect its prefix) and assert that it either starts with "/"
(relative path) or has a scheme of "ws" or "wss" (allowing both "ws://" and
"wss://"); keep the existing assertions for "session_id" and "context_id"
intact.
docs/VOICE_AGENT_PLAN.md-16-58 (1)

16-58: ⚠️ Potential issue | 🟡 Minor

Fix the markdownlint violations in the plan doc.

These fenced blocks still need explicit languages (text works well for diagrams/protocol snippets), and the unit-test table should be surrounded by blank lines so the doc passes lint cleanly.

Also applies to: 413-418, 435-453, 536-548, 551-556, 618-632, 650-669, 678-686, 789-798

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/VOICE_AGENT_PLAN.md` around lines 16 - 58, The markdown has fenced code
blocks and a unit-test table missing lint requirements: update each fenced block
(e.g., the protocol/diagram blocks such as the large ASCII diagram shown) to
include a language tag (use "text" for diagrams) like ```text, and ensure the
unit-test table is surrounded by blank lines above and below so it is treated as
a proper block; apply the same fixes to all other occurrences noted (lines
referenced in the review) to clear markdownlint violations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2cc17dac-ef09-4280-95b0-6d3c3f79ef8c

📥 Commits

Reviewing files that changed from the base of the PR and between a864dae and cfabcbe.

⛔ Files ignored due to path filters (4)
  • bindu/grpc/generated/agent_handler_pb2.py is excluded by !**/generated/**
  • bindu/grpc/generated/agent_handler_pb2.pyi is excluded by !**/generated/**
  • bindu/grpc/generated/agent_handler_pb2_grpc.py is excluded by !**/generated/**
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (97)
  • .github/workflows/ci.yml
  • README.bn.md
  • README.de.md
  • README.es.md
  • README.fr.md
  • README.hi.md
  • README.nl.md
  • README.ta.md
  • README.zh.md
  • bindu/common/protocol/types.py
  • bindu/extensions/__init__.py
  • bindu/extensions/voice/__init__.py
  • bindu/extensions/voice/agent_bridge.py
  • bindu/extensions/voice/audio_config.py
  • bindu/extensions/voice/pipeline_builder.py
  • bindu/extensions/voice/redis_session_manager.py
  • bindu/extensions/voice/service_factory.py
  • bindu/extensions/voice/session_factory.py
  • bindu/extensions/voice/session_manager.py
  • bindu/extensions/voice/voice_agent_extension.py
  • bindu/penguin/bindufy.py
  • bindu/server/applications.py
  • bindu/server/endpoints/voice_endpoints.py
  • bindu/server/metrics.py
  • bindu/server/scheduler/memory_scheduler.py
  • bindu/server/workers/base.py
  • bindu/settings.py
  • bindu/utils/__init__.py
  • bindu/utils/capabilities.py
  • bindu/utils/logging.py
  • bindu/utils/retry.py
  • bindu/utils/task_telemetry.py
  • bindu/utils/worker/messages.py
  • bindu/utils/worker/parts.py
  • docs/VOICE_AGENT_PLAN.md
  • examples/README.md
  • examples/ag2_research_team/skills/research-team-skill/skill.yaml
  • examples/beginner/README.md
  • examples/beginner/motivational_agent.py
  • examples/document-analyzer/document_analyzer.py
  • examples/typescript-langchain-agent/quiz-agent.ts
  • examples/typescript-langchain-quiz-agent/.env.example
  • examples/typescript-langchain-quiz-agent/quiz-agent.ts
  • examples/voice-agent/.env.example
  • examples/voice-agent/README.md
  • examples/voice-agent/main.py
  • frontend/.env.example
  • frontend/.gitignore
  • frontend/src/lib/components/ShareConversationModal.svelte
  • frontend/src/lib/components/chat/ChatInput.svelte
  • frontend/src/lib/components/chat/ChatMessage.svelte
  • frontend/src/lib/components/chat/ChatWindow.svelte
  • frontend/src/lib/components/chat/ContextList.svelte
  • frontend/src/lib/components/voice/LiveTranscript.svelte
  • frontend/src/lib/components/voice/VoiceCallButton.svelte
  • frontend/src/lib/components/voice/VoiceCallPanel.svelte
  • frontend/src/lib/constants/mime.ts
  • frontend/src/lib/jobs/refresh-conversation-stats.ts
  • frontend/src/lib/migrations/lock.ts
  • frontend/src/lib/migrations/migrations.ts
  • frontend/src/lib/migrations/routines/02-update-assistants-models.ts
  • frontend/src/lib/migrations/routines/10-update-reports-assistantid.ts
  • frontend/src/lib/server/config.ts
  • frontend/src/lib/server/database.ts
  • frontend/src/lib/server/files/downloadFile.ts
  • frontend/src/lib/server/files/uploadFile.ts
  • frontend/src/lib/server/models.ts
  • frontend/src/lib/services/voice-client.ts
  • frontend/src/lib/stores/chat.ts
  • frontend/src/lib/stores/voice.ts
  • frontend/src/lib/types/ConvSidebar.ts
  • frontend/src/lib/types/Model.ts
  • frontend/src/lib/types/Session.ts
  • frontend/src/lib/utils/agentMessageHandler.ts
  • frontend/src/lib/utils/tree/addChildren.spec.ts
  • frontend/src/lib/utils/tree/addSibling.spec.ts
  • frontend/src/lib/utils/tree/buildSubtree.spec.ts
  • frontend/src/lib/utils/tree/convertLegacyConversation.spec.ts
  • frontend/src/routes/+layout.svelte
  • frontend/src/routes/+page.svelte
  • frontend/src/routes/api/v2/conversations/[id]/message/[messageId]/+server.ts
  • frontend/src/routes/api/v2/export/+server.ts
  • frontend/src/routes/conversation/[id]/+server.ts
  • frontend/src/routes/settings/(nav)/authentication/+page.svelte
  • frontend/src/routes/settings/(nav)/negotiation/+page.svelte
  • pyproject.toml
  • tests/conftest_stubs.py
  • tests/unit/extensions/__init__.py
  • tests/unit/extensions/voice/__init__.py
  • tests/unit/extensions/voice/test_agent_bridge.py
  • tests/unit/extensions/voice/test_service_factory.py
  • tests/unit/extensions/voice/test_session_manager.py
  • tests/unit/extensions/voice/test_voice_endpoints.py
  • tests/unit/extensions/voice/test_voice_extension.py
  • tests/unit/server/scheduler/test_memory_scheduler.py
  • tests/unit/test_minimax_example.py
  • tests/unit/utils/worker/test_messages.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (2)
bindu/server/endpoints/voice_endpoints.py (2)

57-62: Consider narrowing exception catch for body parsing.

While the silent pass is intentional for empty/invalid bodies, catching json.JSONDecodeError specifically would be more precise and avoid masking unexpected errors.

♻️ Proposed refactor
     try:
         body = await request.json()
         if isinstance(body, dict) and "context_id" in body:
             context_id = str(body["context_id"])
-    except Exception:
-        pass  # empty body is fine, we'll generate a new context_id
+    except (json.JSONDecodeError, ValueError):
+        pass  # empty or invalid body is fine, we'll generate a new context_id
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 57 - 62, Narrow the
broad except: when parsing the request body with await request.json(), catch
json.JSONDecodeError (and optionally aiohttp.ContentTypeError if using aiohttp)
instead of a bare Exception so we don't mask other errors; keep the same
behavior of treating empty/invalid JSON as "generate a new context_id" and still
set context_id from body["context_id"] when present. Ensure you import json (and
aiohttp.ContentTypeError if used) and update the try/except around
request.json() and the context_id assignment.

488-495: Consider reusing httpx client for reduced latency.

Creating a new AsyncClient for each STT/TTS call adds connection establishment overhead. For real-time voice with frequent transcription calls (~1s intervals), consider a session-scoped or module-level client with connection pooling.

♻️ Example approach
# Module-level shared client (initialize on first use or app startup)
_http_client: httpx.AsyncClient | None = None

async def _get_http_client() -> httpx.AsyncClient:
    global _http_client
    if _http_client is None:
        _http_client = httpx.AsyncClient(timeout=30.0)
    return _http_client

Or pass a client instance through the WebSocket handler lifecycle.

Also applies to: 525-530

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 488 - 495, The
TTS/STT handlers currently create a new httpx.AsyncClient per request (the try
block that async with httpx.AsyncClient(timeout=30.0) as client), causing
connection overhead; replace that pattern with a shared, module-level or
lifecycle-scoped AsyncClient (e.g., implement a single _http_client and accessor
_get_http_client or accept a client instance into the WebSocket/handler) and use
that client for requests instead of creating a new one in the try/except, and
update both occurrences (the AsyncClient creation at the shown try block and the
similar block referenced at 525-530) so connection pooling is reused and the
client is closed on app shutdown.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bindu/extensions/voice/redis_session_manager.py`:
- Around line 89-93: The except block in redis_session_manager.py that catches
redis.RedisError logs the error and re-raises a ConnectionError but doesn't
chain the original exception; update the except handler inside the
RedisSessionManager connection logic (the block that uses logger.error and
raises ConnectionError referencing self.redis_url) to re-raise the new
ConnectionError using exception chaining (raise ConnectionError(... ) from e) so
the original traceback is preserved.
- Around line 53-74: The RedisSessionManager __init__ is missing initialization
of the asyncio lock used elsewhere; add initialization of self._lock =
asyncio.Lock() inside the __init__ of the Redis session manager (alongside
setting self._redis_client, self._cleanup_task, and
self._create_session_script_sha) so methods end_session, update_state, and
_expire_timed_out_sessions can safely await the lock without causing
AttributeError.
- Around line 248-277: The async `@property` active_count is invalid; change
RedisVoiceSessionManager.active_count to an async method (e.g., async def
get_active_count(self) -> int) by removing the `@property` decorator and renaming
the method, and update the base/sync VoiceSessionManager to declare a compatible
async signature (or provide an async abstract get_active_count) so
implementations match; also update all callers to await
session_manager.get_active_count() instead of accessing
session_manager.active_count.

In `@bindu/server/endpoints/voice_endpoints.py`:
- Around line 581-591: The fire-and-forget created in _try_send_json uses
loop.create_task(...) without keeping a reference so the task can be
garbage-collected and the send dropped; fix by creating a module-level set
(e.g., _BACKGROUND_TASKS) to hold references to tasks created by _try_send_json,
add each created task to that set, and attach a done callback that removes the
task from the set when finished and handles/logs exceptions; reference the
functions _try_send_json and _send_json when making these changes.
- Around line 504-507: The CI failed due to trailing whitespace on the line with
the empty-audio check in voice_endpoints.py (the block that checks pcm_bytes and
calls logger.debug("Empty audio buffer, skipping transcription")); remove the
trailing whitespace characters at the end of that line (and any other trailing
spaces in the file), save the file, and commit the change so the pre-commit hook
passes.

---

Nitpick comments:
In `@bindu/server/endpoints/voice_endpoints.py`:
- Around line 57-62: Narrow the broad except: when parsing the request body with
await request.json(), catch json.JSONDecodeError (and optionally
aiohttp.ContentTypeError if using aiohttp) instead of a bare Exception so we
don't mask other errors; keep the same behavior of treating empty/invalid JSON
as "generate a new context_id" and still set context_id from body["context_id"]
when present. Ensure you import json (and aiohttp.ContentTypeError if used) and
update the try/except around request.json() and the context_id assignment.
- Around line 488-495: The TTS/STT handlers currently create a new
httpx.AsyncClient per request (the try block that async with
httpx.AsyncClient(timeout=30.0) as client), causing connection overhead; replace
that pattern with a shared, module-level or lifecycle-scoped AsyncClient (e.g.,
implement a single _http_client and accessor _get_http_client or accept a client
instance into the WebSocket/handler) and use that client for requests instead of
creating a new one in the try/except, and update both occurrences (the
AsyncClient creation at the shown try block and the similar block referenced at
525-530) so connection pooling is reused and the client is closed on app
shutdown.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 90e932fb-3d8b-483b-93f0-5526868a1925

📥 Commits

Reviewing files that changed from the base of the PR and between cfabcbe and 4d26651.

📒 Files selected for processing (4)
  • bindu/extensions/voice/redis_session_manager.py
  • bindu/server/endpoints/voice_endpoints.py
  • frontend/src/lib/buildPrompt.ts
  • frontend/src/lib/server/models.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • frontend/src/lib/server/models.ts

Comment on lines +53 to +74
def __init__(
    self,
    redis_url: str,
    max_sessions: int = 10,
    session_timeout: int = DEFAULT_SESSION_TTL,
    redis_session_ttl: int = DEFAULT_SESSION_TTL,
):
    """Initialize the Redis session manager.

    Args:
        redis_url: Redis connection URL
        max_sessions: Maximum concurrent sessions allowed
        session_timeout: Session timeout in seconds (for cleanup)
        redis_session_ttl: TTL for Redis keys in seconds
    """
    self.redis_url = redis_url
    self._max_sessions = max_sessions
    self._session_timeout = session_timeout
    self._redis_session_ttl = redis_session_ttl
    self._redis_client: redis.Redis | None = None
    self._cleanup_task: asyncio.Task[None] | None = None
    self._create_session_script_sha: str | None = None

⚠️ Potential issue | 🔴 Critical

Missing _lock attribute causes AttributeError at runtime.

The class uses self._lock in end_session() (line 202), update_state() (line 234), and _expire_timed_out_sessions() (line 316), but _lock is never initialized in __init__. The base VoiceSessionManager initializes self._lock = asyncio.Lock() (see context snippet 1), but this class does not.

🐛 Proposed fix
         self._redis_session_ttl = redis_session_ttl
         self._redis_client: redis.Redis | None = None
         self._cleanup_task: asyncio.Task[None] | None = None
         self._create_session_script_sha: str | None = None
+        self._lock = asyncio.Lock()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/extensions/voice/redis_session_manager.py` around lines 53 - 74, The
RedisSessionManager __init__ is missing initialization of the asyncio lock used
elsewhere; add initialization of self._lock = asyncio.Lock() inside the __init__
of the Redis session manager (alongside setting self._redis_client,
self._cleanup_task, and self._create_session_script_sha) so methods end_session,
update_state, and _expire_timed_out_sessions can safely await the lock without
causing AttributeError.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
bindu/server/endpoints/voice_endpoints.py (1)

584-593: ⚠️ Potential issue | 🟠 Major

Retain task references for callback-triggered sends.

loop.create_task(...) without a strong reference can drop send tasks before completion, causing silent message loss.

💡 Proposed fix
+_background_tasks: set[asyncio.Task[None]] = set()
+
+def _track_background_task(task: asyncio.Task[None]) -> None:
+    _background_tasks.add(task)
+    def _done(t: asyncio.Task[None]) -> None:
+        _background_tasks.discard(t)
+        try:
+            t.result()
+        except Exception:
+            logger.exception("Background websocket send task failed")
+    task.add_done_callback(_done)
+
 def _try_send_json(
     websocket: WebSocket,
     data: dict,
     send_lock: asyncio.Lock,
 ) -> None:
     """Enqueue a JSON send (safe to call from sync callbacks)."""
     try:
         loop = asyncio.get_running_loop()
-        loop.create_task(_send_json(websocket, data, send_lock))
+        task = loop.create_task(_send_json(websocket, data, send_lock))
+        _track_background_task(task)
     except RuntimeError:
         pass
#!/bin/bash
# Verify create_task call sites in this file and inspect whether return values are retained.
rg -n -C2 'create_task\(' bindu/server/endpoints/voice_endpoints.py
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindu/server/endpoints/voice_endpoints.py` around lines 584 - 593, The
_try_send_json helper creates background tasks via
loop.create_task(_send_json(...)) but does not retain a reference which can
allow the task to be garbage-collected and drop messages; change _try_send_json
to store the returned Task in a short-lived container and remove it when
finished (e.g., maintain a set of pending tasks and add
task.add_done_callback(lambda t: pending.discard(t))) so tasks started from
callback contexts are strongly referenced until completion; reference the
symbols _try_send_json, _send_json, WebSocket and send_lock when locating where
to create the pending-tasks set (module-level or attached to the
websocket/session object) and ensure thread-safety around adding/removing tasks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bindu/server/endpoints/voice_endpoints.py`:
- Around line 218-219: The code currently hardcodes tuning values (e.g.,
chunk_bytes and overlap_bytes) and other configuration like provider URLs and
HTTP timeouts inside voice_endpoints.py; extract these into
bindu.settings.app_settings.voice (add fields such as sample_rate,
audio_channels, chunk_overlap_fraction or overlap_bytes, chunk_throttle_ms,
provider_urls, http_timeout_seconds, etc.) and replace direct literals/usages
(notably the variables chunk_bytes and overlap_bytes and the hardcoded values at
the other occurrences referenced in the review: ~310-313, ~476-477, ~492-493,
~512-513, ~529-531) to read from app_settings.voice instead so all tuning, URLs
and timeouts are configurable via settings and not embedded in the module.
- Around line 57-62: The code currently swallows all JSON parse errors when
reading request.json() and silently generates a new context_id; instead,
explicitly return HTTP 400 for malformed JSON while preserving the "empty body"
behavior. Replace the single await request.json() call with logic that first
reads the raw body (await request.body()), if the body is empty keep generating
a new context_id, otherwise attempt to parse via json.loads(...) and on
json.JSONDecodeError raise fastapi.HTTPException(status_code=400,
detail="Malformed JSON payload"); if parsing succeeds and the result is a dict
with "context_id", set context_id = str(body["context_id"]). Ensure you import
json and fastapi.HTTPException and update the code around request.json() and
context_id handling accordingly.

---

Duplicate comments:
In `@bindu/server/endpoints/voice_endpoints.py`:
- Around line 584-593: The _try_send_json helper creates background tasks via
loop.create_task(_send_json(...)) but does not retain a reference which can
allow the task to be garbage-collected and drop messages; change _try_send_json
to store the returned Task in a short-lived container and remove it when
finished (e.g., maintain a set of pending tasks and add
task.add_done_callback(lambda t: pending.discard(t))) so tasks started from
callback contexts are strongly referenced until completion; reference the
symbols _try_send_json, _send_json, WebSocket and send_lock when locating where
to create the pending-tasks set (module-level or attached to the
websocket/session object) and ensure thread-safety around adding/removing tasks.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 21703d2e-989a-4ebb-a5ee-4b02a03c473d

📥 Commits

Reviewing files that changed from the base of the PR and between 4d26651 and 86c8b10.

📒 Files selected for processing (1)
  • bindu/server/endpoints/voice_endpoints.py

@Co-vengers
Contributor Author

Hey @chandan-1427
I have made a few changes addressing the implementation gaps. Looking forward to the review and feedback.

@chandan-1427
Contributor

Hi @Co-vengers ,

Amazing work on the latest commits! The frontend Web Audio API implementation is great.

While reviewing the new code, I noticed the CI pipeline crashed before running any tests due to a few strict pre-commit linters, and I spotted one tiny edge case in the new backend history logic.

Could you patch these final 4 things?

  1. Fatal Syntax Error (did_agent_extension.py)
    The `from __future__ import annotations` statement on line 20 is placed after `import os`. Python requires `__future__` imports to be the first statement in a file (after any module docstring or comments). Moving it to the top will fix the Pytest and Ruff crashes that blocked the CI.

  2. Security Scanner (pdf_research_agent example)
    The detect-secrets hook caught the dummy API keys in pdf_research_agent.py and the README.md.
    Just append # pragma: allowlist secret to the end of the line with the dummy key in the Python script so the linter ignores it.

  3. Docstring Linter (session_manager.py & redis_session_manager.py)
    The pydocstyle hook failed on get_active_count because the docstring isn't in the imperative mood. Changing "Number of sessions..." to "Return the number of sessions..." will clear this up.

  4. History Trimming Parity (agent_bridge.py)
    In your new _trim_history method, del self._conversation_history[:overflow] deletes exactly overflow messages. When a user message pushes the history length to 41, overflow is 1, so exactly one message (the oldest user message) is deleted. That leaves the history starting with an assistant response, which causes many LLM APIs (Anthropic Claude's, for example) to reject the request with a 400 Bad Request error, since they require the history to start with a user message.

Fix: Ensure messages are deleted in pairs (user + assistant) so the array always starts with a user message. CodeRabbit left a good inline suggestion for this: turns_to_drop = max(1, (overflow + 1) // 2).
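
For reference, a minimal sketch of the pair-wise trimming (the 40-message cap and the standalone function are assumptions for illustration; the real method mutates self._conversation_history in place):

```python
MAX_HISTORY = 40  # assumed cap; the actual limit lives in the extension config


def trim_history(history: list[dict]) -> list[dict]:
    """Drop the oldest turns in user+assistant pairs.

    Dropping whole turns guarantees the remaining history still starts
    with a user message, which strict LLM APIs require.
    """
    overflow = len(history) - MAX_HISTORY
    if overflow <= 0:
        return history
    # Round the overflow up to whole turns: deleting an odd number of
    # messages would leave the history starting with an assistant reply.
    turns_to_drop = max(1, (overflow + 1) // 2)
    return history[turns_to_drop * 2:]
```

With 41 messages this drops one full turn (two messages), leaving 39 entries that still begin with a user message.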

Looking forward!

@Co-vengers Co-vengers force-pushed the feature/voice-agent branch from 3b240cc to 3be6819 Compare April 3, 2026 11:33
@Co-vengers
Contributor Author

Hi @chandan-1427
I have fixed the issues as highlighted. Looking forward to the review.

@chandan-1427
Contributor

@Co-vengers great work on these updates. I ran this locally and caught one issue in voice_endpoints.py (_transcribe_pcm_buffer).
Deepgram is throwing a 400 error because encoding, sample_rate, and channels are currently passed in the Content-Type header instead of as URL params.
Could you move those three settings into the params dictionary?
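
A sketch of the suggested split, showing the audio-format fields as query parameters rather than header content (the endpoint URL and the 16 kHz mono linear16 values are assumptions matching the PR's PCM pipeline):

```python
def build_deepgram_request(api_key: str) -> tuple[dict, dict]:
    """Build query params and headers for a raw-PCM transcription request.

    Deepgram reads encoding/sample_rate/channels from the query string;
    packing them into the Content-Type header is what triggered the 400.
    """
    params = {
        "encoding": "linear16",
        "sample_rate": "16000",
        "channels": "1",
    }
    headers = {
        "Authorization": f"Token {api_key}",
        "Content-Type": "audio/raw",  # raw PCM body; format described by params
    }
    return params, headers


# Usage with httpx inside _transcribe_pcm_buffer (sketch):
#   params, headers = build_deepgram_request(api_key)
#   resp = await client.post(url, params=params, headers=headers, content=pcm_bytes)
```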

@Co-vengers
Contributor Author

Sure, I'll fix those issues.

@Co-vengers
Contributor Author

@chandan-1427 I have fixed the issue and also made some necessary changes to integrate the Pipecat pipeline.

@Co-vengers
Contributor Author

@chandan-1427 Let me know if any further fixes are required after review.

@chandan-1427
Contributor

@Co-vengers Tested this locally and found two blocking issues in VoiceClient.ts (startAudioStreaming):
Browser freeze: The custom resampleChunk has a math bug (state.t goes negative, causing massive array allocations that lock the UI thread). Please remove the manual resampler entirely and just let the browser handle it via new AudioContext({ sampleRate: 16000 }).
Audio feedback loop: The mic is routing directly to the speakers, causing severe echo. Route the processorNode through a muted GainNode (gain.value = 0) before connecting to audioContext.destination.
Please patch these, then test everything manually by speaking to it and verifying the results.
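
The two fixes might look roughly like this (a browser-only sketch; the node and variable names are assumptions about startAudioStreaming in VoiceClient.ts):

```typescript
// Sketch of the two fixes; runs only in a browser with Web Audio API support.
async function startAudioStreaming(): Promise<void> {
  // Fix 1: let the browser resample. Constructing the context at 16 kHz
  // means e.inputBuffer already arrives at the target rate, so the buggy
  // manual resampleChunk can be deleted entirely.
  const audioContext = new AudioContext({ sampleRate: 16000 });
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const source = audioContext.createMediaStreamSource(stream);
  const processorNode = audioContext.createScriptProcessor(4096, 1, 1);
  processorNode.onaudioprocess = (e) => {
    // 16 kHz mono samples here; encode to PCM and send over the WebSocket.
  };
  source.connect(processorNode);

  // Fix 2: the processor must reach audioContext.destination for the graph
  // to run, but routing it through a muted gain keeps the mic off the speakers.
  const mute = audioContext.createGain();
  mute.gain.value = 0;
  processorNode.connect(mute);
  mute.connect(audioContext.destination);
}
```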

@Co-vengers
Contributor Author

@chandan-1427 I have fixed VoiceClient.ts to address both issues:

Removed the manual resampler and now rely on new AudioContext({ sampleRate: 16000 }) for browser-side resampling.
I have also fixed the feedback loop by routing processorNode through a muted GainNode before audioContext.destination
Manually tested the voice flow and confirmed the freeze is gone and the echo loop is resolved.

@Co-vengers Co-vengers force-pushed the feature/voice-agent branch 2 times, most recently from 636312f to df5dedb Compare April 13, 2026 07:38
…Bindu#353)

Worker accessed task_operation["_current_span"] but scheduler now sends
primitive trace_id/span_id strings. Add _reconstruct_span() helper to
rebuild a NonRecordingSpan from hex-encoded IDs with graceful fallback.
Replace math.inf buffer size with a constant of 100 to prevent
unbounded memory growth while still allowing task enqueue before the
worker loop is ready.
Add SpanContext, TraceFlags, NonRecordingSpan, and INVALID_SPAN_CONTEXT
mocks. Register opentelemetry.trace.span submodule so worker imports
resolve in the test environment.
@Co-vengers Co-vengers force-pushed the feature/voice-agent branch from df5dedb to fcf2e4e Compare April 13, 2026 12:59
@chandan-1427
Contributor

@Co-vengers this has too many files to check and evaluate. Could you raise separate PRs for each dedicated concern: a package PR, an example agent PR, a documentation PR, and a code PR? It will be more efficient this way. Could you please do this?

@Co-vengers
Contributor Author

@chandan-1427 I'll split it if necessary. Just let me know how each split should look: should each PR be standalone, or should each split include the changes from the previous one? Creating a new branch for each section would by default include those files and aggregate them into itself.

@chandan-1427
Contributor

Hey @Co-vengers , thanks for being open to splitting this up!
Since your new features rely on each other, the easiest way to do this without showing duplicate files in every review is to build your branches like a ladder.
Here is exactly how to set it up:
Step 1 (Code): Create your first branch normally starting from main. Add just the core code, and open a PR pointing to main.
Step 2 (Example Agent): Create your next branch starting from your Step 1 branch (not from main). Add your example files. When you open this PR on GitHub, change the "base" dropdown on the left from main to your Step 1 branch. This tells GitHub to hide the core code so we only have to review the new example files!
Step 3 (Docs): Create your last branch starting from your Step 2 branch. Add the docs. When opening the PR, change that same "base" dropdown to point to your Step 2 branch.

@Co-vengers
Contributor Author

@chandan-1427 I have raised PRs splitting into code, examples, and docs for review:

These are for review purposes only — they may include merge conflicts. Once reviewed, all changes will be aggregated back into this branch.

Hope this helps with the review!

