Conversation
Hey, thanks for working on adding voice support — really appreciate the effort here. I went through the implementation and there are a few areas we’ll need to address before merging:
Looking forward to the update!
(force-pushed from 6b8acb4 to 20e3419)
Hey
Thanks for the review. I'll look into the improvements as suggested.
(force-pushed from 20e3419 to 5b0e29e)
(force-pushed from 5592bd1 to 51e7e91)
Hey @chandan-1427
Sorry for the late response! I’ve gone through the codebase and executed it locally. Before we can merge this, I found a few critical bugs that will cause crashes under load or in a distributed environment, plus a major gap in the frontend implementation.

Frontend / Implementation Gaps

1. Missing real-time audio streaming (`VoiceCallPanel.svelte`): The backend is beautifully set up to handle real-time binary PCM streams over WebSockets, but the frontend is currently faking it by using the old push-to-talk `VoiceRecorder` to hit the `/api/transcribe` REST endpoint. (This is also currently causing a 500 Internal Error locally due to a SvelteKit cookie bug.)
   Fix: Please implement `navigator.mediaDevices.getUserMedia` to stream binary audio chunks directly to the WebSocket instead of using REST.
2. WebSocket protocol mismatch (`voice-client.ts`): The client blindly connects to the absolute `wsUrl` returned by the backend. If the backend is behind a reverse proxy, it will return `localhost:3773`, which breaks the browser connection.
   Fix: Dynamically construct the WebSocket URL using `agentAPI.baseUrl.replace(/^http/, 'ws')`.

Backend / Concurrency Bugs

3. Distributed clock desync (`session_manager.py`): `VoiceSession` uses `time.monotonic()` for `start_time`. This is unique to a specific container's boot time, so if Worker A saves a session to Redis and Worker B reads it, the duration calculation breaks completely.
   Fix: Use `time.time()` (epoch time) so it safely serializes across distributed workers.
4. WebSocket concurrent write crashes (`voice_endpoints.py`): `_try_send_json` fires messages asynchronously. If STT and TTS trigger simultaneously, Starlette WebSockets will throw a fatal ASGI `RuntimeError` for concurrent writes.
   Fix: Introduce an `asyncio.Lock()` to ensure outbound WebSocket messages are queued sequentially.
5. Context window exhaustion (`agent_bridge.py`): `_conversation_history` grows unbounded. In a 5-minute voice session, this will quickly blow out the LLM's max token limit and crash the agent.
   Fix: Implement a sliding window limit (e.g., keep the last 20 turns) or a summarization strategy.
6. Python garbage collection task killer (`agent_bridge.py`): In `_safe_callback`, you call `asyncio.create_task(result)` without keeping a strong reference. Python's GC will randomly destroy these tasks mid-execution, causing silent failures.
   Fix: Store tasks in a `self._background_tasks = set()` and remove them when done.
7. Backpressure removal (`memory_scheduler.py`): Replacing `await send()` with `send_nowait()` removes queue backpressure. A minor traffic spike will now instantly throw 500 errors instead of pausing the incoming request while a worker frees up.
   Fix: Revert to `await self._write_stream.send(task_op)`. If you want to prevent indefinite hangs, wrap it in an `anyio.fail_after(5.0)` block.

I haven't checked the tests just yet, but let's get these core architectural blockers sorted out first. Once again thanks for your time and work, really appreciate it! Looking forward to the update
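The unreferenced-task fix suggested above (storing created tasks in a set) is the standard asyncio pattern; a minimal runnable sketch, where only `_background_tasks` comes from the review and the other names are illustrative:

```python
import asyncio


class CallbackScheduler:
    """Illustrative stand-in for the bridge's callback scheduling."""

    def __init__(self) -> None:
        # The event loop keeps only weak references to tasks, so without a
        # strong reference here the GC may collect a task mid-execution.
        self._background_tasks: set[asyncio.Task] = set()

    def schedule(self, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._background_tasks.add(task)
        # Drop our reference once the task finishes, so the set stays small.
        task.add_done_callback(self._background_tasks.discard)
        return task


async def demo() -> int:
    scheduler = CallbackScheduler()
    task = scheduler.schedule(asyncio.sleep(0, result=42))
    assert task in scheduler._background_tasks  # strong ref held while running
    result = await task
    await asyncio.sleep(0)  # let the done-callback run
    assert task not in scheduler._background_tasks
    return result


print(asyncio.run(demo()))  # → 42
```

The `add_done_callback(set.discard)` pairing is what keeps the set from leaking completed tasks.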
Note
Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough
Adds a voice agent extension with backend voice modules (audio config, STT/TTS factories, agent bridge, session managers), server REST/WebSocket voice endpoints, frontend voice client/components/stores, settings and dependency updates, protocol/type changes, tests, documentation, and small CI/readme formatting fixes.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Frontend as Frontend (Client)
    participant REST as REST (Server)
    participant SessionMgr as Session Manager
    participant WS as WebSocket Handler
    participant STT as STT (Deepgram)
    participant Bridge as Agent Bridge
    participant Agent as Bindu Agent
    participant TTS as TTS (ElevenLabs)
    Frontend->>REST: POST /voice/session/start
    REST->>SessionMgr: create_session(context_id)
    SessionMgr-->>REST: session_id, ws_url
    REST-->>Frontend: 201 {session_id, ws_url}
    Frontend->>WS: open ws://.../ws/voice/{session_id}
    WS->>SessionMgr: get_session(session_id)
    SessionMgr-->>WS: VoiceSession
    WS->>Bridge: create AgentBridgeProcessor
    Frontend->>WS: binary PCM audio frames
    WS->>STT: transcribe frames
    STT-->>WS: transcript text
    WS->>Bridge: process_transcription(text)
    Bridge->>Agent: manifest_run(conversation_history)
    Agent-->>Bridge: response (text/dict/stream)
    Bridge-->>WS: response text
    WS->>TTS: synthesize audio
    TTS-->>WS: audio bytes
    WS-->>Frontend: audio frames and transcript updates
    Frontend->>WS: control messages (commit_turn / user_text / stop)
    WS->>SessionMgr: update_state / end_session
    WS-->>Frontend: close
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Actionable comments posted: 3
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
.github/workflows/ci.yml (1)
43-48: ⚠️ Potential issue | 🟠 Major
Unit test job is no longer scoped to unit tests
Line 43 now runs all discovered tests, which can pull integration/e2e suites into the `unit-tests` job and conflict with the job’s hermetic intent (and duplicate work already done in `e2e-grpc-tests`).
Proposed fix

```diff
-uv run pytest \
+uv run pytest tests/unit/ \
   --cov=bindu \
   --cov-report=term-missing \
   --cov-report=xml:coverage.xml \
   --cov-fail-under=60 \
   -v
```
bindu/server/applications.py (1)
412-447: ⚠️ Potential issue | 🟠 Major
Make the voice session manager lifecycle exception-safe.
This cleanup only runs if startup reaches `yield`. If `TaskManager(...)` or its `async with` fails after the voice manager starts, the lifespan exits early and the cleanup loop/backend connection are left running. Put the voice-manager startup and teardown behind a `try/finally` so partial-startup failures close it too.
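The try/finally shape described here can be sketched as follows; everything below is a hypothetical stand-in for illustration (the real code wires up `TaskManager` and `create_session_manager`), not the actual applications.py:

```python
import asyncio
import contextlib


class FakeVoiceSessionManager:
    """Stand-in for the real voice session manager."""

    def __init__(self) -> None:
        self.closed = False

    async def start_cleanup_loop(self) -> None:
        pass

    async def close(self) -> None:
        self.closed = True


@contextlib.asynccontextmanager
async def lifespan(app: dict, task_manager):
    # Start the voice manager first...
    manager = FakeVoiceSessionManager()
    await manager.start_cleanup_loop()
    app["voice_session_manager"] = manager
    try:
        # ...then guard everything that can still fail during startup, so a
        # partial startup tears the manager down too.
        async with task_manager:
            yield
    finally:
        await manager.close()


@contextlib.asynccontextmanager
async def failing_task_manager():
    # Simulates the task manager blowing up during startup.
    raise RuntimeError("task manager failed to start")
    yield  # unreachable; keeps this a generator


async def demo() -> bool:
    app: dict = {}
    with contextlib.suppress(RuntimeError):
        async with lifespan(app, failing_task_manager()):
            pass
    # The voice manager was closed even though startup never reached yield.
    return app["voice_session_manager"].closed


assert asyncio.run(demo())
```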
🟠 Major comments (21)
examples/typescript-langchain-agent/quiz-agent.ts (1)
93-100: ⚠️ Potential issue | 🟠 Major
Select the latest valid user message, not just the last message.
Line 93 currently assumes the last array element is user input. If the tail message is assistant/system/tool, the model gets the wrong prompt.
Proposed fix
```diff
-  const last = messages[messages.length - 1];
-  let userInput = "";
-  if (last && typeof last.content === "string" && last.content.trim().length > 0) {
-    userInput = last.content;
-  } else {
-    // Optionally: return error or fallback
-    return "Error: Invalid or empty user input.";
-  }
+  const lastUserMessage = [...messages]
+    .reverse()
+    .find((m) => m?.role === "user" && typeof m.content === "string" && m.content.trim().length > 0);
+
+  if (!lastUserMessage || typeof lastUserMessage.content !== "string") {
+    return "Error: Invalid or empty user input.";
+  }
+  const userInput = lastUserMessage.content;
```
examples/typescript-langchain-agent/quiz-agent.ts (1)
109-109: ⚠️ Potential issue | 🟠 Major
Add explicit timeout and retry configuration to the ChatOpenAI initialization.
Lines 20-27 initialize the LLM without timeout or retry guards. The ChatOpenAI constructor accepts `timeout` (milliseconds) and `maxRetries` options that should be explicitly configured to prevent indefinite hangs on failed or slow requests to OpenRouter:
Example fix
```ts
const llm = new ChatOpenAI({
  model: "openai/gpt-oss-120b",
  temperature: 0.3,
  timeout: 30000, // 30 seconds
  maxRetries: 2,
  configuration: {
    baseURL: "https://openrouter.ai/api/v1",
    apiKey: process.env.OPENROUTER_API_KEY,
  },
});
```
tests/unit/server/scheduler/test_memory_scheduler.py (1)
110-131: ⚠️ Potential issue | 🟠 Major
This test does not validate capacity/backpressure behavior yet.
`test_send_operation_waits_for_capacity` currently only verifies that `send()` is called. It does not assert waiting behavior or timeout behavior from `anyio.fail_after(...)`. Please add assertions for blocked-send and timeout/unblock paths so this guards the actual regression risk.
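The blocked-send/unblock shape such a test needs can be sketched like this; note it uses a stdlib `asyncio.Queue` as a stand-in for the anyio memory stream the scheduler actually uses:

```python
import asyncio


async def backpressure_roundtrip() -> bool:
    # A bounded queue stands in for the scheduler's write stream.
    queue: asyncio.Queue = asyncio.Queue(maxsize=1)
    await queue.put("first")  # fills the queue

    blocked_send = asyncio.create_task(queue.put("second"))
    await asyncio.sleep(0)
    # The second send must still be waiting for capacity.
    assert not blocked_send.done()

    queue.get_nowait()  # a consumer frees a slot...
    await asyncio.wait_for(blocked_send, timeout=1.0)  # ...and the send unblocks
    return blocked_send.done()


assert asyncio.run(backpressure_roundtrip())
```

Asserting both the blocked state and the unblock-after-capacity transition is what actually guards the regression.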
bindu/server/scheduler/memory_scheduler.py (1)
31-33: ⚠️ Potential issue | 🟠 Major
Move queue buffer size and enqueue timeout to `app_settings`.
`_TASK_QUEUE_BUFFER_SIZE = 100` and `anyio.fail_after(5.0)` are hardcoded runtime config. Please source both values from `bindu.settings.app_settings` so they can be tuned per environment.
As per coding guidelines, "Use `bindu.settings.app_settings` for all configuration - never hardcode values such as URLs, ports, timeouts, API keys, or feature flags" and "NEVER create local config constants - use `app_settings` from `bindu.settings`."
Also applies to: 80-81
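A sketch of the settings-driven version of that enqueue path, with `asyncio.wait_for` standing in for `anyio.fail_after` and the setting names invented for illustration:

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulerSettings:
    # In the real code these would live on bindu.settings.app_settings;
    # the field names here are assumptions.
    task_queue_buffer_size: int = 2
    task_enqueue_timeout: float = 0.05  # seconds


settings = SchedulerSettings()


async def enqueue(queue: asyncio.Queue, item: object) -> None:
    """Await capacity (backpressure) but fail instead of hanging forever."""
    await asyncio.wait_for(queue.put(item), timeout=settings.task_enqueue_timeout)


async def demo() -> bool:
    queue: asyncio.Queue = asyncio.Queue(maxsize=settings.task_queue_buffer_size)
    await enqueue(queue, "a")
    await enqueue(queue, "b")
    try:
        await enqueue(queue, "c")  # full queue, no consumer: times out
    except asyncio.TimeoutError:
        return True
    return False


assert asyncio.run(demo())
```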
frontend/src/lib/server/database.ts (1)
206-211: ⚠️ Potential issue | 🟠 Major
`aggregate()` still makes the stats job a no-op in memory mode.
This overload now accepts the second argument used by `computeStats()`, but lines 210-211 still discard the pipeline and return raw collection documents. That means the stats refresh path never applies its pipeline work when the in-memory backend is active, so `conversationStats` stays stale.
🛠️ Safe fallback if full pipeline support is not ready yet
```diff
 aggregate<R = T>(
   _pipeline: Record<string, unknown>[],
   _options?: Record<string, unknown>
 ): InMemoryCursor<R> {
-  // Simplified aggregation - just return all docs
-  return new InMemoryCursor(Array.from(this.data.values()) as unknown as R[]);
+  throw new Error("InMemoryCollection.aggregate() does not execute pipelines yet");
 }
```
🤖 Prompt for AI Agents
frontend/src/lib/server/database.ts (1)
147-159: ⚠️ Potential issue | 🟠 Major
Pipeline-array updates are still applied as plain update docs.
Lines 152-158 feed each array stage through `applyUpdate()`, but `applyUpdate()` only shallow-merges `$set` values and iterates keys from `$unset`. With the new report migration, that turns `contentId: "$assistantId"` into a literal string write, and `$unset: "assistantId"` will not remove the `assistantId` field.
🛠️ Minimal direction
```diff
-const pipelineUpdate = Array.isArray(update)
-  ? update.reduce(
-      (acc, stage) => this.applyUpdate(acc, stage),
-      doc
-    )
-  : this.applyUpdate(doc, update);
-const updated = pipelineUpdate;
+const updated = Array.isArray(update)
+  ? update.reduce(
+      (acc, stage) => this.applyPipelineStage(acc, stage),
+      doc
+    )
+  : this.applyUpdate(doc, update);
```
`applyPipelineStage()` should at least resolve `"$field"` lookups inside `$set` and treat string/array `$unset` values as field names.
examples/voice-agent/README.md (1)
11-14: ⚠️ Potential issue | 🟠 Major
Consolidate dependency installation to use `uv` for all setup steps.
The setup uses `uv sync` to create a managed environment, but then runs `pip install "bindu[voice]"` outside of that environment. This bare `pip` command can target a different Python interpreter or install the published package instead of the local development version. Later execution with `uv run` expects all dependencies to be in uv's managed environment, making the current approach unreliable.
Suggested change
```diff
-uv sync --dev --extra agents
-pip install "bindu[voice]"
+uv sync --dev --extra agents --extra voice
```
frontend/src/lib/components/chat/ChatInput.svelte (1)
36-36: ⚠️ Potential issue | 🟠 Major
Await `onsubmit` before clearing the composer.
The `onsubmit` callback is typed as synchronous but implementations are async. Calling it without `await` clears `value` and `files` immediately, before the async send completes, losing user input on send failure.
Additionally, the file input element value is reset only in the error path (line 182), not after successful submission. This prevents users from re-selecting the same file.
💡 Suggested changes
```diff
-  onsubmit?: (message: string, fileParts: Array<{ name: string; mime: string; value: string }>) => void;
+  onsubmit?: (
+    message: string,
+    fileParts: Array<{ name: string; mime: string; value: string }>
+  ) => void | Promise<void>;
```

```diff
   try {
     const fileParts = await getFileParts();
     if (typeof onsubmit === "function") {
-      onsubmit(value, fileParts);
+      await onsubmit(value, fileParts);
     }
     // Clear files and input after successful submit
     files = [];
     value = "";
+    if (fileInputEl) fileInputEl.value = "";
   } catch (err) {
```
examples/document-analyzer/document_analyzer.py (1)
117-123: ⚠️ Potential issue | 🟠 Major
Don't send the uploaded document under both `User Prompt` and `Document Content`.
In the chat-format path, `content` is appended to `prompt_parts` and also copied into `extracted_docs` when it contains the document marker. That duplicates the full document in the model input and can also leak prompt text into `Document Content` when both live in the same message.
Suggested fix
```diff
-        if isinstance(content, str) and content.strip():
-            prompt_parts.append(content)
-            if "--- Document Uploaded" in content:
-                extracted_docs.append(content)
+        if isinstance(content, str) and content.strip():
+            docs = DOCUMENT_BLOCK_RE.findall(content)
+            if docs:
+                extracted_docs.extend(docs)
+                content = DOCUMENT_BLOCK_RE.sub("", content).strip()
+            if content:
+                prompt_parts.append(content)
             continue
```

Add a compiled `DOCUMENT_BLOCK_RE` near the top of the module so you can separate prompt text from embedded document blocks.
bindu/extensions/voice/audio_config.py (1)
14-17: ⚠️ Potential issue | 🟠 Major
`FRAME_SIZE` should be derived from the active sample rate, not always 16kHz.
`VoiceAgentExtension` allows non-default rates, but this constant is frozen from `DEFAULT_SAMPLE_RATE`. Any caller that uses it will chunk 24kHz/48kHz sessions with 16kHz-sized frames, which breaks streaming timing and VAD alignment.
Suggested fix
```diff
-FRAME_SIZE: int = (
-    DEFAULT_SAMPLE_RATE * FRAME_DURATION_MS // 1000 * BYTES_PER_SAMPLE
-)  # 640 bytes
+def get_frame_size(
+    sample_rate: int,
+    channels: int = DEFAULT_CHANNELS,
+    bytes_per_sample: int = BYTES_PER_SAMPLE,
+) -> int:
+    return sample_rate * FRAME_DURATION_MS // 1000 * channels * bytes_per_sample
+
+
+FRAME_SIZE: int = get_frame_size(DEFAULT_SAMPLE_RATE)
```
bindu/extensions/voice/redis_session_manager.py (1)
64-71: ⚠️ Potential issue | 🟠 Major
Don't log Redis URLs verbatim.
Connection strings often carry passwords, and both the info log and the raised `ConnectionError` currently include the full URL.
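One way to redact before logging, sketched with stdlib urllib (the function name is illustrative):

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url(url: str) -> str:
    """Drop the userinfo (username:password) portion of a connection URL."""
    parts = urlsplit(url)
    if parts.hostname is None:
        return url  # nothing recognizable to redact
    netloc = parts.hostname
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(redact_url("redis://user:s3cret@redis.internal:6379/0"))
# → redis://redis.internal:6379/0
```

The redacted form (or just host:port) is what should appear in both the info log and the `ConnectionError` message.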
bindu/extensions/voice/agent_bridge.py (1)
89-103: ⚠️ Potential issue | 🟠 Major
Avoid logging raw transcript content.
These debug lines persist user utterances and agent replies into application logs, which turns every voice session into log-retained conversational data.
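A metadata-only alternative can be sketched as follows; the function and logger names are illustrative, not the bridge's actual API:

```python
import hashlib
import logging

logger = logging.getLogger("voice.agent_bridge")


def describe_turn(direction: str, context_id: str, text: str) -> str:
    """Build a log line with only non-sensitive metadata about an utterance."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return f"turn direction={direction} context={context_id} chars={len(text)} sha256={digest}"


# The raw text never reaches the log record:
line = describe_turn("user", "ctx-42", "this transcript stays private")
logger.debug(line)
assert "private" not in line and "chars=29" in line
```

Length plus a short digest is usually enough to correlate turns while debugging without retaining conversational content.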
bindu/extensions/voice/agent_bridge.py (1)
159-163: ⚠️ Potential issue | 🟠 Major
Trim whole turns, not individual messages.
When the history is full and a new user utterance arrives, this can delete just the oldest user message and leave the next `manifest.run(...)` starting with an orphaned assistant response.
🧠 Suggested fix
```diff
 def _trim_history(self) -> None:
     """Keep only the most recent conversation turns."""
     overflow = len(self._conversation_history) - self._max_history_messages
     if overflow > 0:
-        del self._conversation_history[:overflow]
+        turns_to_drop = max(1, (overflow + 1) // 2)
+        del self._conversation_history[: turns_to_drop * 2]
```
frontend/src/lib/components/chat/ChatWindow.svelte (1)
166-178: ⚠️ Potential issue | 🟠 Major
Don't clear the composer on failed sends.
The `finally` block wipes `draft` and `files` even when `sendAgentMessage(...)` throws, so the user loses the message and attachments they need to retry.
💬 Suggested fix
```diff
   try {
     for await (const _update of sendAgentMessage(message, contextId, undefined, undefined, undefined, undefined, fileParts)) {
       // Process updates if needed
     }
+    draft = "";
+    files = [];
   } catch (err) {
     console.error("Error sending agent message:", err);
-    // Optionally surface error to user
-    alert(`Error: ${err instanceof Error ? err.message : String(err)}`);
-  } finally {
-    // Always clear draft and files
-    draft = "";
-    files = [];
+    $error = err instanceof Error ? err.message : String(err);
   }
```
bindu/extensions/voice/voice_agent_extension.py (1)
28-40: 🛠️ Refactor suggestion | 🟠 Major
Use `app_settings.voice` as the single source of defaults.
These hardcoded constructor defaults duplicate values already defined in `VoiceSettings`, so the extension metadata can drift from runtime config whenever only part of the voice config is supplied.
As per coding guidelines "Use bindu.settings.app_settings for all configuration - never hardcode values such as URLs, ports, timeouts, API keys, or feature flags" and "NEVER create local config constants - use app_settings from bindu.settings".
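The fallback pattern this asks for can be sketched with a stand-in settings object; the real defaults live on `app_settings.voice`, and the field values below only mirror the providers named in this PR (Deepgram STT, ElevenLabs TTS):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceSettingsStub:
    """Stand-in for app_settings.voice; field names are assumptions."""

    stt_provider: str = "deepgram"
    tts_provider: str = "elevenlabs"
    sample_rate: int = 16000


voice_settings = VoiceSettingsStub()


class VoiceAgentExtension:
    def __init__(
        self,
        stt_provider: Optional[str] = None,
        tts_provider: Optional[str] = None,
        sample_rate: Optional[int] = None,
    ) -> None:
        # None means "not supplied": fall back to settings so partial configs
        # can never drift from the runtime defaults.
        self.stt_provider = stt_provider if stt_provider is not None else voice_settings.stt_provider
        self.tts_provider = tts_provider if tts_provider is not None else voice_settings.tts_provider
        self.sample_rate = sample_rate if sample_rate is not None else voice_settings.sample_rate


ext = VoiceAgentExtension(sample_rate=24000)
assert (ext.stt_provider, ext.sample_rate) == ("deepgram", 24000)
```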
bindu/settings.py (1)
1038-1070: ⚠️ Potential issue | 🟠 Major
Validate the env-backed voice settings at startup.
These values come straight from environment/config, but invalid inputs like `sample_rate <= 0`, `audio_channels < 1`, `vad_threshold > 1`, or non-positive session limits will currently slip through until the voice pipeline is already starting.
🛡️ Suggested validation
```diff
 class VoiceSettings(BaseSettings):
@@
-    sample_rate: int = 16000
-    audio_channels: int = 1
+    sample_rate: int = Field(default=16000, gt=0)
+    audio_channels: int = Field(default=1, gt=0)
@@
-    vad_threshold: float = 0.5
+    vad_threshold: float = Field(default=0.5, ge=0.0, le=1.0)
@@
-    session_timeout: int = 300  # seconds (5 min)
-    max_concurrent_sessions: int = 10
+    session_timeout: int = Field(default=300, gt=0)  # seconds (5 min)
+    max_concurrent_sessions: int = Field(default=10, gt=0)
@@
-    redis_session_ttl: int = 300  # seconds, TTL for session keys in Redis
+    redis_session_ttl: int = Field(default=300, gt=0)  # seconds
```
🤖 Prompt for AI Agents
tests/unit/extensions/voice/test_voice_endpoints.py (1)
216-226: ⚠️ Potential issue | 🟠 Major
This lock regression test doesn't prove serialization yet.

`await_count == 2` passes whether the sends were serialized or concurrent, so the ASGI concurrent-send bug can still slip through.

🧪 Stronger serialization check

```diff
 websocket = AsyncMock()
 send_lock = asyncio.Lock()
+call_order: list[str] = []
+first_send_entered = asyncio.Event()
+release_first_send = asyncio.Event()
+
+async def send_text(payload: str) -> None:
+    call_order.append(payload)
+    if len(call_order) == 1:
+        first_send_entered.set()
+        await release_first_send.wait()
+
+websocket.send_text.side_effect = send_text
-await asyncio.gather(
-    _send_json(websocket, {"type": "one"}, send_lock),
-    _send_json(websocket, {"type": "two"}, send_lock),
-)
+first = asyncio.create_task(_send_json(websocket, {"type": "one"}, send_lock))
+await first_send_entered.wait()
+second = asyncio.create_task(_send_json(websocket, {"type": "two"}, send_lock))
+await asyncio.sleep(0)
+assert len(call_order) == 1
+release_first_send.set()
+await asyncio.gather(first, second)
 assert websocket.send_text.await_count == 2
```
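The stronger check above leans on one property: an `asyncio.Lock` forces the second send to wait until the first completes. A self-contained illustration of that property (the `send` coroutine here is a stand-in for `_send_json`, not the PR's code):

```python
import asyncio


async def main() -> list[str]:
    lock = asyncio.Lock()
    order: list[str] = []

    async def send(tag: str) -> None:
        # Serialize writes the way _send_json would with its send_lock.
        async with lock:
            order.append(f"{tag}:start")
            await asyncio.sleep(0.01)  # simulate a slow websocket write
            order.append(f"{tag}:end")

    await asyncio.gather(send("one"), send("two"))
    return order


order = asyncio.run(main())
# With the lock held, the first send finishes before the second starts.
assert order.index("one:end") < order.index("two:start")
print(order)
```

Without the `async with lock:` line, the two `start` entries interleave, which is exactly the concurrent-write state that crashes Starlette's ASGI sender.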
Verify each finding against the current code and only fix it if needed. In `@tests/unit/extensions/voice/test_voice_endpoints.py` around lines 216-226: test_send_json_uses_lock_when_provided currently only asserts websocket.send_text.await_count == 2, which doesn't prove serialization; modify the test to instrument the AsyncMock send_text to record start and end timestamps, or use asyncio.Events to force one send to delay, and assert the second send does not start until the first completes when calling _send_json with send_lock. Specifically, wrap websocket.send_text (used by _send_json) with a coroutine that logs start/end (or awaits an Event) so you can assert ordering (end of first <= start of second) or that the second awaited the Event released by the first, proving serialization under send_lock.

frontend/src/lib/stores/voice.ts-33-57 (1)
33-57: ⚠️ Potential issue | 🟠 Major
Ignore late events from stale VoiceClient instances.

These callbacks always mutate the singleton stores, even after restart/shutdown. Because `stopSession()` errors are swallowed, a half-closed client can still push transcripts, audio, or error state into the next session. Gate each callback on the currently active instance, or clear the old handlers before stopping it.

Minimal guard

```diff
-client = new VoiceClient();
-client.onTranscript = appendTranscript;
-client.onStateChange = (state) => {
-  voiceState.set(state);
-};
-client.onAgentAudio = (audioData) => {
-  latestAgentAudio.set(audioData);
-};
-client.onError = (message) => {
-  voiceError.set(message);
-  voiceState.set('error');
-};
+const nextClient = new VoiceClient();
+client = nextClient;
+nextClient.onTranscript = (event) => {
+  if (client !== nextClient) return;
+  appendTranscript(event);
+};
+nextClient.onStateChange = (state) => {
+  if (client !== nextClient) return;
+  voiceState.set(state);
+};
+nextClient.onAgentAudio = (audioData) => {
+  if (client !== nextClient) return;
+  latestAgentAudio.set(audioData);
+};
+nextClient.onError = (message) => {
+  if (client !== nextClient) return;
+  voiceError.set(message);
+  voiceState.set('error');
+};
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontend/src/lib/stores/voice.ts` around lines 33-57: callbacks from stale VoiceClient instances are still mutating singleton stores after restart; fix by ensuring handlers are only applied for the active instance and cleared on the old one. Before awaiting existingClient.stopSession(), remove or noop its handlers (e.g., set existingClient.onTranscript = existingClient.onStateChange = existingClient.onAgentAudio = existingClient.onError = null), and when creating a new VoiceClient capture a local identifier (const thisClient = new VoiceClient(); client = thisClient) and wrap each handler (onTranscript/onStateChange/onAgentAudio/onError) to first check client === thisClient (or that the handler is still the active one) before calling voiceState.set / latestAgentAudio.set / voiceError.set, preventing stale instances from mutating stores.

bindu/server/endpoints/voice_endpoints.py-55-62 (1)
55-62: ⚠️ Potential issue | 🟠 Major
Differentiate an empty body from an invalid start payload.

This broad `except Exception: pass` turns malformed JSON into a fresh session and forwards any `context_id` string unchecked. An empty body is fine, but malformed JSON or an invalid `context_id` should return 400 before `create_session()`. As per coding guidelines: "Validate all external input and use type hints for input validation in Python files".
Verify each finding against the current code and only fix it if needed. In `@bindu/server/endpoints/voice_endpoints.py` around lines 55-62: the handler currently swallows all exceptions when parsing request.json() and accepts any context_id string; change this to explicitly treat an empty body as acceptable but return HTTP 400 for malformed JSON or an invalid context_id. Call await request.json() inside a try that only catches JSONDecodeError and return 400 on parse errors; then, if the parsed body is a dict containing "context_id", validate it with uuid.UUID(...) (or an explicit UUID validator) and return 400 if invalid. Only after successful validation proceed to set context_id and call create_session(). Ensure the handler signature and validation use type hints for the request/body where applicable.

bindu/server/endpoints/voice_endpoints.py-69-81 (1)
69-81: ⚠️ Potential issue | 🟠 Major
Don't return a proxy-fragile absolute ws_url.

Building the socket URL from `request.url` loses deployment details like `root_path` and proxy rewriting, so clients behind a reverse proxy can connect to the wrong endpoint. Returning a relative websocket path (or just the `session_id`) lets the client derive the final `ws:` URL from its configured API base.
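Assuming the endpoint returns a relative path like `/ws/voice/<session_id>` (an assumption about the fix, not the PR's current contract), the client-side derivation is a one-liner. Sketched in Python for symmetry with the backend; the frontend equivalent is `agentAPI.baseUrl.replace(/^http/, 'ws')`:

```python
from urllib.parse import urljoin


def websocket_url(api_base: str, session_id: str) -> str:
    """Derive the ws(s):// URL from the client's configured API base.

    The server returns only a relative path, so a reverse proxy or
    root_path cannot poison the host the browser connects to.
    (websocket_url and api_base are illustrative names.)
    """
    ws_base = api_base.replace("http", "ws", 1)  # http -> ws, https -> wss
    return urljoin(ws_base, f"/ws/voice/{session_id}")


assert websocket_url("https://agents.example.com/api", "abc") == "wss://agents.example.com/ws/voice/abc"
assert websocket_url("http://localhost:3773", "abc") == "ws://localhost:3773/ws/voice/abc"
```

Note that `urljoin` understands the `ws`/`wss` schemes, so path joining follows the same rules as for `http`/`https`.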
Verify each finding against the current code and only fix it if needed. In `@bindu/server/endpoints/voice_endpoints.py` around lines 69-81: the endpoint currently constructs an absolute WebSocket URL using request.url (scheme/hostname/port), which breaks behind proxies and ignores root_path; instead return a proxy-safe value such as the relative WebSocket path or just the session identifier so the client can derive the final ws:// URL from its configured API base. Modify the code that builds ws_url (remove usage of request.url.scheme, request.url.hostname, request.url.port, and the host fallback logic) and return either f"/ws/voice/{session.id}" or simply session.id from the handler so callers can compose the full WebSocket URL correctly; reference the existing session.id and the current ws_url construction code to locate and replace the logic.

bindu/server/endpoints/voice_endpoints.py-548-557 (1)
548-557: ⚠️ Potential issue | 🟠 Major
Track and cancel pending send tasks in the finally block.

`loop.create_task()` creates fire-and-forget tasks. If `_send_json` tries to send after the WebSocket closes (in the finally block), exceptions are silently dropped since they occur in detached background tasks. Store created tasks in a set and cancel them during cleanup:

```python
pending_sends = set()

# In _try_send_json:
task = loop.create_task(_send_json(websocket, data, send_lock))
pending_sends.add(task)
task.add_done_callback(pending_sends.discard)

# In finally block:
for task in pending_sends:
    task.cancel()
await asyncio.gather(*pending_sends, return_exceptions=True)
```
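The snippet above compresses to a small runnable demo. Names like `try_send` are illustrative stand-ins for the endpoint helpers; the point is that cancelling through the tracked set both prevents garbage collection of in-flight tasks and leaves nothing dangling at teardown:

```python
import asyncio


async def main() -> int:
    pending_sends: set[asyncio.Task[None]] = set()

    async def send(msg: str) -> None:
        await asyncio.sleep(3600)  # pretend the socket write never completes

    def try_send(msg: str) -> None:
        # Keep a strong reference so the task can't be garbage-collected,
        # and so cleanup can cancel it later.
        task = asyncio.get_running_loop().create_task(send(msg))
        pending_sends.add(task)
        task.add_done_callback(pending_sends.discard)

    try:
        try_send("stt.partial")
        try_send("tts.chunk")
        await asyncio.sleep(0)  # let both tasks start
    finally:
        for task in tuple(pending_sends):
            task.cancel()
        await asyncio.gather(*pending_sends, return_exceptions=True)

    return len(pending_sends)


leftover = asyncio.run(main())
print(leftover)  # 0: the done-callbacks removed the cancelled tasks
```

`return_exceptions=True` matters here: it absorbs the `CancelledError` from each task so cleanup itself cannot raise.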
Verify each finding against the current code and only fix it if needed. In `@bindu/server/endpoints/voice_endpoints.py` around lines 548 - 557, The current _try_send_json creates fire-and-forget tasks with loop.create_task(_send_json(...)) which can leak and swallow exceptions; introduce a module-level or connection-scoped set named pending_sends to track created asyncio.Tasks (add the task and attach task.add_done_callback(pending_sends.discard) inside _try_send_json) and in the corresponding cleanup/finally block cancel all tasks from pending_sends and await them (using asyncio.gather with return_exceptions=True) so pending _send_json tasks are cancelled and awaited before teardown.
🟡 Minor comments (8)
examples/typescript-langchain-agent/quiz-agent.ts-115-117 (1)
115-117: ⚠️ Potential issue | 🟡 Minor
Don't return raw provider error messages to end users.

Returning `error.message` can leak internal/provider details and makes failures look like normal successful text responses.

Proposed fix

```diff
-  } catch (error: any) {
-    return `Error: ${error.message}`;
+  } catch (error: unknown) {
+    // Log internally with your logger if available.
+    return "Error: Failed to generate quiz. Please try again.";
   }
```
Verify each finding against the current code and only fix it if needed. In `@examples/typescript-langchain-agent/quiz-agent.ts` around lines 115-117: the catch block currently returns raw provider error text from "catch (error: any)", which can leak internals; instead, log the full error to your internal logger (or console.error) in the catch block, then return a generic, user-facing message such as "An unexpected error occurred; please try again later." Alternatively rethrow a sanitized Error for centralized handling — do not include error.message in the response body.

examples/typescript-langchain-agent/quiz-agent.ts-20-26 (1)
20-26: ⚠️ Potential issue | 🟡 Minor
Move apiKey to the top level for consistency with LangChain best practices.

`apiKey` should be placed at the top level of the `ChatOpenAI` constructor options, not nested under `configuration`. While both approaches work in `@langchain/openai` v0.4.0, the official LangChain documentation recommends the top-level pattern.

Proposed adjustment

```diff
 const llm = new ChatOpenAI({
   model: "openai/gpt-oss-120b", // same as your Python version
   temperature: 0.3,
+  apiKey: process.env.OPENROUTER_API_KEY,
   configuration: {
     baseURL: "https://openrouter.ai/api/v1",
-    apiKey: process.env.OPENROUTER_API_KEY,
   },
 });
```
Verify each finding against the current code and only fix it if needed. In `@examples/typescript-langchain-agent/quiz-agent.ts` around lines 20-26: the ChatOpenAI instantiation places apiKey inside the nested configuration object; move apiKey to the top-level options to follow LangChain best practices — update the ChatOpenAI call so apiKey is a direct property alongside model and temperature (remove it from configuration), leaving configuration for other HTTP options like baseURL only.

examples/voice-agent/.env.example-6-14 (1)
6-14: ⚠️ Potential issue | 🟡 Minor
Reorder optional env keys to satisfy dotenv-linter.

dotenv-linter reports UnorderedKey warnings in this block; reordering avoids noisy lint output.

🔧 Suggested reorder

```diff
 # Optional tuning
-VOICE__STT_MODEL=nova-3
-VOICE__STT_LANGUAGE=en
-VOICE__TTS_VOICE_ID=21m00Tcm4TlvDq8ikWAM
-VOICE__TTS_MODEL=eleven_turbo_v2_5
-VOICE__SAMPLE_RATE=16000
-VOICE__AUDIO_CHANNELS=1
-VOICE__AUDIO_ENCODING=linear16
-VOICE__ALLOW_INTERRUPTIONS=true
-VOICE__SESSION_TIMEOUT=300
+VOICE__ALLOW_INTERRUPTIONS=true
+VOICE__AUDIO_CHANNELS=1
+VOICE__AUDIO_ENCODING=linear16
+VOICE__SAMPLE_RATE=16000
+VOICE__SESSION_TIMEOUT=300
+VOICE__STT_LANGUAGE=en
+VOICE__STT_MODEL=nova-3
+VOICE__TTS_MODEL=eleven_turbo_v2_5
+VOICE__TTS_VOICE_ID=21m00Tcm4TlvDq8ikWAM
```
Verify each finding against the current code and only fix it if needed. In `@examples/voice-agent/.env.example` around lines 6-14: reorder the block of environment variable keys so they are sorted to satisfy dotenv-linter (removing the UnorderedKey warnings); specifically sort VOICE__ALLOW_INTERRUPTIONS, VOICE__AUDIO_CHANNELS, VOICE__AUDIO_ENCODING, VOICE__SAMPLE_RATE, VOICE__SESSION_TIMEOUT, VOICE__STT_LANGUAGE, VOICE__STT_MODEL, VOICE__TTS_MODEL, and VOICE__TTS_VOICE_ID into alphabetical order (or the linter-expected order) while keeping their values unchanged.

frontend/src/lib/components/chat/ChatInput.svelte-174-176 (1)
174-176: ⚠️ Potential issue | 🟡 Minor
Reset the native file input after a successful submit.

Only the Svelte state is cleared here. The hidden `<input type="file">` keeps its previous value, so selecting the same file again will not fire `change` in most browsers. The error path already correctly resets it; the success path should too for consistency.

💡 Suggested change

```diff
 // Clear files and input after successful submit
+if (fileInputEl) fileInputEl.value = "";
 files = [];
 value = "";
```
Verify each finding against the current code and only fix it if needed. In `@frontend/src/lib/components/chat/ChatInput.svelte` around lines 174-176: the success path clears Svelte state (files = [] and value = "") but doesn't reset the native file input element, so selecting the same file won't fire a change event; update the success branch of the submit handler to also reset the DOM file input (the same element the error path resets) by setting its value to an empty string (e.g., fileInputEl.value = "") or by calling the same reset logic used in the error path, so the native input is cleared consistently with files and value.

bindu/utils/worker/messages.py-83-89 (1)
83-89: ⚠️ Potential issue | 🟡 Minor
Use strict base64 decoding here.

`base64.b64decode()` is permissive by default and will ignore non-base64 characters, so malformed uploads can still reach the PDF/DOCX/text parsers as corrupted bytes instead of hitting the failure path. Decode with validation after trimming whitespace.

Suggested fix

```diff
-file_bytes = base64.b64decode(base64_data)
+file_bytes = base64.b64decode(base64_data.strip(), validate=True)
```

As per coding guidelines: "Validate all external input and use type hints for input validation in Python files".
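The difference is easy to demonstrate. With the default settings, stray characters are silently dropped and the corrupted payload still "decodes"; with `validate=True`, the same input fails fast:

```python
import base64
import binascii

# Permissive default: non-alphabet characters are silently discarded, so a
# corrupted upload still yields garbage bytes for the downstream parsers.
corrupted = "ab!cd@efgh"  # not valid base64
assert base64.b64decode(corrupted) == base64.b64decode("abcdefgh")

# Strict mode rejects the same input up front.
try:
    base64.b64decode(corrupted.strip(), validate=True)
except binascii.Error as exc:
    print("rejected:", exc)

# Well-formed input still round-trips under validate=True.
payload = base64.b64encode(b"%PDF-1.7 sample").decode()
assert base64.b64decode(payload.strip(), validate=True) == b"%PDF-1.7 sample"
```

Catching `binascii.Error` at the call site turns a corrupted-bytes parser failure into a clear "invalid base64" error for the client.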
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bindu/utils/worker/messages.py` around lines 83-89: the base64 decoding is too permissive. Trim whitespace from base64_data, then decode using strict validation (base64.b64decode with validate=True) and catch binascii.Error (or ValueError) to raise/log a clear "invalid base64" error before passing file_bytes to the PDF/DOCX/text parsers; update the handling around base64_data and file_bytes in this try block (and add appropriate type hints for the function parameter that supplies base64_data) so malformed uploads fail fast instead of producing corrupted bytes for downstream parsing.

docs/VOICE_AGENT_PLAN.md-119-127 (1)
119-127: ⚠️ Potential issue | 🟡 Minor
Update the test path in the file matrix.

The plan currently points contributors to `tests/unit/server/endpoints/test_voice_endpoints.py`, but this PR adds `tests/unit/extensions/voice/test_voice_endpoints.py`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/VOICE_AGENT_PLAN.md` around lines 119-127: update the tests file matrix entry so it references the new test location — replace the path string `tests/unit/server/endpoints/test_voice_endpoints.py` with `tests/unit/extensions/voice/test_voice_endpoints.py` in the table under "Tests (5 files)" (the row currently listing the REST + WebSocket endpoint tests).

tests/unit/extensions/voice/test_voice_endpoints.py-73-76 (1)
73-76: ⚠️ Potential issue | 🟡 Minor
Avoid pinning the WebSocket URL to an absolute ws:// origin.

This assertion bakes the proxy-unfriendly contract into the test suite. If the client derives the origin locally, or the endpoint switches to a relative path or `wss://`, this test will fail for the right fix.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/extensions/voice/test_voice_endpoints.py` around lines 73-76: the test currently asserts body["ws_url"].startswith("ws://"), which hard-codes an absolute ws:// origin; change the assertion to accept either a websocket URL with scheme "ws" or "wss" or a relative path. For example, parse body["ws_url"] (or inspect its prefix) and assert that it either starts with "/" (relative path) or has a scheme of "ws" or "wss" (allowing both "ws://" and "wss://"); keep the existing assertions for "session_id" and "context_id" intact.

docs/VOICE_AGENT_PLAN.md-16-58 (1)
16-58: ⚠️ Potential issue | 🟡 Minor
Fix the markdownlint violations in the plan doc.

These fenced blocks still need explicit languages (`text` works well for diagrams/protocol snippets), and the unit-test table should be surrounded by blank lines so the doc passes lint cleanly.

Also applies to: 413-418, 435-453, 536-548, 551-556, 618-632, 650-669, 678-686, 789-798
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/VOICE_AGENT_PLAN.md` around lines 16 - 58, The markdown has fenced code blocks and a unit-test table missing lint requirements: update each fenced block (e.g., the protocol/diagram blocks such as the large ASCII diagram shown) to include a language tag (use "text" for diagrams) like ```text, and ensure the unit-test table is surrounded by blank lines above and below so it is treated as a proper block; apply the same fixes to all other occurrences noted (lines referenced in the review) to clear markdownlint violations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2cc17dac-ef09-4280-95b0-6d3c3f79ef8c
⛔ Files ignored due to path filters (4)

- `bindu/grpc/generated/agent_handler_pb2.py` is excluded by `!**/generated/**`
- `bindu/grpc/generated/agent_handler_pb2.pyi` is excluded by `!**/generated/**`
- `bindu/grpc/generated/agent_handler_pb2_grpc.py` is excluded by `!**/generated/**`
- `uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (97)

- `.github/workflows/ci.yml`
- `README.bn.md`
- `README.de.md`
- `README.es.md`
- `README.fr.md`
- `README.hi.md`
- `README.nl.md`
- `README.ta.md`
- `README.zh.md`
- `bindu/common/protocol/types.py`
- `bindu/extensions/__init__.py`
- `bindu/extensions/voice/__init__.py`
- `bindu/extensions/voice/agent_bridge.py`
- `bindu/extensions/voice/audio_config.py`
- `bindu/extensions/voice/pipeline_builder.py`
- `bindu/extensions/voice/redis_session_manager.py`
- `bindu/extensions/voice/service_factory.py`
- `bindu/extensions/voice/session_factory.py`
- `bindu/extensions/voice/session_manager.py`
- `bindu/extensions/voice/voice_agent_extension.py`
- `bindu/penguin/bindufy.py`
- `bindu/server/applications.py`
- `bindu/server/endpoints/voice_endpoints.py`
- `bindu/server/metrics.py`
- `bindu/server/scheduler/memory_scheduler.py`
- `bindu/server/workers/base.py`
- `bindu/settings.py`
- `bindu/utils/__init__.py`
- `bindu/utils/capabilities.py`
- `bindu/utils/logging.py`
- `bindu/utils/retry.py`
- `bindu/utils/task_telemetry.py`
- `bindu/utils/worker/messages.py`
- `bindu/utils/worker/parts.py`
- `docs/VOICE_AGENT_PLAN.md`
- `examples/README.md`
- `examples/ag2_research_team/skills/research-team-skill/skill.yaml`
- `examples/beginner/README.md`
- `examples/beginner/motivational_agent.py`
- `examples/document-analyzer/document_analyzer.py`
- `examples/typescript-langchain-agent/quiz-agent.ts`
- `examples/typescript-langchain-quiz-agent/.env.example`
- `examples/typescript-langchain-quiz-agent/quiz-agent.ts`
- `examples/voice-agent/.env.example`
- `examples/voice-agent/README.md`
- `examples/voice-agent/main.py`
- `frontend/.env.example`
- `frontend/.gitignore`
- `frontend/src/lib/components/ShareConversationModal.svelte`
- `frontend/src/lib/components/chat/ChatInput.svelte`
- `frontend/src/lib/components/chat/ChatMessage.svelte`
- `frontend/src/lib/components/chat/ChatWindow.svelte`
- `frontend/src/lib/components/chat/ContextList.svelte`
- `frontend/src/lib/components/voice/LiveTranscript.svelte`
- `frontend/src/lib/components/voice/VoiceCallButton.svelte`
- `frontend/src/lib/components/voice/VoiceCallPanel.svelte`
- `frontend/src/lib/constants/mime.ts`
- `frontend/src/lib/jobs/refresh-conversation-stats.ts`
- `frontend/src/lib/migrations/lock.ts`
- `frontend/src/lib/migrations/migrations.ts`
- `frontend/src/lib/migrations/routines/02-update-assistants-models.ts`
- `frontend/src/lib/migrations/routines/10-update-reports-assistantid.ts`
- `frontend/src/lib/server/config.ts`
- `frontend/src/lib/server/database.ts`
- `frontend/src/lib/server/files/downloadFile.ts`
- `frontend/src/lib/server/files/uploadFile.ts`
- `frontend/src/lib/server/models.ts`
- `frontend/src/lib/services/voice-client.ts`
- `frontend/src/lib/stores/chat.ts`
- `frontend/src/lib/stores/voice.ts`
- `frontend/src/lib/types/ConvSidebar.ts`
- `frontend/src/lib/types/Model.ts`
- `frontend/src/lib/types/Session.ts`
- `frontend/src/lib/utils/agentMessageHandler.ts`
- `frontend/src/lib/utils/tree/addChildren.spec.ts`
- `frontend/src/lib/utils/tree/addSibling.spec.ts`
- `frontend/src/lib/utils/tree/buildSubtree.spec.ts`
- `frontend/src/lib/utils/tree/convertLegacyConversation.spec.ts`
- `frontend/src/routes/+layout.svelte`
- `frontend/src/routes/+page.svelte`
- `frontend/src/routes/api/v2/conversations/[id]/message/[messageId]/+server.ts`
- `frontend/src/routes/api/v2/export/+server.ts`
- `frontend/src/routes/conversation/[id]/+server.ts`
- `frontend/src/routes/settings/(nav)/authentication/+page.svelte`
- `frontend/src/routes/settings/(nav)/negotiation/+page.svelte`
- `pyproject.toml`
- `tests/conftest_stubs.py`
- `tests/unit/extensions/__init__.py`
- `tests/unit/extensions/voice/__init__.py`
- `tests/unit/extensions/voice/test_agent_bridge.py`
- `tests/unit/extensions/voice/test_service_factory.py`
- `tests/unit/extensions/voice/test_session_manager.py`
- `tests/unit/extensions/voice/test_voice_endpoints.py`
- `tests/unit/extensions/voice/test_voice_extension.py`
- `tests/unit/server/scheduler/test_memory_scheduler.py`
- `tests/unit/test_minimax_example.py`
- `tests/unit/utils/worker/test_messages.py`
Actionable comments posted: 5
🧹 Nitpick comments (2)
bindu/server/endpoints/voice_endpoints.py (2)
57-62: Consider narrowing the exception catch for body parsing.

While the silent pass is intentional for empty/invalid bodies, catching `json.JSONDecodeError` specifically would be more precise and avoid masking unexpected errors.

♻️ Proposed refactor

```diff
 try:
     body = await request.json()
     if isinstance(body, dict) and "context_id" in body:
         context_id = str(body["context_id"])
-except Exception:
-    pass  # empty body is fine, we'll generate a new context_id
+except (json.JSONDecodeError, ValueError):
+    pass  # empty or invalid body is fine, we'll generate a new context_id
```
Verify each finding against the current code and only fix it if needed. In `@bindu/server/endpoints/voice_endpoints.py` around lines 57 - 62, Narrow the broad except: when parsing the request body with await request.json(), catch json.JSONDecodeError (and optionally aiohttp.ContentTypeError if using aiohttp) instead of a bare Exception so we don't mask other errors; keep the same behavior of treating empty/invalid JSON as "generate a new context_id" and still set context_id from body["context_id"] when present. Ensure you import json (and aiohttp.ContentTypeError if used) and update the try/except around request.json() and the context_id assignment.
488-495: Consider reusing the httpx client for reduced latency.

Creating a new `AsyncClient` for each STT/TTS call adds connection-establishment overhead. For real-time voice with frequent transcription calls (~1 s intervals), consider a session-scoped or module-level client with connection pooling.

♻️ Example approach

```python
# Module-level shared client (initialize on first use or app startup)
_http_client: httpx.AsyncClient | None = None

async def _get_http_client() -> httpx.AsyncClient:
    global _http_client
    if _http_client is None:
        _http_client = httpx.AsyncClient(timeout=30.0)
    return _http_client
```

Or pass a client instance through the WebSocket handler lifecycle.
Also applies to: 525-530
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bindu/server/endpoints/voice_endpoints.py` around lines 488 - 495, The TTS/STT handlers currently create a new httpx.AsyncClient per request (the try block that async with httpx.AsyncClient(timeout=30.0) as client), causing connection overhead; replace that pattern with a shared, module-level or lifecycle-scoped AsyncClient (e.g., implement a single _http_client and accessor _get_http_client or accept a client instance into the WebSocket/handler) and use that client for requests instead of creating a new one in the try/except, and update both occurrences (the AsyncClient creation at the shown try block and the similar block referenced at 525-530) so connection pooling is reused and the client is closed on app shutdown.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bindu/extensions/voice/redis_session_manager.py`:
- Around line 89-93: The except block in redis_session_manager.py that catches
redis.RedisError logs the error and re-raises a ConnectionError but doesn't
chain the original exception; update the except handler inside the
RedisSessionManager connection logic (the block that uses logger.error and
raises ConnectionError referencing self.redis_url) to re-raise the new
ConnectionError using exception chaining (raise ConnectionError(... ) from e) so
the original traceback is preserved.
- Around line 53-74: The RedisSessionManager __init__ is missing initialization
of the asyncio lock used elsewhere; add initialization of self._lock =
asyncio.Lock() inside the __init__ of the Redis session manager (alongside
setting self._redis_client, self._cleanup_task, and
self._create_session_script_sha) so methods end_session, update_state, and
_expire_timed_out_sessions can safely await the lock without causing
AttributeError.
- Around line 248-277: The async `@property` active_count is invalid; change
RedisVoiceSessionManager.active_count to an async method (e.g., async def
get_active_count(self) -> int) by removing the `@property` decorator and renaming
the method, and update the base/sync VoiceSessionManager to declare a compatible
async signature (or provide an async abstract get_active_count) so
implementations match; also update all callers to await
session_manager.get_active_count() instead of accessing
session_manager.active_count.
In `@bindu/server/endpoints/voice_endpoints.py`:
- Around line 581-591: The fire-and-forget created in _try_send_json uses
loop.create_task(...) without keeping a reference so the task can be
garbage-collected and the send dropped; fix by creating a module-level set
(e.g., _BACKGROUND_TASKS) to hold references to tasks created by _try_send_json,
add each created task to that set, and attach a done callback that removes the
task from the set when finished and handles/logs exceptions; reference the
functions _try_send_json and _send_json when making these changes.
- Around line 504-507: The CI failed due to trailing whitespace on the line with
the empty-audio check in voice_endpoints.py (the block that checks pcm_bytes and
calls logger.debug("Empty audio buffer, skipping transcription")); remove the
trailing whitespace characters at the end of that line (and any other trailing
spaces in the file), save the file, and commit the change so the pre-commit hook
passes.
---
Nitpick comments:
In `@bindu/server/endpoints/voice_endpoints.py`:
- Around line 57-62: Narrow the broad except: when parsing the request body with
await request.json(), catch json.JSONDecodeError (and optionally
aiohttp.ContentTypeError if using aiohttp) instead of a bare Exception so we
don't mask other errors; keep the same behavior of treating empty/invalid JSON
as "generate a new context_id" and still set context_id from body["context_id"]
when present. Ensure you import json (and aiohttp.ContentTypeError if used) and
update the try/except around request.json() and the context_id assignment.
- Around line 488-495: The TTS/STT handlers currently create a new
httpx.AsyncClient per request (the try block that async with
httpx.AsyncClient(timeout=30.0) as client), causing connection overhead; replace
that pattern with a shared, module-level or lifecycle-scoped AsyncClient (e.g.,
implement a single _http_client and accessor _get_http_client or accept a client
instance into the WebSocket/handler) and use that client for requests instead of
creating a new one in the try/except, and update both occurrences (the
AsyncClient creation at the shown try block and the similar block referenced at
525-530) so connection pooling is reused and the client is closed on app
shutdown.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 90e932fb-3d8b-483b-93f0-5526868a1925
📒 Files selected for processing (4)
- `bindu/extensions/voice/redis_session_manager.py`
- `bindu/server/endpoints/voice_endpoints.py`
- `frontend/src/lib/buildPrompt.ts`
- `frontend/src/lib/server/models.ts`
🚧 Files skipped from review as they are similar to previous changes (1)
- frontend/src/lib/server/models.ts
```python
def __init__(
    self,
    redis_url: str,
    max_sessions: int = 10,
    session_timeout: int = DEFAULT_SESSION_TTL,
    redis_session_ttl: int = DEFAULT_SESSION_TTL,
):
    """Initialize the Redis session manager.

    Args:
        redis_url: Redis connection URL
        max_sessions: Maximum concurrent sessions allowed
        session_timeout: Session timeout in seconds (for cleanup)
        redis_session_ttl: TTL for Redis keys in seconds
    """
    self.redis_url = redis_url
    self._max_sessions = max_sessions
    self._session_timeout = session_timeout
    self._redis_session_ttl = redis_session_ttl
    self._redis_client: redis.Redis | None = None
    self._cleanup_task: asyncio.Task[None] | None = None
    self._create_session_script_sha: str | None = None
```
Missing _lock attribute causes AttributeError at runtime.
The class uses self._lock in end_session() (line 202), update_state() (line 234), and _expire_timed_out_sessions() (line 316), but _lock is never initialized in __init__. The base VoiceSessionManager initializes self._lock = asyncio.Lock() (see context snippet 1), but this class does not.
🐛 Proposed fix
```diff
 self._redis_session_ttl = redis_session_ttl
 self._redis_client: redis.Redis | None = None
 self._cleanup_task: asyncio.Task[None] | None = None
 self._create_session_script_sha: str | None = None
+self._lock = asyncio.Lock()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bindu/extensions/voice/redis_session_manager.py` around lines 53 - 74, The
RedisSessionManager __init__ is missing initialization of the asyncio lock used
elsewhere; add initialization of self._lock = asyncio.Lock() inside the __init__
of the Redis session manager (alongside setting self._redis_client,
self._cleanup_task, and self._create_session_script_sha) so methods end_session,
update_state, and _expire_timed_out_sessions can safely await the lock without
causing AttributeError.
Actionable comments posted: 2
♻️ Duplicate comments (1)
bindu/server/endpoints/voice_endpoints.py (1)
584-593: ⚠️ Potential issue | 🟠 Major
Retain task references for callback-triggered sends.

`loop.create_task(...)` without a strong reference can drop send tasks before completion, causing silent message loss.

💡 Proposed fix

```diff
+_background_tasks: set[asyncio.Task[None]] = set()
+
+def _track_background_task(task: asyncio.Task[None]) -> None:
+    _background_tasks.add(task)
+    def _done(t: asyncio.Task[None]) -> None:
+        _background_tasks.discard(t)
+        try:
+            t.result()
+        except Exception:
+            logger.exception("Background websocket send task failed")
+    task.add_done_callback(_done)
+
 def _try_send_json(
     websocket: WebSocket,
     data: dict,
     send_lock: asyncio.Lock,
 ) -> None:
     """Enqueue a JSON send (safe to call from sync callbacks)."""
     try:
         loop = asyncio.get_running_loop()
-        loop.create_task(_send_json(websocket, data, send_lock))
+        task = loop.create_task(_send_json(websocket, data, send_lock))
+        _track_background_task(task)
     except RuntimeError:
         pass
```

```bash
#!/bin/bash
# Verify create_task call sites in this file and inspect whether return values are retained.
rg -n -C2 'create_task\(' bindu/server/endpoints/voice_endpoints.py
```
Verify each finding against the current code and only fix it if needed. In `@bindu/server/endpoints/voice_endpoints.py` around lines 584 - 593, The _try_send_json helper creates background tasks via loop.create_task(_send_json(...)) but does not retain a reference which can allow the task to be garbage-collected and drop messages; change _try_send_json to store the returned Task in a short-lived container and remove it when finished (e.g., maintain a set of pending tasks and add task.add_done_callback(lambda t: pending.discard(t))) so tasks started from callback contexts are strongly referenced until completion; reference the symbols _try_send_json, _send_json, WebSocket and send_lock when locating where to create the pending-tasks set (module-level or attached to the websocket/session object) and ensure thread-safety around adding/removing tasks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bindu/server/endpoints/voice_endpoints.py`:
- Around line 218-219: The code currently hardcodes tuning values (e.g.,
chunk_bytes and overlap_bytes) and other configuration like provider URLs and
HTTP timeouts inside voice_endpoints.py; extract these into
bindu.settings.app_settings.voice (add fields such as sample_rate,
audio_channels, chunk_overlap_fraction or overlap_bytes, chunk_throttle_ms,
provider_urls, http_timeout_seconds, etc.) and replace direct literals/usages
(notably the variables chunk_bytes and overlap_bytes and the hardcoded values at
the other occurrences referenced in the review: ~310-313, ~476-477, ~492-493,
~512-513, ~529-531) to read from app_settings.voice instead so all tuning, URLs
and timeouts are configurable via settings and not embedded in the module.
- Around line 57-62: The code currently swallows all JSON parse errors when
reading request.json() and silently generates a new context_id; instead,
explicitly return HTTP 400 for malformed JSON while preserving the "empty body"
behavior. Replace the single await request.json() call with logic that first
reads the raw body (await request.body()), if the body is empty keep generating
a new context_id, otherwise attempt to parse via json.loads(...) and on
json.JSONDecodeError raise fastapi.HTTPException(status_code=400,
detail="Malformed JSON payload"); if parsing succeeds and the result is a dict
with "context_id", set context_id = str(body["context_id"]). Ensure you import
json and fastapi.HTTPException and update the code around request.json() and
context_id handling accordingly.
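A runnable approximation of that control flow, with a plain `ValueError` standing in for `fastapi.HTTPException` and `resolve_context_id` as a hypothetical helper name:

```python
import json
import uuid


def resolve_context_id(raw_body: bytes) -> str:
    """Empty body -> fresh context_id; malformed JSON -> error; else honor the field."""
    if not raw_body:
        return str(uuid.uuid4())  # preserve the "empty body" behavior
    try:
        body = json.loads(raw_body)
    except json.JSONDecodeError:
        # In the endpoint this would be HTTPException(status_code=400,
        # detail="Malformed JSON payload").
        raise ValueError("Malformed JSON payload")
    if isinstance(body, dict) and "context_id" in body:
        return str(body["context_id"])
    return str(uuid.uuid4())


print(resolve_context_id(b'{"context_id": "ctx-42"}'))  # → ctx-42
```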
---
Duplicate comments:
In `@bindu/server/endpoints/voice_endpoints.py`:
- Around line 584-593: The _try_send_json helper creates background tasks via
loop.create_task(_send_json(...)) but does not retain a reference which can
allow the task to be garbage-collected and drop messages; change _try_send_json
to store the returned Task in a short-lived container and remove it when
finished (e.g., maintain a set of pending tasks and add
task.add_done_callback(lambda t: pending.discard(t))) so tasks started from
callback contexts are strongly referenced until completion; reference the
symbols _try_send_json, _send_json, WebSocket and send_lock when locating where
to create the pending-tasks set (module-level or attached to the
websocket/session object) and ensure thread-safety around adding/removing tasks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 21703d2e-989a-4ebb-a5ee-4b02a03c473d
📒 Files selected for processing (1)
bindu/server/endpoints/voice_endpoints.py
|
Hey @chandan-1427 |
|
Hi @Co-vengers , Amazing work on the latest commits! The frontend Web Audio API implementation is great. While reviewing the new code, I noticed the CI pipeline crashed before running any tests due to a few strict pre-commit linters, and I spotted one tiny edge case in the new backend history logic. Could you patch these final 4 things?
Fix: Ensure messages are deleted in pairs (user + assistant) so the array always starts with a user message. CodeRabbit actually left a great inline suggestion for this using turns_to_drop = max(1, (overflow + 1) // 2). Looking forward! |
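That pairwise trim can be sketched as follows (the history shape and `max_messages` cap are assumptions; the `turns_to_drop` formula is the one quoted above):

```python
def trim_history(history: list[dict], max_messages: int) -> list[dict]:
    """Drop whole user+assistant turns so history always starts with a user message."""
    overflow = len(history) - max_messages
    if overflow <= 0:
        return history
    # Round the overflow up to whole turns: never strand a leading assistant message.
    turns_to_drop = max(1, (overflow + 1) // 2)
    return history[turns_to_drop * 2:]


# Alternating user/assistant history of 7 messages, capped at 4:
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "text": f"m{i}"}
    for i in range(7)
]
trimmed = trim_history(history, max_messages=4)
print(len(trimmed), trimmed[0]["role"])  # → 3 user
```

With an overflow of 3, two full turns (4 messages) are dropped, leaving a history that still opens with a user message.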
3b240cc to
3be6819
Compare
|
Hi @chandan-1427 |
|
@Co-vengers great work on these updates. I ran this locally and caught one issue in voice_endpoints.py (_transcribe_pcm_buffer). |
Sure I'll fix those issues. |
|
@chandan-1427 I have fixed the issue and also made the necessary changes to integrate the pipecat pipeline. |
|
@chandan-1427 Let me know if any further fixes are required after review. |
|
@Co-vengers Tested this locally and found two blocking issues in VoiceClient.ts (startAudioStreaming): |
|
@chandan-1427 I have fixed VoiceClient.ts to address both issues: Removed the manual resampler and now rely on new AudioContext({ sampleRate: 16000 }) for browser-side resampling. |
636312f to
df5dedb
Compare
…Bindu#353) Worker accessed task_operation["_current_span"] but scheduler now sends primitive trace_id/span_id strings. Add _reconstruct_span() helper to rebuild a NonRecordingSpan from hex-encoded IDs with graceful fallback.
Replace math.inf buffer size with a constant of 100 to prevent unbounded memory growth while still allowing task enqueue before the worker loop is ready.
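In `asyncio` terms, the bounded-buffer change amounts to the following (the constant name is an assumption):

```python
import asyncio

MAX_BUFFER_SIZE = 100  # replaces math.inf; bounds memory before the worker loop runs

queue: asyncio.Queue = asyncio.Queue(maxsize=MAX_BUFFER_SIZE)


async def main() -> None:
    # Tasks can still be enqueued before any consumer is running...
    for i in range(MAX_BUFFER_SIZE):
        queue.put_nowait(i)
    # ...but overflow is now surfaced instead of growing without limit.
    try:
        queue.put_nowait(MAX_BUFFER_SIZE)
    except asyncio.QueueFull:
        print("buffer full")


asyncio.run(main())
```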
Add SpanContext, TraceFlags, NonRecordingSpan, and INVALID_SPAN_CONTEXT mocks. Register opentelemetry.trace.span submodule so worker imports resolve in the test environment.
df5dedb to
fcf2e4e
Compare
|
@Co-vengers this PR has too many files to check and evaluate. Could you raise separate PRs for dedicated concerns, e.g. a package PR, an example-agent PR, a documentation PR, and a code PR? It will be more efficient this way. Could you please do this? |
|
@chandan-1427 I'll split it if necessary. Just let me know how each split should look: should each PR be standalone, or should each split include the changes from the previous one? Creating a new branch for each section would by default include those files and aggregate them. |
|
Hey @Co-vengers, thanks for being open to splitting this up! |
|
@chandan-1427 I have raised PRs splitting into code, examples, and docs for review:
These are for review purposes only — they may include merge conflicts. Once reviewed, all changes will be aggregated back into this branch. Hope this helps with the review! |
Voice Agent Extension — Progress & Documentation PR
Overview
This PR introduces the initial implementation of the Voice Agent Extension for Bindu, enabling real-time voice conversations between users and agents. The extension integrates backend, frontend, and testing components, following the architecture and plan outlined in
docs/VOICE_AGENT_PLAN.md.

What’s Included
Backend
- `bindu/extensions/voice/` with: `__init__.py`, `voice_agent_extension.py`, `service_factory.py`, `pipeline_builder.py`, `session_manager.py`, `agent_bridge.py`, `audio_config.py`
- `bindu/server/endpoints/voice_endpoints.py` (REST + WebSocket)
- `bindu/settings.py` updated with `VoiceSettings`
- `bindu/server/applications.py` updated for conditional voice route registration and session manager
- `bindu/utils/capabilities.py` updated for voice extension helpers
- `bindu/penguin/bindufy.py` updated to accept voice config and add the extension

Frontend
- `frontend/src/lib/services/voice-client.ts`: WebSocket client, audio capture/playback
- `frontend/src/lib/stores/voice.ts`: Svelte stores for voice state and transcripts
- `frontend/src/lib/components/voice/VoiceCallPanel.svelte`, `VoiceCallButton.svelte`, `LiveTranscript.svelte`: UI components for voice session

Tests
- `tests/unit/extensions/voice/test_voice_extension.py`
- `tests/unit/extensions/voice/test_session_manager.py`
- `tests/unit/extensions/voice/test_service_factory.py`
- `tests/unit/extensions/voice/test_agent_bridge.py`
- `tests/unit/extensions/voice/test_voice_endpoints.py`

Examples & Docs
- `examples/voice-agent/main.py`, `.env.example`, and `README.md`
- `docs/VOICE_AGENT_PLAN.md` (implementation plan)

Current Progress
`docs/VOICE.md`) are planned.

How to Test
- `pipecat-ai[deepgram,elevenlabs,silero]` and `websockets` are installed (see `pyproject.toml` voice group).
- `VOICE__STT_API_KEY`, `VOICE__TTS_API_KEY` (see `.env.example`)
- `uv run pytest tests/unit/extensions/voice/ -v`
- `GET /tasks/get`.

Next Steps & Improvements
- `docs/VOICE_AGENT_PLAN.md`)
- `docs/VOICE.md`)
- `uv run pre-commit run --all-files`

References
Contributors:
For questions or feedback, please comment on this PR.
Summary by CodeRabbit
New Features
Documentation
Tests