feat(nodes): add audio_transcribe_live source node for live microphone STT#979

Open
ali-amjad52114 wants to merge 1 commit into rocketride-org:develop from ali-amjad52114:feat/audio-transcribe-live
Conversation

ali-amjad52114 commented May 23, 2026

Summary

  • Adds audio_transcribe_live source node that captures microphone audio and emits rolling partial transcription on the text lane using local faster-whisper
  • Includes example pipeline (pipelines/live_stt_to_text.pipe) chaining live STT to response_text
  • Updates node catalog in docs/README-nodes.md

Test plan

  • Start RocketRide engine with the new node installed
  • Run pipelines/live_stt_to_text.pipe or a custom pipeline with audio_transcribe_live as source
  • Verify microphone capture starts and partial text appears on the text lane
  • Confirm downstream nodes (e.g. response_text) receive transcribed output
  • Stop the pipeline and verify clean session teardown

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced live speech-to-text node for real-time microphone transcription with streaming partial results. Includes configuration presets and support for language/model selection.
    • Added "Live Speech To Text" pipeline demonstrating the new transcription capability.
  • Documentation

    • Updated node reference documentation to include the new live transcription node.

…e STT

Captures microphone audio and emits rolling partial transcription on the text lane using local faster-whisper, enabling chainable live STT pipelines.

Co-authored-by: Cursor <cursoragent@cursor.com>
github-actions Bot added the docs (Documentation) and module:nodes (Python pipeline nodes) labels May 23, 2026
coderabbitai Bot commented May 23, 2026

Walkthrough

This PR introduces a complete live speech-to-text transcription node using Whisper and microphone audio capture. It includes an audio streaming engine with buffering and incremental text emission, a global Whisper model manager, endpoint orchestration for pipeline integration, configuration schema, dependencies, and example usage.
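The buffering and incremental emission described above can be sketched roughly as follows. This is a minimal illustration, not the code in live_transcribe.py: the names `RollingWindow` and `text_delta` are hypothetical, the 16 kHz sample rate is an assumption, and the 4-second window only mirrors the `window_seconds` default visible in the debug message under review. The PR's actual phrase-tracking logic is not shown on this page, so a plain common-prefix comparison stands in for it.

```python
from collections import deque


class RollingWindow:
    """Keep only the most recent `window_seconds` of mono samples (hypothetical helper)."""

    def __init__(self, sample_rate: int = 16000, window_seconds: float = 4.0) -> None:
        # maxlen makes the deque discard the oldest samples automatically.
        self._samples = deque(maxlen=int(sample_rate * window_seconds))

    def extend(self, chunk) -> None:
        """Append a newly captured chunk of samples."""
        self._samples.extend(chunk)

    def snapshot(self) -> list:
        """Return a copy of the current window, oldest sample first."""
        return list(self._samples)


def text_delta(previous: str, current: str) -> str:
    """Return the part of `current` that follows its common prefix with `previous`."""
    common = 0
    for old_char, new_char in zip(previous, current):
        if old_char != new_char:
            break
        common += 1
    return current[common:]
```

In a loop of this shape, each periodic pass would transcribe `snapshot()` and write only `text_delta(last_text, new_text)` to the text lane, so downstream nodes see growth rather than the full hypothesis each time.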

Changes

Live Speech-to-Text Node Implementation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Audio streaming and incremental transcription | nodes/src/nodes/audio_transcribe_live/live_transcribe.py | LiveTranscriber captures microphone audio via sounddevice.InputStream, maintains a rolling buffer window, periodically transcribes buffered samples, and streams partial text deltas to a pipe using phrase-tracking logic; emits newlines after silence timeouts and a final pass on shutdown. |
| Global Whisper model and transcription interface | nodes/src/nodes/audio_transcribe_live/IGlobal.py | IGlobal manages a module-level singleton Whisper instance with configuration loading (normalizing connection parameters), model initialization (with language, compute type, and VAD settings), thread-safe transcription via lock-protected transcribe() calls, and dual lifecycle paths (beginGlobal() for full startup and ensure_initialized() for endpoint-only startup). |
| Endpoint lifecycle and session orchestration | nodes/src/nodes/audio_transcribe_live/IEndpoint.py | IEndpoint resolves or creates IGlobal, builds LiveTranscriber from configuration, manages session lifecycle via beginEndpoint()/endEndpoint(), and implements scanObjects() to execute transcription with synthetic session tracking, status updates, and exception handling; delegates audio capture to LiveTranscriber.run(). |
| Service configuration and dependencies | nodes/src/nodes/audio_transcribe_live/services.json, nodes/src/nodes/audio_transcribe_live/requirements.txt | services.json defines the "Live Transcribe" pipeline source with metadata, output lane wiring, preset configurations (default/tiny/small), user-editable fields for model/language/timing/device, and conditional VAD parameters; requirements.txt declares Whisper, audio, and ML runtime dependencies. |
| Package structure and documentation | nodes/src/nodes/audio_transcribe_live/__init__.py, nodes/src/nodes/audio_transcribe_live/IInstance.py, docs/README-nodes.md, pipelines/live_stt_to_text.pipe | Package exports IEndpoint, IGlobal, and IInstance (framework-required stub); documentation adds the node to AI/Analysis and Media sections; example pipeline wires audio capture into text response consumer. |
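
For readers unfamiliar with the pattern the IGlobal row describes, a lock-protected singleton with lazy initialization might look like the sketch below. It is an illustration under assumptions, not the contents of IGlobal.py: `SharedModel` and the `loader` callable are hypothetical names, and the model is represented by any callable so the example runs without faster-whisper installed. In the real node the loader would presumably construct the Whisper model from the node configuration.

```python
import threading
from typing import Any, Callable, Optional


class SharedModel:
    """Process-wide holder for one lazily loaded model (hypothetical helper)."""

    _model: Optional[Callable[[Any], Any]] = None
    _init_lock = threading.Lock()
    _run_lock = threading.Lock()

    @classmethod
    def ensure_initialized(cls, loader: Callable[[], Callable[[Any], Any]]) -> None:
        """Load the model once, no matter how many threads ask for it."""
        # The check happens inside the lock so two threads cannot both run the loader.
        with cls._init_lock:
            if cls._model is None:
                cls._model = loader()

    @classmethod
    def transcribe(cls, samples: Any, loader: Callable[[], Callable[[Any], Any]]) -> Any:
        """Run the shared model on `samples`, one caller at a time."""
        cls.ensure_initialized(loader)
        # Serialize inference so concurrent callers never use the model at the same moment.
        with cls._run_lock:
            return cls._model(samples)
```

Keeping initialization and inference under separate locks means a slow transcription does not block another caller that only needs to confirm the model is loaded.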

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

module:nodes, docs, module:ai

Suggested reviewers

  • jmaionchi
  • stepmikhaylov
  • Rod-Christensen

Poem

🎙️ A rabbit whispers to the stream,
Capturing speech in rolling dreams—
Whisper listens, buffers grow,
Deltas flow where partial words go,
Live transcription, phrase by phrase! 🐰

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 60.87% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(nodes): add audio_transcribe_live source node for live microphone STT' accurately and specifically describes the main change: adding a new audio_transcribe_live node for live microphone speech-to-text. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/audio_transcribe_live/IGlobal.py`:
- Around line 122-123: The f-string constructing the debug message for "Live
transcribe" uses double quotes inside the config.get calls; update the
expressions in that f-string so the config keys use single quotes (e.g., change
config.get("chunk_interval", 1.5) and config.get("window_seconds", 4) to use
single-quoted keys) in the Live transcribe message construction found in
IGlobal.py so it adheres to the project's single-quote convention.

In `@nodes/src/nodes/audio_transcribe_live/IInstance.py`:
- Around line 27-28: Add a one-line PEP 257 docstring to the stub class
IInstance describing its purpose (e.g., 'Interface for live audio transcription
instance.') using single quotes and a triple-quoted string directly under the
class declaration; ensure the docstring is concise, follows PEP 257 style, and
that the file is formatted/linted (ruff) for Python 3.10+ after the change.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bbe5752c-50da-4cda-a6fa-d24685f48bcf

📥 Commits

Reviewing files that changed from the base of the PR and between 487e20c and c62bbc4.

📒 Files selected for processing (9)
  • docs/README-nodes.md
  • nodes/src/nodes/audio_transcribe_live/IEndpoint.py
  • nodes/src/nodes/audio_transcribe_live/IGlobal.py
  • nodes/src/nodes/audio_transcribe_live/IInstance.py
  • nodes/src/nodes/audio_transcribe_live/__init__.py
  • nodes/src/nodes/audio_transcribe_live/live_transcribe.py
  • nodes/src/nodes/audio_transcribe_live/requirements.txt
  • nodes/src/nodes/audio_transcribe_live/services.json
  • pipelines/live_stt_to_text.pipe

Comment on lines +122 to +123
f' Live transcribe: model={model_name}, language={language}, '
f'interval={config.get("chunk_interval", 1.5)}s, window={config.get("window_seconds", 4)}s'
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use single quotes for the config.get(...) keys in this debug message.

This currently mixes in double-quoted regular string literals and will violate the node Python quote convention.

Proposed fix
         debug(
             f'    Live transcribe: model={model_name}, language={language}, '
-            f'interval={config.get("chunk_interval", 1.5)}s, window={config.get("window_seconds", 4)}s'
+            f'interval={config.get('chunk_interval', 1.5)}s, window={config.get('window_seconds', 4)}s'
         )

As per coding guidelines, nodes/**/*.py: use single quotes, ruff for linting/formatting, PEP 257 docstrings, target Python 3.10+.
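
One caveat on the proposed fix: reusing the enclosing quote character inside an f-string replacement field, or escaping it with a backslash, only became valid syntax in Python 3.12 (PEP 701). Because the guideline targets Python 3.10+, the single-quoted form would be a SyntaxError on 3.10 and 3.11. A version-independent alternative is to hoist the lookups into locals so the f-string needs no nested quotes. The sketch below assumes `config`, `model_name`, and `language` look roughly like this; the sample values are made up for illustration.

```python
# Stand-in values; in IGlobal.py these would come from the node configuration.
config = {'chunk_interval': 2.0}
model_name = 'base'
language = 'en'

# Hoist the lookups so the f-string contains no quoted keys at all.
interval = config.get('chunk_interval', 1.5)
window = config.get('window_seconds', 4)

message = (
    f'    Live transcribe: model={model_name}, language={language}, '
    f'interval={interval}s, window={window}s'
)
```

This keeps every string literal single-quoted and parses the same way on all Python versions from 3.10 upward.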

Comment on lines +27 to +28
class IInstance(IInstanceBase):
pass
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a short class docstring for the stub IInstance.

The stub is fine, but this class is currently undocumented in a nodes/**/*.py module.

Proposed fix
 class IInstance(IInstanceBase):
+    """Node-local instance stub required by the pipeline framework."""
     pass

As per coding guidelines, nodes/**/*.py: use single quotes, ruff for linting/formatting, PEP 257 docstrings, target Python 3.10+.
