RealtimeClaw

Wyoming Protocol to Realtime Speech-to-Speech API bridge for Home Assistant

RealtimeClaw connects Home Assistant voice pipelines to xAI, OpenAI, and Inworld Realtime Speech-to-Speech APIs over WebSocket. Instead of the traditional STT → text-LLM → TTS pipeline, audio goes directly to a speech-to-speech model — delivering sub-1-second voice response latency with local speaker identification and security-filtered tool routing.

Works standalone with Home Assistant, or paired with OpenClaw for personality, persistent memory, and advanced tool skills.

Architecture

flowchart TD
    VPE["Voice PE (ESPHome)"] -- "PCM 16kHz" --> HA["Home Assistant"]
    HA -- "Wyoming" --> RC["RealtimeClaw :10300"]
    RC -- "WebSocket" --> API["Realtime API (xAI / OpenAI)"]
    API -- "audio + tool calls" --> RC

    RC --> Eagle["Eagle Speaker ID"]
    Eagle --> Router["Tool Router"]
    Router -- "direct" --> HA_API["HA REST API"]
    Router -- "reasoning" --> OC["OpenClaw Gateway"]
    Router -- "dangerous" --> Block["Blocked / Approval"]

All audio stays PCM 16 kHz S16_LE end to end — no resampling, no transcoding. One Realtime WebSocket connection is created per Wyoming session and torn down on disconnect.

Direct Voice PE Connection

Voice PE satellites can also bypass HA entirely and connect straight to RealtimeClaw using esphome-wyoming-client, a custom ESPHome component that speaks Wyoming Protocol over TCP:

flowchart TD
    VPE["Voice PE"]
    RC["RealtimeClaw :10300"]
    API["xAI Realtime API"]

    VPE -- "Wyoming TCP" --> RC
    RC -- "WebSocket" --> API
    API --> RC
    RC --> VPE

Prerequisites

Home Assistant with the Wyoming integration
An API key from a supported provider — xAI gives $25 free credit at console.x.ai
Optional: OpenClaw for personality, memory, and tool skills
Optional: Picovoice access key for speaker identification (free tier: 100 min/month)

Installation

Home Assistant Addon (recommended)

In HA go to Settings → Add-ons → Add-on Store
Click ⋮ (top right) → Repositories
Add: https://github.com/ufelmann/RealtimeClaw
Find RealtimeClaw in the store → Install
Configuration tab → set your xAI API Key
Start → check the Log tab

See addon/DOCS.md for full addon documentation.

Docker

git clone https://github.com/ufelmann/RealtimeClaw.git
cd RealtimeClaw
cp .env.example .env   # edit with your API key
docker compose up -d

From Source

git clone https://github.com/ufelmann/RealtimeClaw.git
cd RealtimeClaw
npm install
cp .env.example .env   # edit with your API key
npm run dev

Connect to Home Assistant

Go to Settings → Devices & Services → Add Integration → Wyoming Protocol
Enter the host IP where RealtimeClaw is running, port 10300
Assign the bridge to a Voice PE satellite

Features

Multi-Provider Support

All three providers implement the same OpenAI Realtime Protocol. Switch with a single environment variable:

Provider	`REALTIME_PROVIDER`	Default Voice	Notes
xAI	`xai` (default)	rex	PCM, PCMU, PCMA formats
OpenAI	`openai`	alloy	PCM only
Inworld	`inworld`	default	PCM only

Speaker Identification

Picovoice Eagle runs locally to identify speakers from voice audio — no cloud calls, no recordings leave the device.

sequenceDiagram
    participant VPE as Voice PE
    participant RC as RealtimeClaw
    participant Eagle as Eagle (local)
    participant API as Realtime API

    VPE->>RC: PCM audio
    RC->>Eagle: audio frames (parallel)
    RC->>API: audio stream
    Eagle-->>RC: speaker: alice (92%)
    RC->>RC: security level → owner
    API-->>RC: tool call: turn_off_alarm
    RC->>RC: check permissions for owner
    RC-->>VPE: "Alarm is off, Alice"

Enroll speakers by voice: "Jarvis, lerne meine Stimme, ich bin Alice"
Confidence threshold configurable (default 0.7)
Speaker identity feeds into the security model for per-session tool filtering

Security Model

Each session gets a security level based on speaker confidence and per-speaker caps. Tools are filtered cumulatively — higher levels include all lower-level permissions.

Level	Confidence	Example Permissions
guest	< 50%	Lights, music
family	50 – 70%	+ Climate, own calendar
trusted	70 – 90%	+ Document titles, all calendars
owner	> 90%	+ Full document access

Speaker max levels cap access regardless of confidence (e.g., children never exceed family).

Tool Routing

Function calls from the Realtime API are classified and handled by route:

flowchart TD
    TC["Tool Call from Realtime API"]
    TC --> Check{"Route?"}
    Check -- "ha_*, sonos_*" --> Direct["Direct → execute via HA / OpenClaw"]
    Check -- "request_reasoning" --> Reason["Reasoning → forward to background LLM"]
    Check -- "exec_*, file_delete_*" --> Danger["Dangerous → require approval"]
    Check -- "unmatched" --> Block["Blocked → reject with error"]

Configure routes with TOOL_ROUTE_DIRECT, TOOL_ROUTE_REASONING, and TOOL_ROUTE_DANGEROUS (comma-separated glob patterns).

OpenClaw Integration

OpenClaw is an open-source AI gateway that manages your assistant's personality (SOUL.md), user profiles, persistent memory, and provides tool skills for Home Assistant, Paperless, calendars, and more. RealtimeClaw connects to OpenClaw for context and tool execution — but works fine without it (use HA Direct tools + addon config for personality instead).

flowchart TD
    RC["RealtimeClaw"]
    OC["OpenClaw Gateway"]

    RC -- "POST /tools/invoke" --> OC
    RC -- "POST /v1/chat/completions" --> OC
    RC -- "WS /rpc (context)" --> OC

Context: SOUL.md (personality), IDENTITY.md, USER.md loaded per session
Tools: All OpenClaw skills (ha-ultimate, paperless, calendar) available
Reasoning: Deep reasoning queries forwarded to background LLM with web search
Device pairing: Ed25519 key exchange, auto-reconnect on restart

Additional Features

Audio pacing — response audio is paced at real-time rate (1024 B / 32 ms) to prevent ESP32 buffer overflow
Reconnect with backoff — exponential backoff with jitter, per-provider retry parameters
Latency tracking — TTFA (Time To First Audio) logged per session
Barge-in — interrupting mid-response cancels the current response immediately
Startup verification — checks all integrations (xAI, HA, OpenClaw, Eagle) at boot
Debug mode — set DEBUG_REALTIME_CLAW=true for verbose WebSocket logging

Configuration

Copy .env.example and set at minimum XAI_API_KEY:

cp .env.example .env

See .env.example for the full list of variables with descriptions.

For complex setups, use a JSON config file that overrides environment variables:

CONFIG_FILE=./config.json realtime-claw

See config.example.json for the schema.

Development

npm install          # install dependencies
npm run dev          # run with tsx (hot reload)
npm test             # run all 251 tests
npm run build        # compile TypeScript
npm run typecheck    # type-check without emitting
npm run lint         # lint src/ and tests/

Project Structure

src/
  bridge.ts              # session orchestration, barge-in, tool routing
  config.ts              # env + JSON config loading with validation
  types.ts               # shared type definitions
  wyoming/               # Wyoming Protocol TCP server + binary parser
  realtime/              # WebSocket client, providers, reconnect, latency
  security/              # security levels, permission filtering
  router/                # tool call classification (direct/reasoning/dangerous)
  tools/                 # OpenClaw client, HA direct, reasoning tool
  speaker/               # Eagle speaker ID, enrollment, voiceprint management
  session/               # session flush / memory persistence
tests/                   # vitest: unit + integration (251 tests, 13 E2E)
addon/                   # Home Assistant addon (config, Dockerfile, docs)

Contributing

See CONTRIBUTING.md for guidelines.

License

MIT — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.github		.github
addon		addon
scripts		scripts
src		src
tests		tests
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
config.example.json		config.example.json
docker-compose.yml		docker-compose.yml
package-lock.json		package-lock.json
package.json		package.json
repository.yaml		repository.yaml
tsconfig.json		tsconfig.json
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RealtimeClaw

Architecture

Direct Voice PE Connection

Prerequisites

Installation

Home Assistant Addon (recommended)

Docker

From Source

Connect to Home Assistant

Features

Multi-Provider Support

Speaker Identification

Security Model

Tool Routing

OpenClaw Integration

Additional Features

Configuration

Development

Project Structure

Contributing

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

RealtimeClaw

Architecture

Direct Voice PE Connection

Prerequisites

Installation

Home Assistant Addon (recommended)

Docker

From Source

Connect to Home Assistant

Features

Multi-Provider Support

Speaker Identification

Security Model

Tool Routing

OpenClaw Integration

Additional Features

Configuration

Development

Project Structure

Contributing

License

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages