WEBWAIFU 3

Browser-based VRM companion with local/cloud AI, voice, memory, and real-time 3D

Quick Start | Features | V2 vs V3 | Provider Setup | Architecture

What It Is

WEBWAIFU 3 is a complete rewrite of WEBWAIFU V2. Same concept — a browser-based AI companion with a 3D avatar — but rebuilt from scratch with a proper framework, typed codebase, and a more focused feature set.

V2 was vanilla JS with no build system, supported both VRM and Live2D, used Edge TTS, and ran on Netlify. V3 drops the cruft, picks better defaults, and ships as a real SvelteKit app.

Primary routes:

/ main companion UI
/manager provider config, memory controls, voice management, and data tools

What Changed from V2

Area	V2	V3
Framework	Vanilla JS, no build	SvelteKit 2 + Vite 7 + TypeScript
Avatar	VRM + Live2D (Pixi.js)	VRM only — deeper Three.js integration, post-processing, animation sequencer
TTS	Edge TTS (free) + Fish Audio	Kokoro (local, runs on WebGPU/WASM) + Fish Audio (realtime PCM streaming)
LLM	Gemini, OpenAI, OpenRouter, Ollama	OpenAI, OpenRouter, Ollama, LM Studio — all via Vercel AI SDK Responses API
STT	Whisper tiny	Whisper tiny with silence trimming + transcript sanitization
Memory	Embeddings + summarization	Same core but proper Web Worker isolation, hybrid mode, configurable summarization LLM
Lip sync	Phoneme (Edge TTS) + amplitude (Fish)	Approximate phoneme mapping + PCM amplitude analysis (both providers)
Deploy	Netlify serverless	Vercel (adapter-vercel)
State	localStorage + IndexedDB	Svelte 5 runes + IndexedDB (StorageManager singleton)
Persistence	Partial	Full — every setting, conversation, VRM binary, voice list persisted

Dropped: Live2D, Gemini, Edge TTS, DistilBERT, Pixi.js, Netlify functions.

Added: Kokoro local TTS, LM Studio, realtime Fish PCM streaming, post-processing pipeline, animation sequencer, character system, TTS formatting rules auto-injection, semantic memory with vector search.

Feature Surface

AI chat

Providers: ollama, lmstudio, openai, openrouter
Streaming token output wired into TTS sentence accumulator
Per-request Ollama tuning: num_ctx, flash_attn, kv_cache_type
Character-based system prompts with user nickname support
Auto-injected TTS formatting rules when voice is enabled (no emojis, spoken prose, proper punctuation)
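
As a rough illustration of how a character prompt, a nickname, and the TTS formatting rules could be combined, here is a minimal sketch. The `Character` shape, the `buildSystemPrompt` name, and the exact rule wording are assumptions made for this example; only the three rules themselves (no emojis, spoken prose, proper punctuation) come from the list above.

```typescript
// Hypothetical shape; the project's real character type is not shown in this README.
interface Character {
  name: string;
  systemPrompt: string;
}

// Rule wording is illustrative; the topics come from the feature list.
const TTS_RULES = [
  'Do not use emojis.',
  'Write in natural spoken prose, not lists or markup.',
  'Use proper punctuation so sentences can be split cleanly.'
].join(' ');

function buildSystemPrompt(
  character: Character,
  nickname: string | null,
  ttsEnabled: boolean
): string {
  const parts: string[] = [character.systemPrompt.trim()];
  if (nickname !== null && nickname.trim().length > 0) {
    parts.push(`Address the user as "${nickname.trim()}".`);
  }
  // Only inject the formatting rules when a voice provider is active.
  if (ttsEnabled) {
    parts.push(TTS_RULES);
  }
  return parts.join('\n\n');
}
```

Keeping the rules in a separate constant means the character prompt itself stays unchanged when voice is toggled off.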

Text-to-speech

Kokoro: local TTS via Web Worker, runs on WebGPU with WASM fallback, configurable device + precision (fp32/fp16/q8/q4/q4f16)
Fish Audio: cloud TTS with realtime PCM streaming over WebSocket, configurable latency mode
Sentence accumulator splits LLM output into natural TTS chunks
Fish voice model operations from manager UI: list, search, create, delete
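
The sentence accumulator idea can be sketched as follows. This is not the project's implementation; the class name, the boundary regex, and the `push`/`flush` methods are assumptions chosen to show the technique of buffering streamed tokens until a sentence boundary appears.

```typescript
class SentenceAccumulator {
  private buffer = '';

  // Feed one streamed token; returns any sentences completed by it.
  push(token: string): string[] {
    this.buffer += token;
    const sentences: string[] = [];
    // A sentence ends at ., !, or ? (optionally followed by a closing
    // quote or bracket) and then whitespace.
    const boundary = /[.!?]+["')\]]*\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      const sentence = this.buffer.slice(start, end).trim();
      if (sentence.length > 0) sentences.push(sentence);
      start = end;
    }
    // Keep the unfinished tail for the next token.
    this.buffer = this.buffer.slice(start);
    return sentences;
  }

  // Call when the stream ends to emit whatever is left.
  flush(): string[] {
    const rest = this.buffer.trim();
    this.buffer = '';
    return rest.length > 0 ? [rest] : [];
  }
}
```

Because a boundary requires trailing whitespace, a number such as 3.14 is not split, but abbreviations like "Dr. Smith" still are. A production splitter would need extra rules for those cases.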

Speech-to-text

Whisper model: Xenova/whisper-tiny.en in a Web Worker
Silence trimming before transcription to reduce hallucinations
Transcript sanitization (filters repeated-char artifacts)
Push-to-talk with optional auto-send and mic permission pre-check
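
A minimal sketch of the two preprocessing steps, assuming mono PCM samples in the range -1 to 1. The function names, the amplitude threshold, and the choice to collapse (rather than drop) repeated characters are assumptions for illustration; the app's actual rules may differ.

```typescript
// Trim leading and trailing samples whose absolute amplitude is below a threshold.
function trimSilence(samples: Float32Array, threshold = 0.01): Float32Array {
  let start = 0;
  let end = samples.length;
  while (start < end && Math.abs(samples[start]) < threshold) start++;
  while (end > start && Math.abs(samples[end - 1]) < threshold) end--;
  // subarray returns a view, so no audio data is copied.
  return samples.subarray(start, end);
}

// Collapse long runs of one repeated character, then normalize whitespace.
function sanitizeTranscript(text: string, maxRun = 3): string {
  const collapsed = text.replace(/(.)\1+/g, (run: string, ch: string) =>
    run.length > maxRun ? ch.repeat(maxRun) : run
  );
  return collapsed.replace(/\s+/g, ' ').trim();
}
```

A simple amplitude gate like this only removes silence at the edges of a recording. Pauses in the middle of speech are left intact.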

Semantic memory

Embeddings model: Xenova/all-MiniLM-L6-v2 (384-dim) in a Web Worker
Modes: auto-prune, auto-summarize, hybrid (default)
Cosine similarity search injects relevant history into prompt context
Optional summarization LLM with separate provider/model/key configuration
Model can be loaded/unloaded on demand to free GPU memory
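
Cosine similarity search over stored embeddings can be sketched in a few lines. The `MemoryEntry` shape, the `searchMemory` name, and the `topK` and `minScore` defaults are assumptions for this example, not values taken from the project.

```typescript
// Hypothetical record: the text of a past message plus its embedding vector.
interface MemoryEntry {
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Embedding dimensions differ');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // A zero vector has no direction; treat it as unrelated.
  return denom === 0 ? 0 : dot / denom;
}

// Return the topK entries most similar to the query, best first.
function searchMemory(
  query: number[],
  entries: MemoryEntry[],
  topK = 3,
  minScore = 0.3
): { entry: MemoryEntry; score: number }[] {
  return entries
    .map((entry) => ({ entry, score: cosineSimilarity(query, entry.embedding) }))
    .filter((hit) => hit.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

With 384-dimensional MiniLM vectors, a linear scan like this is usually fast enough for a few thousand stored messages. The texts of the returned hits are what would be injected into the prompt context.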

3D avatar and rendering

VRM load from built-in asset or user upload (binary persisted in IndexedDB)
Animation playlist/sequencer with crossfade controls
Realistic material toggle (PBR path)
Post-processing: bloom, chromatic aberration, film grain, glitch, FXAA/SMAA/TAA, bleach bypass, color correction, outline
Adjustable key/fill/rim/hemi/ambient lighting
Lip sync driven from both HTMLAudioElement (Kokoro) and PCM AudioBufferSourceNode (Fish) playback paths
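
The amplitude side of lip sync can be approximated by measuring the level of each PCM frame and smoothing it into a mouth-open weight. The sketch below assumes samples in the range -1 to 1. The function names and the `gate`, `gain`, and `smoothing` defaults are illustrative guesses, not the project's tuning.

```typescript
// Root-mean-square level of one PCM frame.
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

// Map a level to a 0..1 mouth-open weight.
// gate: levels below this count as silence.
// gain: scales quiet speech up toward 1.
// smoothing: 0 = jump straight to the target, closer to 1 = slower movement.
function mouthOpen(
  level: number,
  previous: number,
  gate = 0.02,
  gain = 4,
  smoothing = 0.5
): number {
  const target = level < gate ? 0 : Math.min(1, level * gain);
  return previous + (target - previous) * (1 - smoothing);
}
```

Each render frame, the returned weight would typically be applied to a mouth expression on the avatar (for example the VRM `aa` preset) and fed back in as `previous` on the next frame.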

Persistence and management

All settings saved in IndexedDB via StorageManager singleton
Provider defaults, visual settings, active tab, conversation state, Fish voice lists all persisted
Conversation auto-save on every user + assistant message
Conversation export (JSON, TXT)
Data tools in manager: export all, import, clear history, factory reset
Custom VRM binary persisted in IndexedDB

Quick Start

Requirements

Node.js (current LTS recommended)
npm
Modern browser with WebGL + WebAudio support
WebGPU recommended for Kokoro TTS (falls back to WASM automatically)
At least one chat backend:
- Local (Ollama or LM Studio)
- Cloud (OpenAI or OpenRouter)
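
To illustrate the WebGPU-to-WASM fallback mentioned for Kokoro, here is a small feature-detection sketch. The `pickKokoroBackend` name and the default precision per device are assumptions. The device and precision names match the options listed under Text-to-speech.

```typescript
type KokoroDevice = 'webgpu' | 'wasm';
type KokoroDtype = 'fp32' | 'fp16' | 'q8' | 'q4' | 'q4f16';

// Takes a navigator-like object so the function can be tested outside a browser.
function pickKokoroBackend(
  nav: { gpu?: unknown } | undefined
): { device: KokoroDevice; dtype: KokoroDtype } {
  const hasWebGPU = nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
  // Assumed defaults: full precision on the GPU, 8-bit quantized weights on WASM.
  return hasWebGPU
    ? { device: 'webgpu', dtype: 'fp32' }
    : { device: 'wasm', dtype: 'q8' };
}
```

In a browser this would be called with `navigator`. Checking for `navigator.gpu` only shows that the API exists, so a more thorough check would also await `navigator.gpu.requestAdapter()` and fall back to WASM if it resolves to null.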

Install and run

npm install
npm run dev

Dev URL: https://localhost:5173

Note: HTTPS in development is provided by @vitejs/plugin-basic-ssl, which uses a self-signed certificate, so expect a browser warning on first visit.
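
For reference, a SvelteKit project usually wires the basic SSL plugin in through vite.config.ts roughly like this. This is a sketch of the common setup, not a copy of this repository's config, which may contain additional plugins and options.

```typescript
// vite.config.ts (sketch; the project's actual config may differ)
import { sveltekit } from '@sveltejs/kit/vite';
import basicSsl from '@vitejs/plugin-basic-ssl';
import { defineConfig } from 'vite';

export default defineConfig({
  // basicSsl() supplies a self-signed certificate so the dev server can serve HTTPS.
  plugins: [basicSsl(), sveltekit()]
});
```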

Provider Setup

Ollama

Install Ollama and pull a model (example: ollama pull llama3.2).
Enable "Allow through network" in Ollama settings.
Set CORS origins so the browser can access Ollama.

Mac/Linux:

OLLAMA_ORIGINS=* ollama serve

Windows:

Add system environment variable OLLAMA_ORIGINS=*.
Restart Ollama.
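
One way to confirm that the browser can reach Ollama is to request its model list from the page's origin. The sketch below uses Ollama's `GET /api/tags` endpoint. The `listOllamaModels` name and the injected `FetchLike` parameter are assumptions made so the function can be exercised without a running server.

```typescript
// Minimal subset of the fetch API used here.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

interface OllamaTagsResponse {
  models: { name: string }[];
}

// Returns the names of locally pulled models, or throws if Ollama is unreachable.
async function listOllamaModels(baseUrl: string, fetchImpl: FetchLike): Promise<string[]> {
  // Strip trailing slashes so the path is not doubled.
  const url = `${baseUrl.replace(/\/+$/, '')}/api/tags`;
  const res = await fetchImpl(url);
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  const data = (await res.json()) as OllamaTagsResponse;
  return data.models.map((m) => m.name);
}
```

In the browser, call it as `listOllamaModels('http://localhost:11434', fetch)`. If OLLAMA_ORIGINS is not set correctly, the request is typically rejected by the browser as a CORS error before any response is read.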

LM Studio

Download a model.
Start local server (default http://localhost:1234).
Enable CORS in LM Studio server settings.

OpenAI / OpenRouter

Open /manager.
Add API key.
Select provider and model defaults.

Fish Audio

Add Fish API key in /manager.
Fish requests are proxied through server routes:
- POST /api/tts/fish (single request)
- POST /api/tts/fish-stream (realtime WebSocket streaming, PCM)

Model and Runtime Notes

On first use, the browser may download the following models and cache them for later sessions:

Model	Size	Purpose	Runtime
Kokoro 82M ONNX	~86 MB	Local TTS	WebGPU / WASM
Whisper tiny.en	~40 MB	Local STT	Web Worker
MiniLM-L6-v2	~23 MB	Embeddings / memory	Web Worker

Models are loaded on demand — Whisper and embeddings only init when you use them. Kokoro inits automatically when TTS is enabled with the Kokoro provider.
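
On-demand loading of this kind is commonly done with a small lazy wrapper around the expensive loader. The sketch below is a generic pattern, not the project's code, and the `lazy` helper with its `get` and `unload` methods is a hypothetical name.

```typescript
// Wrap an expensive async loader so it runs at most once, on first use.
function lazy<T>(load: () => Promise<T>): { get: () => Promise<T>; unload: () => void } {
  let pending: Promise<T> | null = null;
  return {
    get(): Promise<T> {
      if (pending === null) {
        // Cache the promise itself so concurrent callers share one load.
        pending = load().catch((err: unknown) => {
          // Forget a failed load so the next call can retry.
          pending = null;
          throw err;
        });
      }
      return pending;
    },
    // Drop the cached instance, for example to free memory.
    unload(): void {
      pending = null;
    }
  };
}
```

A Whisper or embeddings worker could be wrapped this way so that nothing is downloaded until the first `get()` call, and `unload()` followed by a later `get()` triggers a fresh load.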

Security

Keys are stored in browser IndexedDB only
Keys are sent only to selected providers and required proxy endpoints
API key inputs use CSS text-security masking to prevent browser password manager interference
Fish TTS requests pass your API key through your deployed SvelteKit server routes
Use scoped keys and provider spending limits for production

Scripts

npm run dev       # Dev server with HTTPS
npm run build     # Production build
npm run preview   # Preview production build
npm run check     # Svelte type checking

Architecture

Frontend: SvelteKit 2, Svelte 5 runes, TypeScript
3D: three, @pixiv/three-vrm
LLM: Vercel AI SDK (ai, @ai-sdk/openai) — Responses API
STT/Memory models: @huggingface/transformers in Web Workers
TTS: kokoro-js (local WebGPU/WASM), fish-audio (cloud WebSocket)
Persistence: IndexedDB via src/lib/storage/index.ts
Analytics: Vercel Web Analytics

Deployment

Current project config uses @sveltejs/adapter-vercel (svelte.config.js).

If you deploy to a different target, switch adapters and ensure the Fish API routes (src/routes/api/tts/) are deployed server-side.

Live: webwaifu3.vercel.app

License

This repository currently does not include a LICENSE file. Add one before public distribution.
