A guiding light for everyday freedom.
Built for ElevenHacks 2026.
There are hundreds of millions of visually impaired people around the world.
Things we do without thinking aren't that simple for them.
Every day takes courage.
Beacon is a voice-first, hands-free mobile app built to help visually impaired users better understand and interact with the world around them. It runs on a phone mounted to a specialized lightweight chest harness and pairs with a Bluetooth remote that lives in the user's pocket to instantly start the assistant, turning the system into wearable AI. No screens, no menus, no touching the display. Users simply talk, and Beacon sees the world and talks back in real time.
Beacon delivers real-time scene narration, hazard detection, object recognition, and live OCR for signs, menus, books, and labels, all spoken aloud naturally. It also supports walking turn-by-turn navigation, live web search, location-aware weather, and smart-home control entirely through conversation. The entire experience is fully hands-free with natural speech, sub-second barge-in, and no keyboards, menus, or screen interaction.
Built with Cursor for rapid agentic development, ElevenLabs (Conversational AI Agents, Voice Design, Text-to-Speech, Sound Effects), Gemini as the vision intelligence layer, Firecrawl and SerpAPI for web search, Google Maps + OSRM for navigation, Open-Meteo for weather, and Tuya for smart-home control.
Beacon. A guiding light for everyday freedom.
Phone clipped to a chest harness, voice in, voice out. Left half is the Beacon PWA; right half is the user.
Walking outside. "Can you look what's in front of me?" → scene narration in real time.
At the bakery. "I want croissants. Do you have any?" → outlet-aware vision answer.
At home. "Turn on the lights" → control_light via Tuya cloud. "Look for my bottle" → vision find-object.
- What is Beacon?
- How It Works
- Key Features
- Screenshots
- Realtime Architecture
- ElevenLabs Integration
- Tools
- Tech Stack
- Running Locally
- Deployment
- License
A hands-free PWA designed to be opened on a phone clipped to a chest harness. Once started, the user never touches the screen — the loop is voice-only:
- See for me: describe what's in front of the camera. Read text on signs, packaging, screens. Refuses to hallucinate on dark or blank frames.
- Search for me: a three-tier dispatcher with SerpAPI Google engine for live data (scores, breaking), Firecrawl news for "latest" / "this week", and Firecrawl general for evergreen. Queries auto-rewrite to inject resolved city ("restaurants near me" → "restaurants in Chennai").
- Review for me: stand in front of a restaurant, ask "is this place good?". `place_reviews` uses GPS + Google Places to pin the exact outlet, then returns its star rating + address + real reviewer snippets, not generic chain reviews.
- Walk me somewhere: turn-by-turn walking navigation. Geocodes "nearest X" / category queries via Google Places Text Search (Nominatim fallback), routes via OSRM (default, no key) or Google Maps Directions (`MAPS_PROVIDER=gmaps`). Live `watchPosition` (3 s throttle) auto-advances waypoints at 15 m via haversine; AR turn-arrow chip + leaflet route overlay (with `×` dismiss) update on every cross. Voice "stop" or the dismiss button clears the route end-to-end.
- Tell me where I am: reverse-geocoded location read aloud, mini-map shown inline in the transcript.
- Help me at home: voice control for a Tuya cloud-paired smart bulb. Reachable from any network including Railway production.
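The 15 m haversine waypoint advance mentioned above can be sketched as follows. This is a minimal illustration, not Beacon's actual code: the function names, the `(lat, lon)` tuple layout, and the choice to skip several waypoints in one fix are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius
ADVANCE_RADIUS_M = 15.0        # threshold quoted in the feature list


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def advance_waypoint(position, waypoints, current_idx):
    """Return the waypoint index after a GPS fix (hypothetical helper).

    Skips every consecutive waypoint that lies within ADVANCE_RADIUS_M
    of the current position.
    """
    while (
        current_idx < len(waypoints)
        and haversine_m(*position, *waypoints[current_idx]) <= ADVANCE_RADIUS_M
    ):
        current_idx += 1
    return current_idx
```

Calling `advance_waypoint` on each throttled position update keeps the turn cue in sync with where the user actually is; an index equal to `len(waypoints)` would mean the route is finished.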
The PWA installs on iOS Safari and Android Chrome. The whole interaction model is one big tap-to-start, then voice for everything that follows. The camera feed is the canvas — the assistant orb is corner-pinned, not center-stage.
1. Open + Start. User taps Enter on the landing screen, then Start. The browser asks for mic, camera, and geolocation. The PWA mounts an AudioWorklet recorder, opens a WebSocket to the backend, and the backend mints a fresh signed URL via GET /v1/convai/conversation/get-signed-url and connects the WS upstream to the (private) ElevenLabs ConvAI agent.
2. Talk. Microphone PCM @ 16 kHz streams over the WS. ConvAI's STT transcribes; the agent's LLM (Gemini Flash) plans; ElevenLabs TTS streams audio back; the worklet plays it. Barge-in fires interruption events that flush the audio buffer in <200 ms.
3. See. When the user asks a vision question, the agent emits client_tool_call('describe_scene'). The backend reads the latest buffered camera frame (the frontend streams JPEGs over the WebSocket on a low cadence into a per-session ring buffer) and hands it to Gemini 3.1 Flash Lite, which returns 1–2 sentences. Cold call ~2.3 s, warm ~1.2 s — Gemini is pre-warmed on lifespan startup with the bundled backend/fixtures/coffee.jpg. A native-multimodal POST /files + multimodal_message pipeline (Path A) is wired in backend/elevenlabs_client.py and verified via the /dev/vision-test endpoint, but kept off the hot path (race between agent-decided tool calls and our multimodal inject; hard 10-file cap per conversation).
5. Search. Agent calls `web_search`. Tiered dispatcher classifies intent:
   - Live (`live | breaking | right now | scores | currently`) → SerpAPI Google engine (`answer_box`, `sports_results`, `knowledge_graph`, organic enrichment), which returns actual numbers, not just headlines.
   - News (`latest | recent | news | this week | update`) → Firecrawl `/v2/search` with `sources=news`, `tbs=qdr:w`.
   - General → Firecrawl web tier, no time filter; keeps evergreen content like reviews and how-tos.
"near me / nearby / this location / this place" rewrite to the user's resolved city via Nominatim before dispatch. Top-3 results summarized into 800 chars; agent speaks 1–2 sentences.
6. Outlet-precise reviews. When the user asks for reviews of a named place they're at ("reviews for this Popeyes"), the agent skips web_search and calls place_reviews(name). Backend uses session GPS + Google Places findplacefromtext (50 m → 200 m → 500 m progressive radius) to lock on the specific outlet, then place/details to pull rating, address, and 2 real reviewer snippets. Returns the actual outlet, not generic chain reviews.
7. Tools render inline. Every tool result is forwarded to the frontend as a tool_result WebSocket envelope and rendered as a persistent card in the transcript — chips for control_light / time_now / describe_scene, a glass card for web_search with a "from web · in {city}" pill, a 120 px Leaflet mini-map for where_am_i. The top-left tool chip is the during indicator; the inline card is the after record.
8. Smart light (cloud). control_light dispatches to Tuya OpenAPI via tinytuya.Cloud, toggling a Havells F8 bulb on switch_led. HMAC-SHA256 signed REST, 5-second timeout. Works from any network — phone on cellular, backend on Railway, bulb behind home WiFi all interoperate.
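The intent classification in the Search step can be pictured with a small regex-based sketch. The keyword lists come from the step above; the function names, the word-boundary matching, and the precedence (live before news) are assumptions for illustration, and the real dispatcher may differ.

```python
import re
from typing import Optional

# Keyword sets taken from the Search step; exact patterns are an assumption.
LIVE_RE = re.compile(r"\b(live|breaking|right now|scores?|currently)\b", re.IGNORECASE)
NEWS_RE = re.compile(r"\b(latest|recent|news|this week|updates?)\b", re.IGNORECASE)
NEAR_RE = re.compile(r"\b(near me|nearby|this location|this place)\b", re.IGNORECASE)


def classify_tier(query: str) -> str:
    """Return 'live', 'news', or 'general' for a search query."""
    if LIVE_RE.search(query):
        return "live"      # would go to SerpAPI
    if NEWS_RE.search(query):
        return "news"      # would go to Firecrawl news
    return "general"       # would go to Firecrawl web


def rewrite_location(query: str, city: Optional[str]) -> str:
    """Replace 'near me'-style phrases with the resolved city, if known."""
    if not city:
        return query
    # A function replacement avoids backslash handling in the city name.
    return NEAR_RE.sub(lambda _m: f"in {city}", query)
```

For example, `rewrite_location("restaurants near me", "Chennai")` yields `"restaurants in Chennai"`, matching the rewrite described earlier.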
| Feature | Description |
|---|---|
| Voice-first hands-free loop | One tap to start, then PCM audio streams both ways over a single WebSocket. No keyboard, no on-screen taps required. |
| Wireless remote integration | Pairs with any cheap Bluetooth shutter remote (the kind bundled with selfie sticks) via Web Bluetooth in the Setup flow. The OS sees the remote as a standard HID device. End-to-end zero-touch operation (intercepting the remote's AudioVolumeUp / AudioVolumeDown DOM events) is on the roadmap; in the current build, pairing serves as the on-ramp and the on-screen Start/Stop FAB stays pixel-stable so a sighted caretaker (or the user themselves) can rely on muscle memory. |
| Camera-first UI | Live video fills the phone frame as the canvas. Reticle, AR turn-arrow slot, and corner orb sit on top. |
| Vision via `describe_scene` Client tool | Agent emits `client_tool_call('describe_scene')` → backend hands the buffered frame to Gemini 3.1 Flash Lite (cold 2.3 s, warm 1.2 s, pre-warmed on lifespan startup). Covers scene description, OCR for signs / books / menus / labels, and object identification, so no separate tools are needed. Refuses to hallucinate on dark / blank frames: returns an explicit refusal string instead of made-up shapes. |
| Tiered web search | Live tier (scores, breaking, right now) → SerpAPI Google engine with answer_box / sports_results / knowledge_graph parsing. News tier → Firecrawl /v2/search sources=news past-week. General → Firecrawl web. Auto-rewrites "near me / this location / this place" to resolved city via Nominatim. |
| Outlet-precise reviews (`place_reviews`) | GPS + Google Places `findplacefromtext` finds the exact outlet the user is standing at, then `place/details` returns its star rating, address, and 2 real reviewer snippets. Beats generic chain reviews from web search. Progressive radius 50 → 200 → 500 m. |
| Inline tool transcript | Every tool call leaves a persistent visual record: chips for time/light/scene, glass card for search, Leaflet mini-map for location. |
| Smart light control (cloud) | `control_light(action: on |
| Signed-URL private agent | Agent locked enable_auth=true. Backend mints a fresh signed URL per connect. API key never reaches the browser. |
| PWA + offline | Service worker caches shell. AudioWorklet processors served from public/. Installs on iOS/Android home screen. |
| Setup wizard | Name + mic/cam/geo permission test + bulb cloud reachability check. Persisted to localStorage under beacon.<domain>.v1 keys. |
| Session history | Every session saved client-side (beacon.sessions.v1) with transcript, frames count, GPS track. Dashboard shows past sessions. |
| Camera-first dashboard | Live session timer or "{N} sessions saved" hero; no telemetry leakage; sessions list grouped by day. |
| Walking navigation | OSRM + Google Maps Directions abstraction (MAPS_PROVIDER env). navigate_to / next_turn / stop_navigation tools, live watchPosition (3 s throttle), 15 m waypoint auto-advance via haversine. Floating leaflet route overlay + AR turn arrow chip. Forward geocode falls back from Google Places Text Search → Nominatim so category queries ("nearest Domino's", "closest fish market") resolve. |
| Where am I | where_am_i reverse-geocodes session GPS via Nominatim (zoom 18) → "street, neighborhood, city, country" line. Disambiguated from describe_scene in the system prompt so location queries no longer fall back to the camera. |
| UI sound feedback (ElevenLabs Sound Effects) | 4 short cues baked once via the ElevenLabs Sound Effects API (POST /v1/sound-generation) by backend/generate_sfx.py and served from frontend/public/sfx/: tool plays on every agent tool dispatch, nav on bottom-tab navigation, start / stop on the session FAB. Cached automatically by the service worker; played from lib/sfx.ts with iOS audio-unlock warm on first user gesture. |
| Spoken welcome line (ElevenLabs TTS) | backend/bake_welcome.py bakes a static welcome line via ElevenLabs Text-to-Speech to frontend/public/welcome.mp3. Voice + wording configurable via ELEVENLABS_WELCOME_VOICE_ID / ELEVENLABS_WELCOME_TEXT / ELEVENLABS_WELCOME_MODEL. |
| Idle-screen quick-start prompts | The pre-session screen surfaces 3 voice example prompts ("What do you see?", "Where am I?", "Take me to the nearest coffee shop.") so first-time users know what to say without reading docs. |
| Smooth barge-in transcript | Mid-utterance interruption is handled end-to-end: backend forwards agent_response_correction with replace_last: true; frontend overwrites the last assistant bubble in place. No duplicate text after a "stop". Audio buffer also flushes in <200 ms via the interruption event. |
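The progressive-radius lookup from the outlet-precise reviews row (50 → 200 → 500 m) amounts to a simple widening loop. The sketch below keeps the Places call abstract: `search` is a hypothetical callable standing in for whatever wraps `findplacefromtext`, and `find_outlet` is an illustrative name, not the backend's.

```python
from typing import Callable, Optional, Sequence

RADII_M = (50, 200, 500)  # progressive radii quoted in the table

# Hypothetical signature: (name, lat, lon, radius_m) -> list of candidate dicts
SearchFn = Callable[[str, float, float, int], Sequence[dict]]


def find_outlet(
    name: str,
    lat: float,
    lon: float,
    search: SearchFn,
    radii: Sequence[int] = RADII_M,
) -> Optional[dict]:
    """Try each radius in turn and return the first candidate found."""
    for radius_m in radii:
        candidates = search(name, lat, lon, radius_m)
        if candidates:
            return candidates[0]
    return None  # caller falls back gracefully (e.g. no GPS match)
```

Starting with the tightest radius favours the outlet the user is physically at; the wider passes only run when nothing is found nearby.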
Beacon is built to run from a chest-mounted phone with as little screen interaction as possible. The cheapest way to get there: a generic Bluetooth selfie-stick shutter remote (~$3 on any marketplace).
- The remote pairs through Web Bluetooth in the Setup flow (`frontend/app/setup/page.tsx`). Beacon stores a `paired` flag + the device name in `localStorage` under `beacon.remote.v1`. Pairing is the user-facing confirmation that the chest-harness input device is connected.
- The remote shows up to Android as a standard HID device at the OS level: no driver, no app permissions, no SDK.
- Roadmap (not yet wired): intercepting the remote's `AudioVolumeUp` / `AudioVolumeDown` DOM events to map one button to "enter app" and the other to "start / stop the session." Android Chrome does not reliably forward hardware volume keys to a backgrounded PWA, so this needs a real-device test pass before shipping.
- The on-screen `Start / Stop` FAB is deliberately pixel-stable across states (`bottom: 16, left: 50%, transform: translateX(-50%)`, 56×56). This is critical for muscle memory on a blind-user workflow, and a fallback for situations where the remote is out of reach.
┌─────────────────┐ audio + JSON envelopes ┌─────────────────────────┐
│ Frontend (PWA) │◄─────────────────────────────────────►│ ElevenLabs ConvAI │
│ mic + camera │ WebSocket (signed URL) │ STT + agent LLM │
│ + GPS │ │ (gemini-2.5-flash) + TTS│
└────────┬────────┘ └──────────┬──────────────┘
│ JPEG frames + GPS pushed │ client_tool_call
│ over the same WebSocket ▼
▼ ┌────────────────────┐
┌──────────────┐ │ Tool dispatcher │
│ Backend │ │ (FastAPI) │
│ frame buffer │ └────┬───────────────┘
│ + GPS store │ │
└──────┬───────┘ │
│ ▼
└──► describe_scene │ where_am_i │ get_weather │ web_search │ place_reviews │ navigate_to │ control_light
(Gemini 3.1 FL) │ (Nominatim) │ (Open-Meteo)│ (SerpAPI + │ (Google │ (OSRM | gmaps) │ (Tuya cloud)
│ Firecrawl) │ Places) │ + next_turn │ + time_now
│ + stop_navigation
Vision path. When the agent emits client_tool_call('describe_scene'), the backend pulls the latest frame from the per-session ring buffer, calls Gemini 3.1 Flash Lite (cold 2.3 s, warm 1.2 s, pre-warmed on lifespan startup with backend/fixtures/coffee.jpg), and returns the answer via client_tool_result. A Path A pipeline (ConvAI POST /v1/convai/conversations/{id}/files + multimodal_message) is implemented in backend/elevenlabs_client.py and verified via the /dev/vision-test endpoint, but kept off the hot path — the F2-hybrid spike (see plan.md:128) hit a race between the agent's own tool-call decision and our multimodal frame inject, and the hard 10-file cap per conversation makes it unsuitable for long sessions.
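A per-session ring buffer of the kind described in the vision path can be sketched with `collections.deque`. The class name, method names, and the capacity of 4 are assumptions; the source only states that the latest buffered frame is read when `describe_scene` fires.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class FrameBuffer:
    """Keeps only the most recent JPEG frames for each session."""

    def __init__(self, capacity: int = 4) -> None:
        # deque(maxlen=...) silently discards the oldest frame when full.
        self._frames: Dict[str, Deque[bytes]] = defaultdict(
            lambda: deque(maxlen=capacity)
        )

    def push(self, session_id: str, jpeg: bytes) -> None:
        self._frames[session_id].append(jpeg)

    def latest(self, session_id: str) -> Optional[bytes]:
        frames = self._frames.get(session_id)  # .get() does not create an entry
        return frames[-1] if frames else None

    def size(self, session_id: str) -> int:
        frames = self._frames.get(session_id)
        return len(frames) if frames else 0

    def drop(self, session_id: str) -> None:
        """Free a session's frames when its WebSocket closes."""
        self._frames.pop(session_id, None)
```

A bounded buffer keeps memory flat over long sessions, and returning `None` for an empty session gives the vision tool a clear signal to refuse rather than describe a stale or missing frame.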
Latency tactics:
- Pre-warm Gemini 3.1 Flash Lite on backend `lifespan` startup with the bundled `backend/fixtures/coffee.jpg`.
- Transitional phrase ("let me look") in agent system prompt, so first audio plays before the tool round-trip completes.
- TTS streamed at 16 kHz to match the worklet player; no resample.
- Signed URL fetched per connect (no long-lived public agent ID exposed).
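The per-connect signed-URL fetch can be sketched with the standard library alone. The endpoint path is the one named in this README; the `api.elevenlabs.io` host, the `agent_id` query parameter, the `xi-api-key` header, and the `signed_url` response field reflect the public ElevenLabs API as best understood here and should be checked against the current reference. The helper names are hypothetical, and a workspace on a regional host would need a different base URL.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.elevenlabs.io"  # assumption: default (non-regional) host


def build_signed_url_request(agent_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET request; kept separate so it can be inspected offline."""
    query = urllib.parse.urlencode({"agent_id": agent_id})
    url = f"{API_BASE}/v1/convai/conversation/get-signed-url?{query}"
    return urllib.request.Request(url, headers={"xi-api-key": api_key}, method="GET")


def fetch_signed_url(agent_id: str, api_key: str, timeout: float = 5.0) -> str:
    """Return a short-lived wss:// URL for one conversation (network call)."""
    request = build_signed_url_request(agent_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["signed_url"]
```

Because the request is built and sent server-side, the API key and agent ID stay on the backend; the browser only ever talks to the backend's own WebSocket.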
| Product | Where Used |
|---|---|
| Conversational AI (`/v1/convai/conversation`) | The whole audio loop. STT + agent LLM (`gemini-2.5-flash`) + TTS, all integrated. WebSocket schema: `client_tool_call`, `client_tool_result`, `interruption`, `agent_response`, `user_transcript`. |
| Multimodal `POST /files` (implemented, reserved) | `backend/elevenlabs_client.py` has the full `POST /v1/convai/conversations/{id}/files` upload + `multimodal_message` inject path. Exercised by the `/dev/vision-test` endpoint. Not on the production hot path: production vision goes through the `describe_scene` Client tool because of the agent-vs-inject tool-decision race and the hard 10-file cap per conversation. |
| Signed-URL flow (`GET /v1/convai/conversation/get-signed-url`) | Backend mints a short-lived `wss://` URL per WebSocket connect. Agent locked private (`platform_settings.auth.enable_auth=true`). Client never sees agent ID or API key. |
| Console-provisioned tools | Ten Client tools live on the agent. Eight are registered via API by backend/register_tools.py (web_search, get_weather, control_light, where_am_i, navigate_to, next_turn, stop_navigation, place_reviews); time_now and describe_scene are dashboard-created during initial agent setup (see AGENT_SETUP_INSTRUCTIONS.md). Other backend ops scripts: fix_tool_timeouts.py, fix_describe_scene_param.py, fix_system_prompt.py, fix_web_search_desc.py. System-prompt source-of-truth lives between <!-- prompt:start/end --> markers in AGENT_SETUP_INSTRUCTIONS.md and is patched onto the agent via API. |
| Voice Design (`POST /v1/text-to-voice/design` → `POST /v1/text-to-voice`) | Beacon's guide voice: a calm, caring archetype crafted for hands-free use, not a stock library voice. Set as the agent's default in the ConvAI dashboard. |
| Sound Effects (`POST /v1/sound-generation`) | UI feedback cues: 4 short MP3s (`tool`, `nav`, `start`, `stop`) baked offline via `backend/generate_sfx.py`, served from `frontend/public/sfx/`, cached by the service worker, played from `lib/sfx.ts`. |
| Text-to-Speech (`POST /v1/text-to-speech/{voice_id}`) | Static welcome audio baked once by `backend/bake_welcome.py` to `frontend/public/welcome.mp3`. Voice + text configurable via `ELEVENLABS_WELCOME_VOICE_ID` / `ELEVENLABS_WELCOME_TEXT`. |
Audio format pinned to pcm @ 16000 in the dashboard (default pcm_44100 mismatches the AudioWorklet player and produces chipmunk audio). Tool response_timeout_secs patched from the dashboard default of 1 s up to 5–20 s depending on tool.
The agent has ten tools registered in the ElevenLabs console (all Client tools, dispatched via client_tool_call to the FastAPI backend):
| Tool | Params | What it does | Dispatched in |
|---|---|---|---|
| `time_now` | none | Returns local ISO time + day. Frontend renders a 🕒 chip with parsed human time. | `backend/main.py` `_handle_tool_call` |
| `describe_scene` | `question?: string` | Reads the buffered camera frame, calls Gemini 3.1 Flash Lite. Returns 1–2 spoken sentences describing what's visible. Hard-refuses on dark / blank frames with an explicit string instead of hallucinating. Also covers OCR ("read this") and object identification, so no separate tools are needed. | `backend/vision.py` |
| `where_am_i` | none | Reverse-geocodes session GPS via Nominatim (zoom 18) → "street, neighborhood, city, country". Disambiguated from `describe_scene` in the system prompt so the agent never falls back to the camera for location queries. City is also pre-warmed in the background on first GPS fix for the session, so the answer arrives in <5 ms after the user asks. | `backend/main.py` |
| `get_weather` | none | Current weather at the user's GPS via Open-Meteo (free, no key, native Celsius). Pre-warmed on first GPS fix and cached for 10 min, so repeat calls return in <10 ms; cold path ~150 ms. Returns `Current weather in <city>: <X>°C, <conditions>, humidity <Y>%, wind <Z> km/h`. Replaces SerpAPI for weather: Open-Meteo takes raw lat/lon, so the answer always matches the user's actual location (no Google geo-IP guessing). | `backend/main.py` + `backend/weather.py` |
| `web_search` | `query: string` | Three-tier dispatcher. Live (regex match on `live\|breaking\|right now\|score\|currently\|happening now`) → SerpAPI Google engine, parses `answer_box` / `sports_results` / `knowledge_graph` / organic enrichment. News (`latest\|recent\|news\|update`) → Firecrawl `/v2/search` `sources=news`, `tbs=qdr:w`. General → Firecrawl web tier. Rewrites "near me / this location / this place" to resolved city via Nominatim before dispatch. Top-3 summarized into 800 chars. Weather keywords short-circuit to `get_weather` before SerpAPI (safety net for agent confusion). | `backend/main.py` |
| `place_reviews` | `name: string` | Outlet-precise reviews of a place the user is at. Pulls session GPS, calls Google Places `findplacefromtext` (50 m → 200 m → 500 m progressive radius) to lock the specific outlet, then `place/details` for star rating, address, and up to 2 reviewer snippets. Falls back gracefully when GPS is missing or there is no candidate. | `backend/main.py` |
| `control_light` | `action: "on"\|"off"` | Toggles a Havells F8 bulb via Tuya OpenAPI (`tinytuya.Cloud`), `switch_led` command. HMAC-SHA256 signed REST, 5-second timeout. Reachable from anywhere: backend on Railway controls a bulb behind home WiFi. Returns "light is on" / "light is off". | `backend/smart_bulb.py` |
| `navigate_to` | `destination: string` | Walking nav. Geocodes via Google Places Text Search (handles "nearest X" / category queries) → Nominatim fallback. Plans route via OSRM (default) or Google Directions (`MAPS_PROVIDER=gmaps`). Stores `nav_session[session_id]`, returns ETA + first turn cue. Frontend renders the polyline + AR arrow. | `backend/navigation.py` + `backend/main.py` |
| `next_turn` | none | Bumps `current_idx` server-side, returns the upcoming step instruction. Used when the user asks "what's next". | `backend/main.py` |
| `stop_navigation` | none | Clears `nav_session[session_id]`, fires `nav_end` envelope to frontend. Polyline + AR arrow disappear. Also reachable via the × button on the route overlay (frontend sends `stop_nav` directly). | `backend/main.py` |
System prompt rules + per-tool docs live between <!-- prompt:start/end --> markers in AGENT_SETUP_INSTRUCTIONS.md. Push edits with uv run python fix_system_prompt.py from backend/.
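The round trip from `client_tool_call` to `client_tool_result` can be sketched as a small dispatcher. This is an illustration under assumptions: the envelope field names (`tool_name`, `tool_call_id`, `parameters`, `result`, `is_error`) follow the ElevenLabs ConvAI WebSocket schema as understood here, and the handler registry and function names are hypothetical rather than copied from `backend/main.py`.

```python
from datetime import datetime
from typing import Any, Callable, Dict

ToolHandler = Callable[[Dict[str, Any]], str]


def time_now(_params: Dict[str, Any]) -> str:
    """Example handler: local ISO timestamp with UTC offset."""
    return datetime.now().astimezone().isoformat(timespec="seconds")


# Hypothetical registry; the real backend wires ten tools.
HANDLERS: Dict[str, ToolHandler] = {"time_now": time_now}


def handle_tool_call(event: Dict[str, Any],
                     handlers: Dict[str, ToolHandler] = HANDLERS) -> Dict[str, Any]:
    """Run the requested tool and build the reply envelope."""
    call = event["client_tool_call"]
    name = call["tool_name"]
    handler = handlers.get(name)
    if handler is None:
        result, is_error = f"unknown tool: {name}", True
    else:
        try:
            result, is_error = handler(call.get("parameters") or {}), False
        except Exception as exc:  # report failures to the agent instead of crashing
            result, is_error = f"{name} failed: {exc}", True
    return {
        "type": "client_tool_result",
        "tool_call_id": call["tool_call_id"],
        "result": result,
        "is_error": is_error,
    }
```

Echoing `tool_call_id` back is what lets the agent match a result to the call it issued; returning an error envelope rather than raising keeps the voice loop alive when a single tool fails.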
| Category | Technology |
|---|---|
| Frontend framework | Next.js 16 App Router, React 19, TypeScript strict |
| Styling | Tailwind CSS 4, custom design tokens (Syne / DM Sans / Geist Mono), glassmorphism |
| Realtime audio | AudioWorklet (PCM 16 kHz both ways), pcm-recorder-processor.js, pcm-player-processor.js |
| Maps | Leaflet (no react-leaflet wrapper) — MapPinPreview (read-only mini), GpsMapClient (interactive dashboard) |
| PWA | Service worker (public/sw.js), manifest with any + maskable icons at 192/512, Apple touch icon, iOS standalone display |
| Backend framework | FastAPI on Python 3.12, Uvicorn |
| Agents / Voice | ElevenLabs Conversational AI (elevenlabs Python SDK server-side, signed-URL flow), client_tool_call / client_tool_result schema, gemini-2.5-flash as the agent LLM |
| Vision | Google Gemini gemini-3.1-flash-lite via the describe_scene Client tool (backend/vision.py). The native ConvAI multimodal POST /files + multimodal_message pipeline is implemented in backend/elevenlabs_client.py but reserved for the /dev/vision-test endpoint only. |
| Search | SerpAPI Google engine for live tier (/search.json); Firecrawl /v2/search for news + general; raw httpx, no SDK |
| Weather | Open-Meteo /v1/forecast (free, no key, native Celsius) — backend/weather.py mirrors frontend/lib/weather.ts |
| Places / Reviews | Google Places API (findplacefromtext + details) — outlet pinning + reviews |
| Geocoding | OpenStreetMap Nominatim (reverse-geocode "near me" → city); IP-fallback when GPS denied |
| Smart home | tinytuya.Cloud — Tuya OpenAPI (HMAC-SHA256 signed REST), switch_led command — Havells F8 bulb. Works from any network. |
| Routing | OSRM walking profile (default, no key) + Google Maps Directions (MAPS_PROVIDER=gmaps); 15 m haversine waypoint advance, polyline decode for Google's encoded format |
| Sound effects | ElevenLabs Sound Effects API (POST /v1/sound-generation); baked offline by backend/generate_sfx.py into frontend/public/sfx/{tool,nav,start,stop}.mp3 |
| Static welcome audio | ElevenLabs TTS (POST /v1/text-to-speech/{voice_id}); baked offline by backend/bake_welcome.py into frontend/public/welcome.mp3 |
| Package management | uv for Python (lock + dev), pip + exported requirements.txt for the production Docker image; pnpm for frontend |
| Validation | Pydantic at FastAPI boundaries, Zod planned at Next.js API routes |
| Deployment | Railway (backend Dockerfile, repo-root build context, $PORT shell-expanded), Vercel (frontend, NEXT_PUBLIC_WS_URL=wss://<railway>/ws) |
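The Routing row mentions decoding Google's encoded polyline format. A compact sketch of that published algorithm is shown below; the function name and the `precision` parameter are illustrative and not taken from `backend/navigation.py`.

```python
from typing import List, Tuple


def decode_polyline(encoded: str, precision: int = 5) -> List[Tuple[float, float]]:
    """Decode a Google encoded polyline into (lat, lon) pairs."""
    factor = 10 ** precision
    coords: List[Tuple[float, float]] = []
    index = lat = lon = 0
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # one varint for latitude, one for longitude
            shift = result = 0
            while True:
                byte = ord(encoded[index]) - 63
                index += 1
                result |= (byte & 0x1F) << shift
                shift += 5
                if byte < 0x20:  # continuation bit not set: last chunk
                    break
            # The lowest bit carries the sign (zigzag-style encoding).
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lon += deltas[1]
        coords.append((lat / factor, lon / factor))
    return coords
```

Each point is stored as a delta from the previous one, which is why the running `lat` / `lon` totals are accumulated before scaling. The decoded pairs can be passed to the Leaflet overlay as the route polyline.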
Prerequisites: Node 20+, pnpm, Python 3.12, uv. API keys for ElevenLabs, Firecrawl, Gemini.
# Clone
git clone https://github.com/<you>/Beacon.git && cd Beacon
# Backend deps (uv)
cd backend && uv sync && cd ..
# Frontend deps
cd frontend && pnpm install && cd ..
# Configure env (do NOT commit)
cp backend/.env.example backend/.env
cp frontend/.env.example frontend/.env.local
# fill in keys — see "Environment Variables" below
# Boot backend :8000 + frontend :3000 + tunnels (when present)
./dry-run.sh
# Or run them separately:
# cd backend && uv run uvicorn main:app --reload --port 8000
# cd frontend && pnpm dev

Open http://localhost:3000 on a phone (use a tunnel such as Cloudflare or ngrok, since getUserMedia requires HTTPS on real devices).
Templates live per-app: backend/.env.example and frontend/.env.example.
Backend (backend/.env):
# ElevenLabs (server-only — never inlined in client components)
ELEVENLABS_API_KEY=...
ELEVENLABS_AGENT_ID=... # console-provisioned ConvAI agent (private, signed-URL only)
ELEVENLABS_REGION=default # default for hobby/pro; us|eu|in only on Enterprise data-residency workspaces
# Vision Path B
GEMINI_API_KEY=...
# Web search
FIRECRAWL_API_KEY=... # news + general (/v2/search)
SERPAPI_KEY=... # live tier only — scores/breaking/right-now (100 credits/mo)
# Smart light (Tuya cloud — works from any network)
TUYA_CLIENT_ID=...
TUYA_CLIENT_SECRET=...
TUYA_REGION=in # us | eu | cn | in
HAVELLS_DEVICE_ID=...
# Maps (optional — OSRM is the default and needs no key)
GOOGLE_MAPS_API_KEY=...
MAPS_PROVIDER=osrm # osrm | gmaps

Frontend (frontend/.env.local):
NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws

The agent is provisioned manually in the ElevenLabs console. The full step-by-step runbook, with system prompt, tool schemas, voice/LLM config, and the enable_auth=true switch, lives in AGENT_SETUP_INSTRUCTIONS.md.
Quick reference once the agent exists:
# Probe which region is hosting your agent
uv run python backend/probe_region.py
# Inspect the live agent's tool list, prompts, timeouts (run BEFORE guessing tool failures)
uv run python backend/inspect_agent.py
# Push the latest system prompt + turn timeouts to the agent
uv run python backend/fix_system_prompt.py
# Idempotently register all 8 client tools (web_search, get_weather, control_light, where_am_i,
# navigate_to, next_turn, stop_navigation, place_reviews) on the live agent
uv run python backend/register_tools.py
# Patch every tool's response_timeout_secs (dashboard default is 1 s — too short for Gemini)
uv run python backend/fix_tool_timeouts.py
# Lock the agent private (enable_auth=true) so only signed URLs can connect
uv run python backend/lock_agent.py
# Bake the static welcome line (ElevenLabs TTS) → frontend/public/welcome.mp3
uv run python backend/bake_welcome.py
# Bake the 4 UI SFX cues (ElevenLabs Sound Effects) → frontend/public/sfx/{tool,nav,start,stop}.mp3
uv run python backend/generate_sfx.py

Backend → Railway
- Push the repo to GitHub.
- Railway → New Project → Deploy from GitHub repo.
- Railway auto-detects `railway.json`. Builder = Dockerfile, build context = repo root, Dockerfile path = `backend/Dockerfile`. The Dockerfile uses `pip install -r requirements.txt` (the `requirements.txt` is exported from `uv.lock` via `uv export --no-dev --no-hashes`).
- Set Variables on Railway:
  - Required: `ELEVENLABS_API_KEY`, `ELEVENLABS_AGENT_ID`, `ELEVENLABS_REGION`, `GEMINI_API_KEY`.
  - Search: `FIRECRAWL_API_KEY` (news + general), `SERPAPI_KEY` (live tier: scores, breaking, right-now; 100 credits/mo).
  - Places: `GOOGLE_MAPS_API_KEY` (Places API enabled in Google Cloud Console; `findplacefromtext` + `place/details` are the consumed endpoints).
  - Smart light (cloud): `TUYA_CLIENT_ID`, `TUYA_CLIENT_SECRET`, `TUYA_REGION` (`us|eu|cn|in`), `HAVELLS_DEVICE_ID`. Cloud Tuya works from Railway; the bulb does not need to share a network with the backend.
- Networking → Generate Domain. Verify `https://<your-domain>/healthz` returns `{"ok": true, "agent_id_set": true, ...}`.
Frontend → Vercel
- Vercel → Add New → Project → import the repo.
- Root Directory: `frontend/` (critical: set it in the import screen).
- Framework auto-detects Next.js.
- Environment Variable: `NEXT_PUBLIC_WS_URL=wss://<your-railway-domain>/ws` (note `wss://`, trailing `/ws`, no slash after).
- Deploy. Vercel issues a `*.vercel.app` URL.
Open the Vercel URL on a phone, grant mic + camera + location, tap Enter → Start. Frontend opens a WS to wss://<railway>/ws; backend mints a signed URL and connects to the private ConvAI agent. Railway logs should show [ElevenLabs] WS open (signed URL) followed by your first transcript.
The smart-light demo works directly from Railway production — Tuya cloud is reachable from anywhere, no LAN constraint.
Licensed under the MIT License.
Built by Limb for ElevenHacks 2026.
⚠️ Use with caution. Beacon is an AI-powered assistant — outputs may be incorrect, delayed, or incomplete. Always cross-check with a cane, guide, or sighted assistance before acting. Production deployment as a certified assistive device would require accessibility audits, regulatory clearance, and formal safety validation beyond the scope of this build.