Beacon

A guiding light for everyday freedom.

Built for ElevenHacks 2026.

There are hundreds of millions of visually impaired people around the world.

Things we do without thinking aren't that simple for them.

Every day takes courage.

Beacon is a voice-first, hands-free mobile app built to help visually impaired users better understand and interact with the world around them. It runs on a phone mounted to a specialized lightweight chest harness and pairs with a Bluetooth remote that lives in the user's pocket to instantly start the assistant, turning the system into wearable AI. No screens, no menus, no touching the display. Users simply talk, and Beacon sees the world and talks back in real time.

Beacon delivers real-time scene narration, hazard detection, object recognition, and live OCR for signs, menus, books, and labels, all spoken aloud naturally. It also supports walking turn-by-turn navigation, live web search, location-aware weather, and smart-home control entirely through conversation. The entire experience is fully hands-free with natural speech, sub-second barge-in, and no keyboards, menus, or screen interaction.

Built with Cursor for rapid agentic development, ElevenLabs (Conversational AI Agents, Voice Design, Text-to-Speech, Sound Effects), Gemini as the vision intelligence layer, Firecrawl and SerpAPI for web search, Google Maps + OSRM for navigation, Open-Meteo for weather, and Tuya for smart-home control.

Beacon. A guiding light for everyday freedom.

Screenshots

The app

_{Landing — the one screen a sighted caretaker sees during install.}	_{Idle / Start — three voice example prompts then the FAB.}
_{Setup — one-time wizard for permissions, identity, smart-light pairing.}	_{Dashboard — saved sessions + live weather + GPS card.}

Real scenes

Phone clipped to a chest harness, voice in, voice out. Left half is the Beacon PWA; right half is the user.

_{Walking outside. "Can you look what's in front of me?" → scene narration in real time.}

_{At the bakery. "I want croissants. Do you have any?" → outlet-aware vision answer.}

_{At home. "Turn on the lights" → control_light via Tuya cloud. "Look for my bottle" → vision find-object.}

What is Beacon?

A hands-free PWA designed to be opened on a phone clipped to a chest harness. Once started, the user never touches the screen — the loop is voice-only:

See for me — describe what's in front of the camera. Read text on signs, packaging, screens. Refuses to hallucinate on dark or blank frames.
Search for me — three-tier dispatcher: SerpAPI Google engine for live data (scores, weather, breaking), Firecrawl news for "latest" / "this week", Firecrawl general for evergreen. Queries auto-rewrite to inject resolved city ("restaurants near me" → "restaurants in Chennai").
Review for me — stand in front of a restaurant, ask "is this place good?". place_reviews uses GPS + Google Places to pin the exact outlet, returns its star rating + address + real reviewer snippets — not generic chain reviews.
Walk me somewhere — turn-by-turn walking navigation. Geocodes "nearest X" / category queries via Google Places Text Search (Nominatim fallback), routes via OSRM (default, no key) or Google Maps Directions (MAPS_PROVIDER=gmaps). Live watchPosition (3 s throttle) auto-advances waypoints at 15 m via haversine; AR turn-arrow chip + leaflet route overlay (with × dismiss) update on every cross. Voice "stop" or the dismiss button clears the route end-to-end.
Tell me where I am — reverse-geocoded location read aloud, mini-map shown inline in the transcript.
Help me at home — voice control for a Tuya cloud-paired smart bulb. Reachable from any network including Railway production.

The PWA installs on iOS Safari and Android Chrome. The whole interaction model is one big tap-to-start, then voice for everything that follows. The camera feed is the canvas — the assistant orb is corner-pinned, not center-stage.

How It Works

1. Open + Start. User taps Enter on the landing screen, then Start. The browser asks for mic, camera, and geolocation. The PWA mounts an AudioWorklet recorder, opens a WebSocket to the backend, and the backend mints a fresh signed URL via GET /v1/convai/conversation/get-signed-url and connects the WS upstream to the (private) ElevenLabs ConvAI agent.

2. Talk. Microphone PCM @ 16 kHz streams over the WS. ConvAI's STT transcribes; the agent's LLM (Gemini Flash) plans; ElevenLabs TTS streams audio back; the worklet plays it. Barge-in fires interruption events that flush the audio buffer in <200 ms.

3. See. When the user asks a vision question, the agent emits client_tool_call('describe_scene'). The backend reads the latest buffered camera frame (the frontend streams JPEGs over the WebSocket on a low cadence into a per-session ring buffer) and hands it to Gemini 3.1 Flash Lite, which returns 1–2 sentences. Cold call ~2.3 s, warm ~1.2 s — Gemini is pre-warmed on lifespan startup with the bundled backend/fixtures/coffee.jpg. A native-multimodal POST /files + multimodal_message pipeline (Path A) is wired in backend/elevenlabs_client.py and verified via the /dev/vision-test endpoint, but kept off the hot path (race between agent-decided tool calls and our multimodal inject; hard 10-file cap per conversation).

5. Search. Agent calls web_search. Tiered dispatcher classifies intent:

Live (live | breaking | right now | scores | currently) → SerpAPI Google engine (answer_box, sports_results, knowledge_graph, organic enrichment) — actual numbers, not just headlines.
News (latest | recent | news | this week | update) → Firecrawl /v2/search with sources=news, tbs=qdr:w.
General → Firecrawl web tier, no time filter — keeps evergreen content like reviews and how-tos.

"near me / nearby / this location / this place" rewrite to the user's resolved city via Nominatim before dispatch. Top-3 results summarized into 800 chars; agent speaks 1–2 sentences.

6. Outlet-precise reviews. When the user asks reviews of a named place they're at ("reviews for this Popeyes"), agent skips web_search and calls place_reviews(name). Backend uses session GPS + Google Places findplacefromtext (50 m → 200 m → 500 m progressive radius) to lock on the specific outlet, then place/details to pull rating, address, and 2 real reviewer snippets. Returns the actual outlet — not generic chain reviews.

7. Tools render inline. Every tool result is forwarded to the frontend as a tool_result WebSocket envelope and rendered as a persistent card in the transcript — chips for control_light / time_now / describe_scene, a glass card for web_search with a "from web · in {city}" pill, a 120 px Leaflet mini-map for where_am_i. The top-left tool chip is the during indicator; the inline card is the after record.

8. Smart light (cloud). control_light dispatches to Tuya OpenAPI via tinytuya.Cloud, toggling a Havells F8 bulb on switch_led. HMAC-SHA256 signed REST, 5-second timeout. Works from any network — phone on cellular, backend on Railway, bulb behind home WiFi all interoperate.

Key Features

Feature	Description
Voice-first hands-free loop	One tap to start, then PCM audio streams both ways over a single WebSocket. No keyboard, no on-screen taps required.
Wireless remote integration	Pairs with any cheap Bluetooth shutter remote (the kind bundled with selfie sticks) via Web Bluetooth in the Setup flow. The OS sees the remote as a standard HID device. End-to-end zero-touch operation (intercepting the remote's `AudioVolumeUp` / `AudioVolumeDown` DOM events) is on the roadmap; in the current build, pairing serves as the on-ramp and the on-screen Start/Stop FAB stays pixel-stable so a sighted caretaker (or the user themselves) can rely on muscle memory.
Camera-first UI	Live video fills the phone frame as the canvas. Reticle, AR turn-arrow slot, and corner orb sit on top.
Vision via `describe_scene` Client tool	Agent emits `client_tool_call('describe_scene')` → backend hands the buffered frame to Gemini 3.1 Flash Lite (cold 2.3 s, warm 1.2 s, pre-warmed on lifespan startup). Covers scene description, OCR for signs / books / menus / labels, and object identification — no separate tools needed. Refuses to hallucinate on dark / blank frames — explicit refusal string instead of made-up shapes.
Tiered web search	Live tier (`scores`, `breaking`, `right now`) → SerpAPI Google engine with answer_box / sports_results / knowledge_graph parsing. News tier → Firecrawl `/v2/search` `sources=news` past-week. General → Firecrawl web. Auto-rewrites "near me / this location / this place" to resolved city via Nominatim.
Outlet-precise reviews (`place_reviews`)	GPS + Google Places `findplacefromtext` finds the exact outlet the user is standing at, then `place/details` returns its star rating, address, and 2 real reviewer snippets. Beats generic chain reviews from web search. Progressive radius 50 → 200 → 500 m.
Inline tool transcript	Every tool call leaves a persistent visual record: chips for time/light/scene, glass card for search, Leaflet mini-map for location.
Smart light control (cloud)	`control_light(action: on
Signed-URL private agent	Agent locked `enable_auth=true`. Backend mints a fresh signed URL per connect. API key never reaches the browser.
PWA + offline	Service worker caches shell. AudioWorklet processors served from `public/`. Installs on iOS/Android home screen.
Setup wizard	Name + mic/cam/geo permission test + bulb cloud reachability check. Persisted to `localStorage` under `beacon.<domain>.v1` keys.
Session history	Every session saved client-side (`beacon.sessions.v1`) with transcript, frames count, GPS track. Dashboard shows past sessions.
Camera-first dashboard	Live session timer or "{N} sessions saved" hero; no telemetry leakage; sessions list grouped by day.
Walking navigation	OSRM + Google Maps Directions abstraction (`MAPS_PROVIDER` env). `navigate_to` / `next_turn` / `stop_navigation` tools, live `watchPosition` (3 s throttle), 15 m waypoint auto-advance via haversine. Floating leaflet route overlay + AR turn arrow chip. Forward geocode falls back from Google Places Text Search → Nominatim so category queries ("nearest Domino's", "closest fish market") resolve.
Where am I	`where_am_i` reverse-geocodes session GPS via Nominatim (zoom 18) → "street, neighborhood, city, country" line. Disambiguated from `describe_scene` in the system prompt so location queries no longer fall back to the camera.
UI sound feedback (ElevenLabs Sound Effects)	4 short cues baked once via the ElevenLabs Sound Effects API (`POST /v1/sound-generation`) by `backend/generate_sfx.py` and served from `frontend/public/sfx/`: `tool` plays on every agent tool dispatch, `nav` on bottom-tab navigation, `start` / `stop` on the session FAB. Cached automatically by the service worker; played from `lib/sfx.ts` with iOS audio-unlock warm on first user gesture.
Spoken welcome line (ElevenLabs TTS)	`backend/bake_welcome.py` bakes a static welcome line via ElevenLabs Text-to-Speech to `frontend/public/welcome.mp3`. Voice + wording configurable via `ELEVENLABS_WELCOME_VOICE_ID` / `ELEVENLABS_WELCOME_TEXT` / `ELEVENLABS_WELCOME_MODEL`.
Idle-screen quick-start prompts	The pre-session screen surfaces 3 voice example prompts ("What do you see?", "Where am I?", "Take me to the nearest coffee shop.") so first-time users know what to say without reading docs.
Smooth barge-in transcript	Mid-utterance interruption is handled end-to-end: backend forwards `agent_response_correction` with `replace_last: true`; frontend overwrites the last assistant bubble in place. No duplicate text after a "stop". Audio buffer also flushes in <200 ms via the `interruption` event.

Wireless remote integration (zero-touch on-ramp)

Beacon is built to run from a chest-mounted phone with as little screen interaction as possible. The cheapest way to get there: a generic Bluetooth selfie-stick shutter remote (~$3 on any marketplace).

The remote pairs through Web Bluetooth in the Setup flow (frontend/app/setup/page.tsx) — Beacon stores a paired flag + the device name in localStorage under beacon.remote.v1. Pairing is the user-facing confirmation that the chest-harness input device is connected.
The remote shows up to Android as a standard HID device at the OS level — no driver, no app permissions, no SDK.
Roadmap (not yet wired): intercepting the remote's AudioVolumeUp / AudioVolumeDown DOM events to map one button to "enter app" and the other to "start / stop the session." Android Chrome does not reliably forward hardware volume keys to a backgrounded PWA, so this needs a real-device test pass before shipping.
The on-screen Start / Stop FAB is deliberately pixel-stable across states (bottom: 16, left: 50%, transform: translateX(-50%), 56×56) — critical for muscle memory on a blind-user workflow, and a fallback for situations where the remote is out of reach.

Realtime Architecture

┌─────────────────┐         audio + JSON envelopes        ┌─────────────────────────┐
│  Frontend (PWA) │◄─────────────────────────────────────►│ ElevenLabs ConvAI       │
│  mic + camera   │         WebSocket (signed URL)        │ STT + agent LLM         │
│  + GPS          │                                       │ (gemini-2.5-flash) + TTS│
└────────┬────────┘                                       └──────────┬──────────────┘
         │ JPEG frames + GPS pushed                                  │ client_tool_call
         │ over the same WebSocket                                   ▼
         ▼                                                  ┌────────────────────┐
   ┌──────────────┐                                         │  Tool dispatcher   │
   │ Backend      │                                         │  (FastAPI)         │
   │ frame buffer │                                         └────┬───────────────┘
   │ + GPS store  │                                              │
   └──────┬───────┘                                              │
          │                                                      ▼
          └──► describe_scene │ where_am_i │ get_weather │ web_search  │ place_reviews │ navigate_to        │ control_light
              (Gemini 3.1 FL) │ (Nominatim) │ (Open-Meteo)│ (SerpAPI +  │ (Google       │ (OSRM | gmaps)     │ (Tuya cloud)
                                                          │  Firecrawl) │  Places)      │ + next_turn        │ + time_now
                                                                                        │ + stop_navigation

Vision path. When the agent emits client_tool_call('describe_scene'), the backend pulls the latest frame from the per-session ring buffer, calls Gemini 3.1 Flash Lite (cold 2.3 s, warm 1.2 s, pre-warmed on lifespan startup with backend/fixtures/coffee.jpg), and returns the answer via client_tool_result. A Path A pipeline (ConvAI POST /v1/convai/conversations/{id}/files + multimodal_message) is implemented in backend/elevenlabs_client.py and verified via the /dev/vision-test endpoint, but kept off the hot path — the F2-hybrid spike (see plan.md:128) hit a race between the agent's own tool-call decision and our multimodal frame inject, and the hard 10-file cap per conversation makes it unsuitable for long sessions.

Latency tactics:

Pre-warm Gemini 3.1 Flash Lite on backend lifespan startup with the bundled backend/fixtures/coffee.jpg.
Transitional phrase ("let me look") in agent system prompt — first audio plays before the tool round-trip completes.
TTS streamed at 16 kHz to match the worklet player; no resample.
Signed URL fetched per connect (no long-lived public agent ID exposed).

ElevenLabs Integration

Product	Where Used
Conversational AI (`/v1/convai/conversation`)	The whole audio loop. STT + agent LLM (`gemini-2.5-flash`) + TTS, all integrated. WebSocket schema: `client_tool_call`, `client_tool_result`, `interruption`, `agent_response`, `user_transcript`.
Multimodal `POST /files` (implemented, reserved)	`backend/elevenlabs_client.py` has the full `POST /v1/convai/conversations/{id}/files` upload + `multimodal_message` inject path. Exercised by the `/dev/vision-test` endpoint. Not on the production hot path — production vision goes through the `describe_scene` Client tool because of the agent-vs-inject tool-decision race and the hard 10-file cap per conversation.
Signed-URL flow (`GET /v1/convai/conversation/get-signed-url`)	Backend mints a short-lived `wss://` URL per WebSocket connect. Agent locked private (`platform_settings.auth.enable_auth=true`). Client never sees agent ID or API key.
Console-provisioned tools	Ten Client tools live on the agent. Eight are registered via API by `backend/register_tools.py` (`web_search`, `get_weather`, `control_light`, `where_am_i`, `navigate_to`, `next_turn`, `stop_navigation`, `place_reviews`); `time_now` and `describe_scene` are dashboard-created during initial agent setup (see `AGENT_SETUP_INSTRUCTIONS.md`). Other backend ops scripts: `fix_tool_timeouts.py`, `fix_describe_scene_param.py`, `fix_system_prompt.py`, `fix_web_search_desc.py`. System-prompt source-of-truth lives between `<!-- prompt:start/end -->` markers in `AGENT_SETUP_INSTRUCTIONS.md` and is patched onto the agent via API.
Voice Design (`POST /v1/text-to-voice/design` → `POST /v1/text-to-voice`)	Beacon's guide voice — a calm, caring archetype crafted for hands-free use, not a stock library voice. Set as the agent's default in the ConvAI dashboard.
Sound Effects (`POST /v1/sound-generation`)	UI feedback cues — 4 short MP3s (`tool`, `nav`, `start`, `stop`) baked offline via `backend/generate_sfx.py`, served from `frontend/public/sfx/`, cached by the service worker, played from `lib/sfx.ts`.
Text-to-Speech (`POST /v1/text-to-speech/{voice_id}`)	Static welcome audio baked once by `backend/bake_welcome.py` to `frontend/public/welcome.mp3`. Voice + text configurable via `ELEVENLABS_WELCOME_VOICE_ID` / `ELEVENLABS_WELCOME_TEXT`.

Audio format pinned to pcm @ 16000 in the dashboard (default pcm_44100 mismatches the AudioWorklet player and produces chipmunk audio). Tool response_timeout_secs patched from the dashboard default of 1 s up to 5–20 s depending on tool.

Tools

The agent has ten tools registered in the ElevenLabs console (all Client tools, dispatched via client_tool_call to the FastAPI backend):

Tool	Params	What it does	Dispatched in
`time_now`	—	Returns local ISO time + day. Frontend renders a 🕒 chip with parsed human time.	`backend/main.py` `_handle_tool_call`
`describe_scene`	`question?: string`	Reads the buffered camera frame, calls Gemini 3.1 Flash Lite. Returns 1–2 spoken sentences describing what's visible. Hard-refuses on dark / blank frames with an explicit string instead of hallucinating. Also covers OCR ("read this") and object identification — no separate tools needed.	`backend/vision.py`
`where_am_i`	—	Reverse-geocodes session GPS via Nominatim (zoom 18) → "street, neighborhood, city, country". Disambiguated from `describe_scene` in the system prompt so the agent never falls back to the camera for location queries. City is also pre-warmed in the background on first GPS fix for the session, so the answer arrives in <5 ms after the user asks.	`backend/main.py`
`get_weather`	—	Current weather at the user's GPS via Open-Meteo (free, no key, native Celsius). Pre-warmed on first GPS fix and cached for 10 min, so repeat calls return in <10 ms; cold path ~150 ms. Returns `Current weather in <city>: <X>°C, <conditions>, humidity <Y>%, wind <Z> km/h`. Replaces SerpAPI for weather — Open-Meteo takes raw lat/lon so the answer always matches the user's actual location (no Google geo-IP guessing).	`backend/main.py` + `backend/weather.py`
`web_search`	`query: string`	Three-tier dispatcher. Live (regex match on `live\|breaking\|right now\|score\|currently\|happening now`) → SerpAPI Google engine, parses `answer_box` / `sports_results` / `knowledge_graph` / organic enrichment. News (`latest\|recent\|news\|update`) → Firecrawl `/v2/search` `sources=news`, `tbs=qdr:w`. General → Firecrawl web tier. Rewrites "near me / this location / this place" to resolved city via Nominatim before dispatch. Top-3 summarized into 800 chars. Weather keywords short-circuit to `get_weather` before SerpAPI (safety net for agent confusion).	`backend/main.py`
`place_reviews`	`name: string`	Outlet-precise reviews of a place the user is at. Pulls session GPS, calls Google Places `findplacefromtext` (50 m → 200 m → 500 m progressive radius) to lock the specific outlet, then `place/details` for star rating, address, and up to 2 reviewer snippets. Falls back gracefully when GPS missing or no candidate.	`backend/main.py`
`control_light`	`action: "on"\|"off"`	Toggles a Havells F8 bulb via Tuya OpenAPI (`tinytuya.Cloud`), `switch_led` command. HMAC-SHA256 signed REST, 5-second timeout. Reachable from anywhere — backend on Railway controls a bulb behind home WiFi. Returns `"light is on"` / `"light is off"`.	`backend/smart_bulb.py`
`navigate_to`	`destination: string`	Walking nav. Geocodes via Google Places Text Search (handles "nearest X" / category queries) → Nominatim fallback. Plans route via OSRM (default) or Google Directions (`MAPS_PROVIDER=gmaps`). Stores `nav_session[session_id]`, returns ETA + first turn cue. Frontend renders the polyline + AR arrow.	`backend/navigation.py` + `backend/main.py`
`next_turn`	—	Bumps `current_idx` server-side, returns the upcoming step instruction. Used when the user asks "what's next".	`backend/main.py`
`stop_navigation`	—	Clears `nav_session[session_id]`, fires `nav_end` envelope to frontend. Polyline + AR arrow disappear. Also reachable via the × button on the route overlay (frontend sends `stop_nav` directly).	`backend/main.py`

System prompt rules + per-tool docs live between  markers in AGENT_SETUP_INSTRUCTIONS.md. Push edits with uv run python fix_system_prompt.py from backend/.

Tech Stack

Category	Technology
Frontend framework	Next.js 16 App Router, React 19, TypeScript strict
Styling	Tailwind CSS 4, custom design tokens (Syne / DM Sans / Geist Mono), glassmorphism
Realtime audio	AudioWorklet (PCM 16 kHz both ways), `pcm-recorder-processor.js`, `pcm-player-processor.js`
Maps	Leaflet (no react-leaflet wrapper) — `MapPinPreview` (read-only mini), `GpsMapClient` (interactive dashboard)
PWA	Service worker (`public/sw.js`), manifest with `any` + `maskable` icons at 192/512, Apple touch icon, iOS standalone display
Backend framework	FastAPI on Python 3.12, Uvicorn
Agents / Voice	ElevenLabs Conversational AI (`elevenlabs` Python SDK server-side, signed-URL flow), `client_tool_call` / `client_tool_result` schema, `gemini-2.5-flash` as the agent LLM
Vision	Google Gemini `gemini-3.1-flash-lite` via the `describe_scene` Client tool (`backend/vision.py`). The native ConvAI multimodal `POST /files` + `multimodal_message` pipeline is implemented in `backend/elevenlabs_client.py` but reserved for the `/dev/vision-test` endpoint only.
Search	SerpAPI Google engine for live tier (`/search.json`); Firecrawl `/v2/search` for news + general; raw `httpx`, no SDK
Weather	Open-Meteo `/v1/forecast` (free, no key, native Celsius) — `backend/weather.py` mirrors `frontend/lib/weather.ts`
Places / Reviews	Google Places API (`findplacefromtext` + `details`) — outlet pinning + reviews
Geocoding	OpenStreetMap Nominatim (reverse-geocode "near me" → city); IP-fallback when GPS denied
Smart home	`tinytuya.Cloud` — Tuya OpenAPI (HMAC-SHA256 signed REST), `switch_led` command — Havells F8 bulb. Works from any network.
Routing	OSRM walking profile (default, no key) + Google Maps Directions (`MAPS_PROVIDER=gmaps`); 15 m haversine waypoint advance, polyline decode for Google's encoded format
Sound effects	ElevenLabs Sound Effects API (`POST /v1/sound-generation`); baked offline by `backend/generate_sfx.py` into `frontend/public/sfx/{tool,nav,start,stop}.mp3`
Static welcome audio	ElevenLabs TTS (`POST /v1/text-to-speech/{voice_id}`); baked offline by `backend/bake_welcome.py` into `frontend/public/welcome.mp3`
Package management	`uv` for Python (lock + dev), `pip` + exported `requirements.txt` for the production Docker image; `pnpm` for frontend
Validation	Pydantic at FastAPI boundaries, Zod planned at Next.js API routes
Deployment	Railway (backend Dockerfile, repo-root build context, `$PORT` shell-expanded), Vercel (frontend, `NEXT_PUBLIC_WS_URL=wss://<railway>/ws`)

Running Locally

Prerequisites: Node 20+, pnpm, Python 3.12, uv. API keys for ElevenLabs, Firecrawl, Gemini.

# Clone
git clone https://github.com/<you>/Beacon.git && cd Beacon

# Backend deps (uv)
cd backend && uv sync && cd ..

# Frontend deps
cd frontend && pnpm install && cd ..

# Configure env (do NOT commit)
cp backend/.env.example backend/.env
cp frontend/.env.example frontend/.env.local
# fill in keys — see "Environment Variables" below

# Boot backend :8000 + frontend :3000 + tunnels (when present)
./dry-run.sh
# Or run them separately:
#   cd backend  && uv run uvicorn main:app --reload --port 8000
#   cd frontend && pnpm dev

Open http://localhost:3000 on a phone (use a tunnel — Cloudflare, ngrok — since getUserMedia requires HTTPS on real devices).

Environment Variables

Templates live per-app: backend/.env.example and frontend/.env.example.

Backend (backend/.env):

# ElevenLabs (server-only — never inlined in client components)
ELEVENLABS_API_KEY=...
ELEVENLABS_AGENT_ID=...        # console-provisioned ConvAI agent (private, signed-URL only)
ELEVENLABS_REGION=default      # default for hobby/pro; us|eu|in only on Enterprise data-residency workspaces

# Vision Path B
GEMINI_API_KEY=...

# Web search
FIRECRAWL_API_KEY=...           # news + general (/v2/search)
SERPAPI_KEY=...                 # live tier only — scores/breaking/right-now (100 credits/mo)

# Smart light (Tuya cloud — works from any network)
TUYA_CLIENT_ID=...
TUYA_CLIENT_SECRET=...
TUYA_REGION=in                  # us | eu | cn | in
HAVELLS_DEVICE_ID=...

# Maps (optional — OSRM is the default and needs no key)
GOOGLE_MAPS_API_KEY=...
MAPS_PROVIDER=osrm              # osrm | gmaps

Frontend (frontend/.env.local):

NEXT_PUBLIC_WS_URL=ws://localhost:8000/ws

Setting up the ConvAI agent

The agent is provisioned manually in the ElevenLabs console — full step-by-step runbook with system prompt, tool schemas, voice/LLM config, and the enable_auth=true switch lives in AGENT_SETUP_INSTRUCTIONS.md.

Quick reference once the agent exists:

# Probe which region is hosting your agent
uv run python backend/probe_region.py

# Inspect the live agent's tool list, prompts, timeouts (run BEFORE guessing tool failures)
uv run python backend/inspect_agent.py

# Push the latest system prompt + turn timeouts to the agent
uv run python backend/fix_system_prompt.py

# Idempotently register all 8 client tools (web_search, get_weather, control_light, where_am_i,
# navigate_to, next_turn, stop_navigation, place_reviews) on the live agent
uv run python backend/register_tools.py

# Patch every tool's response_timeout_secs (dashboard default is 1 s — too short for Gemini)
uv run python backend/fix_tool_timeouts.py

# Lock the agent private (enable_auth=true) so only signed URLs can connect
uv run python backend/lock_agent.py

# Bake the static welcome line (ElevenLabs TTS) → frontend/public/welcome.mp3
uv run python backend/bake_welcome.py

# Bake the 4 UI SFX cues (ElevenLabs Sound Effects) → frontend/public/sfx/{tool,nav,start,stop}.mp3
uv run python backend/generate_sfx.py

Deployment

Backend → Railway

Push the repo to GitHub.
Railway → New Project → Deploy from GitHub repo.
Railway auto-detects railway.json. Builder = Dockerfile, build context = repo root, Dockerfile path = backend/Dockerfile. The Dockerfile uses pip install -r requirements.txt (the requirements.txt is exported from uv.lock — uv export --no-dev --no-hashes).
Set Variables on Railway:
- Required: ELEVENLABS_API_KEY, ELEVENLABS_AGENT_ID, ELEVENLABS_REGION, GEMINI_API_KEY.
- Search: FIRECRAWL_API_KEY (news + general), SERPAPI_KEY (live tier — scores, breaking, right-now; 100 credits/mo).
- Places: GOOGLE_MAPS_API_KEY (Places API enabled in Google Cloud Console; findplacefromtext + place/details are the consumed endpoints).
- Smart light (cloud): TUYA_CLIENT_ID, TUYA_CLIENT_SECRET, TUYA_REGION (us\|eu\|cn\|in), HAVELLS_DEVICE_ID. Cloud Tuya works from Railway — bulb does not need to share a network with the backend.
Networking → Generate Domain. Verify https://<your-domain>/healthz returns {"ok": true, "agent_id_set": true, ...}.

Frontend → Vercel

Vercel → Add New → Project → import the repo.
Root Directory: frontend/ (critical — set in the import screen).
Framework auto-detects Next.js.
Environment Variable: NEXT_PUBLIC_WS_URL=wss://<your-railway-domain>/ws (note wss://, trailing /ws, no slash after).
Deploy. Vercel issues a *.vercel.app URL.

Open the Vercel URL on a phone, grant mic + camera + location, tap Enter → Start. Frontend opens a WS to wss://<railway>/ws; backend mints a signed URL and connects to the private ConvAI agent. Railway logs should show [ElevenLabs] WS open (signed URL) followed by your first transcript.

The smart-light demo works directly from Railway production — Tuya cloud is reachable from anywhere, no LAN constraint.

License

Licensed under the MIT License.

Built by Limb for ElevenHacks 2026.

⚠️ Use with caution. Beacon is an AI-powered assistant — outputs may be incorrect, delayed, or incomplete. Always cross-check with a cane, guide, or sighted assistance before acting. Production deployment as a certified assistive device would require accessibility audits, regulatory clearance, and formal safety validation beyond the scope of this build.

Name		Name	Last commit message	Last commit date
Latest commit History 80 Commits
backend		backend
frontend		frontend
images		images
.dockerignore		.dockerignore
.gitignore		.gitignore
AGENT_SETUP_INSTRUCTIONS.md		AGENT_SETUP_INSTRUCTIONS.md
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
dry-run.sh		dry-run.sh
railway.json		railway.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Beacon

Screenshots

The app

Real scenes

Table of Contents

What is Beacon?

How It Works

Key Features

Wireless remote integration (zero-touch on-ramp)

Realtime Architecture

ElevenLabs Integration

Tools

Tech Stack

Running Locally

Environment Variables

Setting up the ConvAI agent

Deployment

License

About

Uh oh!

Releases 2

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Beacon

Screenshots

The app

Real scenes

Table of Contents

What is Beacon?

How It Works

Key Features

Wireless remote integration (zero-touch on-ramp)

Realtime Architecture

ElevenLabs Integration

Tools

Tech Stack

Running Locally

Environment Variables

Setting up the ConvAI agent

Deployment

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Contributors

Uh oh!

Languages