Lullabeast

Lullabeast is an open-source, local-capable, autonomous development pipeline. Describe what you want to build in plain English, and your team of agents (planner, executor, reviewer) implements it phase by phase against a real git repository, with deterministic gate scripts checking every step and an escalation path back to you when they get stuck.

Lullabeast runs on OpenClaw and requires it: Lullabeast is the pipeline and dashboard, while the agents themselves run inside OpenClaw's runtime environment, which you install and run separately.

One full Lullabeast run, from idea to finished app

Early release, and honest about it. Lullabeast reliably builds small, single-purpose webapps end to end, and hard phases escalate to you by design, but larger or more complex projects tend to surface more issues and need more polish before they're done. This beta release is a single-user tool meant to run on a trusted machine, protected by a locally generated access token. (For transparency: I personally run both the pipeline and OpenClaw inside a VM.) I'm shipping now to get it in front of other builders and find out where it breaks so I can strengthen it. Bug reports and suggestions are welcome.


How it works

Lullabeast workflow: idea to PRD to roadmap, then the gated planner/executor/reviewer build loop with escalation, to a finished app

Queue several projects and Lullabeast works them in order, honoring dependencies between them.
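
Dependency-aware ordering of this kind is a topological sort. The following is a minimal sketch, assuming a hypothetical queue expressed as a mapping from each project to the projects it waits on. The project names and the `run_order` helper are illustrative only and are not Lullabeast's actual queue format.

```python
from graphlib import TopologicalSorter


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return project names so that each one appears after everything it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical queue: project -> projects that must finish first.
queue = {
    "api": set(),
    "web-ui": {"api"},
    "docs-site": {"web-ui"},
}

order = run_order(queue)
print(order)
```

If two projects depend on each other, `static_order()` raises `graphlib.CycleError`, which is the natural place to report a queue that can never make progress.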

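The agents. Four pipeline agents (planner, executor, reviewer, escalation) and two ideation agents (prd-creator, roadmap-converter), run by a single orchestrator state machine that owns the git operations, blame attribution, and recovery logic. Deterministic gate scripts sit between every handoff: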

  • Planner: turns the current roadmap phase into a concrete implementation plan.
  • Executor: writes the code and tests, then commits to a phase branch.
  • Reviewer: verifies the result actually behaves as intended, including screenshot-based visual review for UI phases.
  • Gate scripts: deterministic, LLM-free Python checkers between every handoff: file manifest, git diff, test results, behavioral evidence, unaccounted deletions. The gates are the pipeline's source of truth; no agent advances on its own say-so.
  • Escalation: invoked only when gates and retries are exhausted; notifies you and pauses.
  • prd-creator / roadmap-converter: drive the idea-to-PRD-to-roadmap front end.
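
To make the gate idea concrete, here is a minimal sketch of one LLM-free check plus the retry-then-escalate loop around it. Everything named here (`GateResult`, `manifest_gate`, `run_phase`, the retry count, the `"advance"` / `"escalate"` strings) is hypothetical and chosen for illustration; the real gates live in autodev/pipeline and may look quite different.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GateResult:
    passed: bool
    reasons: list[str] = field(default_factory=list)


def manifest_gate(planned: set[str], changed: set[str],
                  deleted: set[str], planned_deletions: set[str]) -> GateResult:
    """Pass only if every planned file was touched and every deletion was planned."""
    reasons = []
    missing = planned - changed
    if missing:
        reasons.append(f"planned files not touched: {sorted(missing)}")
    unaccounted = deleted - planned_deletions
    if unaccounted:
        reasons.append(f"unaccounted deletions: {sorted(unaccounted)}")
    return GateResult(passed=not reasons, reasons=reasons)


def run_phase(attempt: Callable[[int], object],
              gate: Callable[[object], GateResult],
              max_retries: int = 2) -> str:
    """Retry until the gate passes; escalate once retries are exhausted."""
    for n in range(max_retries + 1):
        if gate(attempt(n)).passed:
            return "advance"
    return "escalate"


ok = manifest_gate({"app.py"}, {"app.py", "test_app.py"}, set(), set())
bad = manifest_gate({"app.py"}, {"app.py"}, {"README.md"}, set())
print(ok.passed, bad.reasons)
```

The point of the design is that `run_phase` never asks an agent whether it succeeded: only the gate's boolean decides between advancing and escalating.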

Examples

Lullabeast works best for small, focused webapps. Each one below was built end to end by the pipeline.

Flagship: SVG Pictionary

SVG Pictionary, a multi-screen AI drawing-and-guessing game with multiple AI players and a human, round-based

Multiple AI players and a human in a round-based draw-and-guess game over real SVG: multi-screen routing, persistent per-round state, and live, simultaneous LLM API calls that render elements both the models and the player act on. The hardest target on this list, an application rather than a widget.

Screenshots: GridBeast, 2048, Regex Tester, Conway (classic mode), Conway (conquest mode).

Every example links the exact PRD and phased roadmap that drove its build:

| Project | PRD | Roadmap | What it is |
|---|---|---|---|
| SVG Pictionary | PRD | Roadmap | Flagship: multi-screen, persistent state, live simultaneous LLM API calls |
| GridBeast | PRD | Roadmap | Mini spreadsheet; formula engine with precedence, ranges, cycle detection |
| Regex Tester | PRD | Roadmap | Live matcher; inline flags, in-place highlighting, light/dark |
| 2048 | PRD | Roadmap | Tile-merge game; correct merge semantics, score/best, spawn-on-move |
| Multi-team Conway | PRD | Roadmap | Two rule systems (classic + conquest) over one grid engine |

GridBeast's self-test panel was generated in a follow-up pass to surface engine correctness for users, after the formula engine itself was built and manually verified.

Built with. No closed frontier models anywhere in the loop, just local and open-weight cloud:

| Project | Planner | Executor | Reviewer |
|---|---|---|---|
| Multi-team Conway | llamacpp/Qwen3.6-27B-MTP | llamacpp/Qwen3.6-27B-MTP | llamacpp/Qwen3.6-27B |
| Regex Tester | llamacpp/Qwen3.6-27B-MTP | llamacpp/Qwen3.6-27B-MTP | llamacpp/Qwen3.6-27B |
| GridBeast | llamacpp/Qwen3.6-27B | llamacpp/Qwen3.6-27B | llamacpp/Qwen3.6-27B |
| 2048 | openrouter/z-ai/glm-5.2 | openrouter/moonshotai/kimi-k2.7-code | openrouter/moonshotai/kimi-k2.7-code |
| SVG Pictionary | openrouter/z-ai/glm-5.2 | openrouter/moonshotai/kimi-k2.7-code | openrouter/moonshotai/kimi-k2.7-code |

Run it your way

Lullabeast is model-agnostic. OpenClaw owns all model configuration, so you choose the cost/quality trade-off:

| Mode | What runs the agents | Trade-off |
|---|---|---|
| Budget cloud (best results so far) | Open-weight multi-modal models via your OpenRouter key (e.g. MiniMax, GLM, Kimi, Qwen) | Cheap per token; your key, your provider |
| Fully local | Validated on a single RTX 4090 (48GB, modded) with unsloth/Qwen3.6-27B-MTP-GGUF (q8_0) | No cloud in the loop; front-end (UI) phases are the weak spot, with the most failures and retries |
| Hybrid | Local for escalation + executor (where most of the work, and the cost savings, happen), cloud for planner and/or reviewer (cheap to build a strong foundation and review thoroughly) | Often the best cost/quality balance; still being tuned |

Model notes. A multi-modal model is required for the executor and reviewer (the reviewer does screenshot-based visual review for UI phases) and recommended for the planner. Use the strongest model you're comfortable running for the roadmap-converter: it's isolated by design, so your most expensive model is spent only on conversion. We also suggest keeping the idea-to-PRD chat (prd-creator) on a cloud model, where it produces noticeably better drafts.


Quick start

Requirements

Read this before running anything. The first item is a separate install:

  • A running OpenClaw gateway. Install it first (install guide) and have it listening on its default port, localhost:18789. Requires OpenClaw v2026.5.18 or newer.
  • Linux, macOS, or WSL2. Native Windows is unsupported (the pipeline uses POSIX fcntl locking).
  • Python 3.11+ and git with a configured identity (user.name / user.email). The pipeline commits to your repos, and install.sh checks this.
  • Node.js 22+ with npm. Builds the signals plugin and the Playwright visual-review MCP, which is required for UI phases (install.sh adds it by default; --skip-playwright to opt out).
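
The platform restriction follows from Python's `fcntl` module, which exists only on POSIX systems. As a rough illustration of the kind of advisory file lock involved, here is a minimal sketch using `fcntl.flock`. The `try_lock` helper and the lock-file path are hypothetical, and whether the pipeline itself uses `flock` or `lockf` is an assumption not confirmed here.

```python
import fcntl
import os
import tempfile


def try_lock(path: str):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    fh = open(path, "a+")
    try:
        # LOCK_NB makes the call fail immediately instead of blocking.
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    return fh


# Hypothetical lock file in a throwaway directory.
lock_path = os.path.join(tempfile.mkdtemp(), "pipeline.lock")

first = try_lock(lock_path)     # acquires the lock
second = try_lock(lock_path)    # refused while the first handle is open
first.close()                   # closing the file releases the lock
third = try_lock(lock_path)     # succeeds again
print(first is not None, second is None, third is not None)
third.close()
```

On native Windows `import fcntl` fails outright, which is why WSL2 is the supported route there.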

Running on non-default ports, or hitting setup snags? SETUP.md covers configuration, version notes, and silent-failure modes in full.

Install & run

# 1. Install and start OpenClaw first.
#    https://docs.openclaw.ai/start/getting-started
curl -s http://localhost:18789/v1/models   # should respond; "connection refused" = gateway not up

# 2. Install Lullabeast.
git clone https://github.com/bigbraingoldfish/lullabeast.git autodev-ui
cd autodev-ui
./install.sh            # interactive; registers agents with OpenClaw, generates your dashboard access token; safe to re-run

# 3. Run the dashboard from the repo root; the -m module form is required.
source .env
python -m ui.server

Launch command: run python -m ui.server from the repo root (it binds 127.0.0.1 on the configured port, default 18790). The script form python ui/server.py fails with ModuleNotFoundError: No module named 'ui'; use the module form above, or the equivalent uvicorn ui.server:app --host 127.0.0.1 --port 18790 for CLI control of host/port.

The server prints your access URL at startup; open it (http://127.0.0.1:18790/?token=<AUTODEV_UI_TOKEN>). That authorizes your browser via a cookie (30 days); scripts can send the same token as a Bearer header instead. Then verify the webhook wiring once (use POST; a GET check can miss token mismatches):

curl -sS -o /dev/null -w "HTTP %{http_code}\n" -X POST http://127.0.0.1:18789/hooks/agent \
  -H "Authorization: Bearer <hooks.token>" -H "Content-Type: application/json" \
  -d '{"agentId":"prd-creator","sessionKey":"ideas:install-check:0","wakeMode":"now","message":"ping"}'

HTTP 200 means you're wired up; 401 means the Bearer token doesn't match hooks.token in openclaw.json. The full walkthrough, including macOS LaunchAgent and Linux/WSL2 systemd units, is in SETUP.md.
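
For scripted checks, the same POST can be issued from Python with only the standard library. This is a sketch mirroring the curl command above: the URL, headers, and JSON body are taken from it, while the helper names `build_hook_check` and `hook_status` are made up for illustration.

```python
import json
import urllib.error
import urllib.request


def build_hook_check(hooks_token: str,
                     base_url: str = "http://127.0.0.1:18789") -> urllib.request.Request:
    """Build the same POST request the curl webhook check sends."""
    payload = {
        "agentId": "prd-creator",
        "sessionKey": "ideas:install-check:0",
        "wakeMode": "now",
        "message": "ping",
    }
    return urllib.request.Request(
        f"{base_url}/hooks/agent",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {hooks_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def hook_status(req: urllib.request.Request, timeout: float = 5.0) -> int:
    """Send the request and return the HTTP status code (200 wired up, 401 token mismatch)."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


req = build_hook_check("<hooks.token>")
print(req.get_method(), req.full_url)
```

Call `hook_status(req)` with your real token once the gateway is running. A refused connection surfaces as `urllib.error.URLError`, which is left uncaught here so that a down gateway is not mistaken for an auth failure.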


The dashboard

The Pipeline Monitor mid-run: live planner/executor/reviewer loop, per-phase metrics, activity feed.

  • Project Ideas: chat an idea into a PRD, then generate the roadmap + verification contract.
  • Setup & Preflight: point at a project repo, run preflight checks, launch the pipeline.
  • Pipeline Monitor: watch the live planner/executor/reviewer loop, per-phase metrics, and a real-time activity feed; recover from git errors or answer escalations.
  • Queue: line up multiple projects with dependency ordering; Lullabeast runs them sequentially.
  • Cost & token visibility: per-phase and per-agent cost/token breakdowns, live during a run and recallable after, in both the Monitor and the Queue (shown when your models report usage).

Security

  • The dashboard and /api/* require an access token (AUTODEV_UI_TOKEN, generated by install.sh). Open the tokenized URL printed at startup to authorize your browser; scripts send the token as a Bearer header. This is single-user, local-tool auth: one shared token, no accounts, roles, or audit trail.
  • Stay on loopback anyway. Bind to 127.0.0.1 (the default); the server refuses non-loopback requests unless a token is configured. Never expose the raw port to the internet; anything beyond a trusted LAN belongs behind a reverse proxy + TLS. See SECURITY.md and SETUP.md: Security and network exposure.
  • The pipeline executes agent-written code on the host under your user account. Treat Lullabeast as operator tooling for a trusted machine, not a multi-tenant service.
  • Secrets (the dashboard token AUTODEV_UI_TOKEN and the webhook Bearer token AUTODEV_HOOKS_TOKEN) live in .env (gitignored). Never commit them in ui/config.json or any tracked file.

Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| UI says RUNNING but no agents ever fire | OpenClaw gateway is down | curl -s http://localhost:18789/v1/models; connection refused means start the gateway |
| Webhook returns 401 | hooks.token does not match AUTODEV_HOOKS_TOKEN | Sync the Bearer secret (install.sh step 8 does this) |
| Dashboard or /api/* returns 401 | browser not authorized / wrong AUTODEV_UI_TOKEN | Open the tokenized URL printed at server startup |
| orchestrator.py not found on launch | .env not sourced | source .env before starting uvicorn |
| Every UI/INT phase fails at the reviewer | Playwright MCP not installed | Re-run ./install.sh without --skip-playwright |
| Header shows Queue stalled | all queued projects blocked / in dependency hold | Clear a parent or resume a banked escalation answer |
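
The gateway check in the first row can also be done without curl. This is a small sketch that only tests whether something is accepting TCP connections on the gateway's default port; the `port_is_open` helper is hypothetical, and an open port does not prove that the listener is actually OpenClaw.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 18789,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if port_is_open():
    print("something is listening on 18789")
else:
    print("nothing is listening on 18789; start the OpenClaw gateway")
```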

A deeper "Silent failure modes" walkthrough lives in SETUP.md.


Project layout

autodev/
  pipeline/         # orchestrator, sentinel poller, gate scripts, skill manager
  skill-library/    # per-discipline, per-role SKILL.md injected per phase
  agents/           # agent identity docs deployed into OpenClaw workspaces
  plugin/           # autodev-pipeline-signals OpenClaw plugin (TS to esbuild bundle)
  config/           # skill mapping, MCP + session setup
  docs/             # PIPELINE-SPEC, PIPELINE-CONSTRAINTS, assumptions
ui/                 # FastAPI server + single-file React dashboard (no build step)
tests/              # UI server tests
install.sh          # interactive installer

Pipeline state (lock, queue, event log, ideas) lives in <repo>/.autodev/; OpenClaw's own config and agent workspaces live under ~/.openclaw. ui/server.py (all API routes) and autodev/pipeline/orchestrator.py (the whole state machine) are intentionally single-file to keep control flow auditable; read CLAUDE.md before refactoring either. The full spec is autodev/docs/PIPELINE-SPEC.md.


Documentation

| Doc | What it covers |
|---|---|
| SETUP.md | Full install, openclaw.json requirements, silent-failure modes, cost metrics |
| GLOSSARY.md | Dashboard terminology (pipeline/queue states, skills, metrics) |
| CLAUDE.md | Complete contributor orientation and architecture deep-dive |
| CONTRIBUTING.md | Dev setup, PR conventions, adding skills |
| SECURITY.md | Security model and vulnerability reporting |
| CODE_OF_CONDUCT.md | Community expectations for participation (Contributor Covenant) |
| autodev/docs/PIPELINE-SPEC.md | The architecture spec / single source of truth |

License

MIT © 2026 Lullabeast contributors.

About

Bring an idea. Leave with a working MVP. Lullabeast is a gated multi-agent pipeline that gets results from cheap open models. Run it with cloud, local, or mix and match. You choose!
