```
________________
|.-----------. |
|| _____ |ooo|
|| | | |ooo|
|| | | | = |
|| '-----' | _ |
||___________|[_]|
'----------------'
```
Decentralized AI inference – BitTorrent for AI.
Anyone can run a node. Nodes contribute compute as experts. A router picks the best experts for each prompt and queries them in parallel. The network grows from a couple of friends on a LAN to a global mesh of volunteer GPUs.
Requirements: Python 3.10+ and Git.
macOS / Linux:
```
curl -fsSL https://raw.githubusercontent.com/robot-time/Microwave/main/install.sh | sh
```

Windows (PowerShell):

```
irm https://raw.githubusercontent.com/robot-time/Microwave/main/install.ps1 | iex
```

That's it. One command. It clones the repo, creates a venv, installs Ollama + a model, and connects your machine to the network as an expert node.
With options:
macOS / Linux:

```
curl -fsSL https://raw.githubusercontent.com/robot-time/Microwave/main/install.sh | MICROWAVE_EXPERT_DOMAINS=code,math sh
```

Windows (PowerShell):

```
$env:MICROWAVE_EXPERT_DOMAINS = "code,math"; irm https://raw.githubusercontent.com/robot-time/Microwave/main/install.ps1 | iex
```

After the first run, use the `microwave` command from anywhere:
```
microwave run                        # start your expert node
microwave run --expert-domains code  # start as a code expert
microwave gateway                    # start the gateway server
microwave status                     # check network health + experts
microwave version                    # print version
```

Each node can specialize. Set domains before running setup:
```
MICROWAVE_EXPERT_DOMAINS="code,math" bash setup.sh
```

Available domains: `general`, `code`, `math`, `creative`, `science`, `reasoning`. Defaults to `general` if unset.
```
User ──► Gateway (router) ──► Expert 1 ──► ┐
                          ──► Expert 2 ──► ├── Aggregate ──► Stream back
                          ──► Expert K ──► ┘
```
Microwave uses a Mixture of Experts (MoE) architecture:
- Prompt arrives at the gateway.
- Router classifies the prompt by domain (code, math, creative, etc.) and scores every online expert by relevance + latency + compute capacity.
- Top-K experts receive the prompt in parallel — latency = slowest single expert, not the sum of all.
- Aggregation returns the fastest response (or highest-confidence, or blended).
Each node runs a complete model locally via Ollama and connects over a reverse WebSocket — no open ports, no firewall changes.
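The parallel-dispatch step above can be sketched with asyncio: every selected expert is queried concurrently, and with the `fastest` strategy the first finisher wins, which is why latency is bounded by one expert rather than the sum. This is a minimal sketch, not Microwave's actual code; `query_expert` and the delays are stand-ins for real network calls.

```python
import asyncio

async def query_expert(name: str, delay: float, prompt: str) -> str:
    # Stand-in for a real expert call; `delay` simulates network + inference time.
    await asyncio.sleep(delay)
    return f"{name}: answer to {prompt!r}"

async def dispatch_fastest(prompt: str) -> str:
    # Query all selected experts concurrently; lock onto whichever finishes
    # first and cancel the rest, so total latency tracks the fastest expert.
    tasks = [
        asyncio.create_task(query_expert("expert-1", 0.05, prompt)),
        asyncio.create_task(query_expert("expert-2", 0.20, prompt)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    return done.pop().result()

print(asyncio.run(dispatch_fastest("hello")))  # expert-1 wins the race
```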
| What | Where |
|---|---|
| Dashboard | http://GATEWAY:8000/ |
| Chat UI | http://GATEWAY:8000/chat-ui |
| Expert list | http://GATEWAY:8000/experts |
| Route preview | POST /experts/route |
| API | POST /chat (see below) |
```
curl -N http://GATEWAY:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a python sort function", "model": "llama3.2", "strategy": "fastest"}'
```

The `strategy` field controls aggregation: `fastest` (default), `confidence`, or `blend`.
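The same call from Python, reading the stream incrementally as `curl -N` does. A sketch using only the standard library; it assumes the gateway streams the response body as plain chunks, and `chat_payload` is an illustrative helper, not part of Microwave.

```python
import json
import urllib.request

def chat_payload(prompt: str, model: str = "llama3.2", strategy: str = "fastest") -> dict:
    # Request body for POST /chat; strategy is fastest, confidence, or blend.
    return {"prompt": prompt, "model": model, "strategy": strategy}

def stream_chat(gateway: str, prompt: str, strategy: str = "fastest") -> None:
    # Read the response incrementally as tokens arrive (the `curl -N` equivalent).
    req = urllib.request.Request(
        f"{gateway}/chat",
        data=json.dumps(chat_payload(prompt, strategy=strategy)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        while chunk := resp.read(256):
            print(chunk.decode(errors="replace"), end="", flush=True)

# stream_chat("http://GATEWAY:8000", "Write a python sort function")
```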
| Phase | Goal | Status |
|---|---|---|
| 0 | 2 machines on a LAN serving a model | done |
| 1 | 5–10 LAN nodes, health checks, load balancing | done |
| 2 | WAN support — reverse WebSocket from anywhere | done |
| 3 | Mixture of Experts — parallel dispatch, domain routing | done |
| 4 | Latency-optimized networking — EWMA tracking, geo-aware | done |
| 5 | Model marketplace — nodes advertise capabilities | planned |
| 6 | Reputation system, incentives | planned |
Architecture deep dive
Most AI lives behind a few companies' APIs. Microwave asks: what if models ran on a network of volunteer machines instead?
Every node is an expert. The gateway's router selects which experts to activate for each prompt:
```
device
  ↓
small local router (on gateway, near-zero latency)
  ↓
distributed experts (parallel)
```
Scoring formula per expert:
```
score = 0.35 × domain_relevance + 0.45 × speed_score + 0.20 × capacity_score
```
- Domain relevance — keyword classifier maps prompts to domains, matched against each expert's declared specialties.
- Speed score — derived from EWMA ping latency (faster = higher).
- Capacity score — normalized GPU/compute benchmark.
The router also adapts K (how many experts to query) based on prompt complexity: simple questions → 1 expert, complex multi-domain prompts → 2-3 experts.
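The scoring and top-K selection can be sketched as follows. This is not the code in `inference/router.py`; the weights come from the formula above, while the example experts and their input scores are made up for illustration.

```python
def expert_score(domain_relevance: float, speed: float, capacity: float) -> float:
    # Weighted sum from the formula above; all inputs normalized to [0, 1].
    return 0.35 * domain_relevance + 0.45 * speed + 0.20 * capacity

def pick_top_k(experts: dict[str, tuple[float, float, float]], k: int) -> list[str]:
    # Rank experts by score, highest first, and keep the top K
    # (K itself adapts to prompt complexity).
    ranked = sorted(experts, key=lambda name: expert_score(*experts[name]), reverse=True)
    return ranked[:k]

experts = {
    "gpu-box":  (0.9, 0.6, 1.0),  # strong domain match, decent latency, big GPU
    "laptop":   (0.9, 0.9, 0.2),  # same domain match, faster, little compute
    "far-node": (0.2, 0.3, 0.8),  # wrong domain, slow link
}
print(pick_top_k(experts, 2))  # ['gpu-box', 'laptop']
```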
| | Pipeline (serial) | MoE (parallel) |
|---|---|---|
| Latency | `sum(all stages)` | `max(1 expert)` |
| Failure | 1 node down = broken | 1 down = use others |
| Communication | Tensor serialization | Standard prompt/response |
| Scaling | More nodes = more hops | More nodes = more choices |
- Expert Node — registers with the gateway with its model, domains, and hardware capabilities. Handles `moe_expert_task` messages by running Ollama and streaming chunks back.
- Gateway — maintains the expert registry, runs the MoE router, dispatches to top-K experts in parallel, aggregates responses, and streams tokens to the user.
- Router (`inference/router.py`) — classifies prompts, scores experts, selects top-K.
- MoE Coordinator (`inference/moe.py`) — parallel dispatch, response aggregation (fastest / confidence / blend).
- Ollama — local model runtime on each node.
| Strategy | Behavior | Best for |
|---|---|---|
| `fastest` | Lock onto whichever expert streams first | Lowest time-to-first-token |
| `confidence` | Collect all responses, pick highest confidence | Best quality |
| `blend` | Stream fastest, note disagreements | Balance of speed and quality |
- EWMA latency tracking — exponentially weighted moving average for robust ping estimates.
- Geographic awareness — Haversine distance between nodes, IP geolocation auto-detection.
- Inter-node topology — RTT matrix for optimal routing decisions.
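The EWMA tracking can be sketched in a few lines: each new ping sample nudges the estimate rather than replacing it, so one slow outlier doesn't tank an expert's speed score. A sketch, not the repository's implementation; the smoothing factor `alpha` is an assumed value.

```python
class EwmaLatency:
    # Exponentially weighted moving average of ping latency, in milliseconds.
    def __init__(self, alpha: float = 0.3) -> None:
        self.alpha = alpha
        self.value: float | None = None

    def update(self, sample_ms: float) -> float:
        # new = alpha * latest sample + (1 - alpha) * previous estimate
        if self.value is None:
            self.value = sample_ms
        else:
            self.value = self.alpha * sample_ms + (1 - self.alpha) * self.value
        return self.value

ewma = EwmaLatency(alpha=0.3)
for ping in (100.0, 100.0, 400.0):  # one outlier spike
    ewma.update(ping)
print(round(ewma.value, 1))  # 190.0 — the spike is damped, not adopted wholesale
```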
```
Expert nodes connect OUT ──WebSocket──► Gateway
User ──POST /chat──────────────────────► Gateway
Gateway router selects top-K experts
Gateway ──moe_expert_task──► Expert 1  (parallel)
Gateway ──moe_expert_task──► Expert 2  (parallel)
Expert 1 ──moe_expert_chunk──► Gateway ──stream──► User
```
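Concretely, the two message types in that flow might carry payloads like the following. Only the `type` values (`moe_expert_task`, `moe_expert_chunk`) appear in this README; every other field is a guess at the wire format, not Microwave's actual schema.

```python
# Hypothetical WebSocket payloads; all fields besides "type" are assumptions.
task = {
    "type": "moe_expert_task",
    "task_id": "abc123",  # assumed correlation id tying chunks to a task
    "prompt": "Write a python sort function",
    "model": "llama3.2",
}
chunk = {
    "type": "moe_expert_chunk",
    "task_id": "abc123",
    "token": "def",       # one streamed fragment of the expert's answer
    "done": False,        # final chunk would set this to True
}
```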
| Endpoint | Method | Description |
|---|---|---|
| `/nodes/register` | POST | Register a node (HTTP mode) |
| `/nodes/ws` | WebSocket | Reverse-connect a node (WAN mode) |
| `/nodes` | GET | List registered nodes |
| `/nodes/health` | POST | Ping all nodes |
| `/experts` | GET | List MoE experts with scores |
| `/experts/route` | POST | Preview routing for a prompt (dry-run) |
| `/chat` | POST | Send a prompt (streaming MoE) |
| `/health` | GET | Gateway health + MoE stats |
| `/` | GET | Dashboard |
| `/chat-ui` | GET | Chat UI |
- Python — FastAPI + httpx + uvicorn + websockets + numpy + psutil
- Ollama — local LLM runtime
- Protocol is HTTP/JSON + WebSocket; easy to reimplement in Go/Rust/Node later.
```
# Gateway
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
microwave-gateway --host 0.0.0.0 --port 8000

# Expert node (LAN)
microwave-node --gateway-url http://GATEWAY:8000 --region LAN --model llama3.2 \
  --expert-domains general,code --host THIS_IP --port 9000

# Expert node (WAN)
microwave-node --gateway-url https://GATEWAY_URL --region US-EAST --model llama3.2 \
  --expert-domains math,science --reverse
```

| Variable | Default | Description |
|---|---|---|
| `MICROWAVE_GATEWAY_URL` | — | Gateway address |
| `MICROWAVE_MODEL` | `llama3` | Model to run |
| `MICROWAVE_REGION` | `LAN` | Region label |
| `MICROWAVE_EXPERT_DOMAINS` | `general` | Comma-separated domains |
| `MICROWAVE_LAT` / `MICROWAVE_LON` | auto-detected | GPS coordinates |
| `MICROWAVE_ENGINE` | `ollama` | Inference engine |
| `MICROWAVE_DRAFT_MODELS` | — | Draft models for speculative decoding |