Cut LLM costs by ~10–30%—just by changing orchestration.
This project demonstrates how orchestration strategy—not model choice—drives LLM cost and efficiency.
It compares how different pipeline designs affect:
- 🔢 Token usage
- 💰 Cost
- ⚡ Efficiency
across three pipeline designs:
- 🧵 Weft-style orchestration
- 🔁 Map-Reduce orchestration
- 📦 Python full-buffer baseline (control)
👉 Same input. Same model.
👉 Only orchestration changes.
Courtesy: https://github.com/WeaveMindAI/weft
Weft-style orchestration focuses on:
- Passing only the data needed at each step
- Avoiding repeated context sharing
- Using structured, minimal data flow
💡 In this project, we simulate Weft-style orchestration in Python to demonstrate its impact on token efficiency.
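To make this concrete, here is a minimal sketch of that simulation. All names, sample data, and the word-count token approximation are invented for illustration; the project's real pipelines and tokenizer differ.

```python
# Rough illustration: why passing only the needed fields cuts tokens.
# Token counting is approximated by whitespace word count here; a real
# pipeline would use the model's tokenizer.

def count_tokens(text: str) -> int:
    return len(text.split())

DOCS = [
    {"id": i, "title": f"Doc {i}", "body": "lorem ipsum " * 50}
    for i in range(5)
]

def full_buffer_prompts():
    """Baseline: every step re-sends the entire corpus."""
    corpus = "\n".join(d["body"] for d in DOCS)
    steps = ["summarize", "extract entities", "rank"]
    return [f"{step}:\n{corpus}" for step in steps]

def weft_style_prompts():
    """Weft-style: each step receives only the field it needs."""
    summaries = [d["body"][:100] for d in DOCS]  # stand-in for step-1 output
    return [
        "summarize:\n" + "\n".join(d["body"] for d in DOCS),   # one full pass
        "extract entities:\n" + "\n".join(summaries),          # summaries only
        "rank:\n" + "\n".join(d["title"] for d in DOCS),       # titles only
    ]

baseline = sum(count_tokens(p) for p in full_buffer_prompts())
weft = sum(count_tokens(p) for p in weft_style_prompts())
print(f"baseline={baseline} weft={weft} saved={1 - weft / baseline:.0%}")
```

The saving comes entirely from the shape of the data flow: the model, the input, and the number of steps are identical in both functions.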
Most LLM pipelines are inefficient because they:
- ❌ Re-send the same context repeatedly
- ❌ Grow token usage at every step
- ❌ Increase cost without improving output
Weft-style orchestration fixes this by:
- ✅ Sending only relevant data
- ✅ Avoiding full-buffer context passing
- ✅ Reducing unnecessary token duplication
The benchmark reports:
- Input / Output / Total tokens
- Estimated cost per approach
- ✅ Tokens saved (vs baseline)
- ✅ Cost saved
- 🚀 % cost reduction (Weft vs Map-Reduce)
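A sketch of how the percent reduction can be computed. The token totals below are placeholders for illustration, not measured results:

```python
# Placeholder total-token counts for the three pipelines (assumed
# values; the real numbers come from running the benchmark).
totals = {"full_buffer": 128_000, "map_reduce": 96_000, "weft": 53_000}

def reduction(baseline: int, candidate: int) -> float:
    """Percent of tokens saved relative to the baseline."""
    return (baseline - candidate) / baseline * 100

for name in ("map_reduce", "weft"):
    pct = reduction(totals["full_buffer"], totals[name])
    print(f"{name}: {pct:.1f}% fewer tokens than full-buffer")
```

The same function applied to dollar costs instead of token counts yields the cost-reduction column.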
Runs three orchestration pipelines:
- Full-buffer baseline
- Map-Reduce
- Weft-style
Measures:
- Input tokens
- Output tokens
- Total tokens
Computes:
- Cost using `configs/pricing.yaml`
- Absolute savings
- Percentage (%) reduction
Displays everything in a side-by-side comparison UI
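As an illustration of the cost step: the actual schema of `configs/pricing.yaml` is not shown in this README, so the dict below stands in for its parsed contents, with assumed field names and placeholder prices.

```python
# Hypothetical pricing lookup. PRICING stands in for the parsed
# contents of configs/pricing.yaml; field names and prices are
# assumptions for this sketch, not the project's real config.
PRICING = {
    "claude-3-5-sonnet": {"input_per_million": 3.00, "output_per_million": 15.00},
}

def estimate_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Estimate USD cost from token counts and the pricing table."""
    p = PRICING[model]
    return (tokens_in * p["input_per_million"]
            + tokens_out * p["output_per_million"]) / 1_000_000

def savings(baseline_cost: float, candidate_cost: float) -> tuple[float, float]:
    """Return (absolute savings, percent reduction) vs the baseline."""
    saved = baseline_cost - candidate_cost
    return saved, saved / baseline_cost * 100
```

For example, `estimate_cost("claude-3-5-sonnet", 100_000, 5_000)` returns 0.375 under these placeholder prices.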
python scripts.py seed-demo-data
python scripts.py benchmark --model claude-3-5-sonnet
python backend/main.py
Open the app and click Try Live Demo to launch the main UI.
For deeper understanding of the system design:
- 🧠 Orchestration Design — explains how Weft-style and baseline pipelines are structured
- 📄 Design Goals — outlines the objectives, constraints, and comparison methodology
Key takeaways:
- 🔥 Context duplication is the real cost driver
- 🧠 Pipeline design directly impacts token usage
- ⚡ Structured data flow > raw text passing
- 💰 Reduce cost without changing models
- 📊 Measure efficiency using tokens, cost, and % reduction
Applies to:
- RAG pipelines
- Agents
- Multi-step workflows
- Tool-using systems
❌ “Which model is cheaper?”
✅ “Why am I sending so much data?”
Better orchestration beats better models (for cost). The cheapest token is the one you never send.
