PreethaRaj/TokenWeaver-Lab
🚀 Weft vs Traditional Orchestration — Token & Cost Efficiency Demo

Cut LLM costs by ~10–30% just by changing orchestration.


🎬 Demo

Weft Demo


🧠 What this project does

This project demonstrates how orchestration strategy—not model choice—drives LLM cost and efficiency.

It compares how different pipeline designs affect:

  • 🔢 Token usage
  • 💰 Cost
  • ⚡ Efficiency

Compared approaches:

  • 🧵 Weft-style orchestration
  • 🔁 Map-Reduce orchestration
  • 📦 Python full-buffer baseline (control)

👉 Same input. Same model.
👉 Only orchestration changes.


🌐 About Weft

Upstream project: https://github.com/WeaveMindAI/weft

Weft-style orchestration focuses on:

  • Passing only the data needed at each step
  • Avoiding repeated context sharing
  • Using structured, minimal data flow

💡 In this project, we simulate Weft-style orchestration in Python to demonstrate its impact on token efficiency.


💥 Why this matters

Most LLM pipelines are inefficient because they:

  • ❌ Re-send the same context repeatedly
  • ❌ Grow token usage at every step
  • ❌ Increase cost without improving output

🧵 Weft solves this by:

  • Sending only relevant data
  • Avoiding full-buffer context passing
  • Reducing unnecessary token duplication

📊 What the UI shows

  • Input / Output / Total tokens
  • Estimated cost per approach
  • Tokens saved (vs baseline)
  • Cost saved
  • 🚀 % cost reduction (Weft vs Map-Reduce)

⚙️ How it works

  1. Runs three orchestration pipelines:

    • Full-buffer baseline
    • Map-Reduce
    • Weft-style
  2. Measures:

    • Input tokens
    • Output tokens
    • Total tokens
  3. Computes:

    • Cost using configs/pricing.yaml
    • Absolute savings
    • Percentage (%) reduction
  4. Displays everything in a side-by-side comparison UI
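The cost step can be sketched as follows. The exact schema of configs/pricing.yaml isn't reproduced here, so this sketch assumes simple per-million-token input/output rates (the numbers are illustrative placeholders, not quotes of current provider pricing):

```python
# Sketch of the cost and %-reduction computation. The pricing table
# below is an assumed stand-in for configs/pricing.yaml.

PRICING = {  # illustrative USD rates per 1M tokens
    "claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one run, given measured token counts."""
    rates = PRICING[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

def pct_reduction(baseline_cost: float, cost: float) -> float:
    """Percentage saved relative to a baseline approach."""
    return 100.0 * (baseline_cost - cost) / baseline_cost

# Hypothetical measurements: same output size, fewer input tokens.
map_reduce = cost_usd("claude-3-5-sonnet", 120_000, 8_000)
weft = cost_usd("claude-3-5-sonnet", 80_000, 8_000)
print(round(pct_reduction(map_reduce, weft), 1))  # → 25.0
```

Note that because output tokens are identical across approaches, every saved dollar comes from input-side deduplication, which is exactly the quantity the side-by-side UI surfaces.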


▶️ Run locally

python scripts.py seed-demo-data
python scripts.py benchmark --model claude-3-5-sonnet
python backend/main.py

Open:

http://127.0.0.1:8000


🌐 Landing Page

http://127.0.0.1:8000

Click Try Live Demo to launch the main UI.


📚 Documentation

For deeper understanding of the system design:

  • 🧠 Orchestration Design — explains how Weft-style and baseline pipelines are structured
  • 📄 Design Goals — outlines the objectives, constraints, and comparison methodology

🎯 What you learn from this project

👉 LLM cost is an orchestration problem, not just a model problem.

  • 🔥 Context duplication is the real cost driver
  • 🧠 Pipeline design directly impacts token usage
  • ⚡ Structured data flow > raw text passing
  • 💰 Reduce cost without changing models
  • 📊 Measure efficiency using tokens, cost, and % reduction

🏭 From demo → production

Applies to:

  • RAG pipelines
  • Agents
  • Multi-step workflows
  • Tool-using systems

Production mindset shift:

❌ “Which model is cheaper?”
✅ “Why am I sending so much data?”


💡 Final takeaway

Better orchestration beats better models (for cost). The cheapest token is the one you never send.
