Build once. Load anywhere. Query forever.
Problem · The .rag Format · Install · Quickstart · Providers · Roadmap
Every major ML format is portable by default:
model.pt · model.onnx · model.gguf · model.h5
You save them, share them, deploy them anywhere.
RAG systems can't do any of that.
A typical RAG pipeline is a fragile web of moving parts:
❌ Vector databases tied to specific infrastructure
❌ Embedding pipelines that must be rebuilt from scratch
❌ Chunking configs scattered across codebases
❌ Provider-specific integrations with zero portability
❌ Metadata that lives nowhere and everywhere at once
Every time you switch environments — laptop to server, dev to prod, team to team — you rebuild the whole thing. That's broken.
RagBucket fixes this.
It packages your entire RAG pipeline — vectors, chunks, config, and runtime metadata — into a single portable .rag artifact. Like a model checkpoint, but for retrieval intelligence.
A .rag artifact is a self-contained, executable unit of retrieval intelligence. It is not a config file. It is not a directory. It is a complete, ready-to-run retrieval system.
| What it stores | How it stores it |
|---|---|
| Semantic embeddings | via Sentence Transformers |
| Vector index | via FAISS |
| Chunked knowledge | via LangChain splitters |
| Retrieval configuration | embedded in manifest |
| Runtime metadata | versioned artifact manifest |
Build it once.
Drop it anywhere.
Query it with one line of code.
# Using uv (recommended)
uv add ragbucket
# Using pip
pip install ragbucketLightweight by default. Local embedding dependencies are only pulled in when you set
embedding_provider="local". Cloud providers add nothing to your base install.
from ragbucket import RagBuilder, RagConfig
import os
from dotenv import load_dotenv
load_dotenv()
config = RagConfig(
# ── Embedding Provider ────────────────────────────────────
embedding_provider = "cohere",
embedding_model = "embed-english-v3.0",
embedding_api_key = os.getenv("COHERE_API_KEY"),
# ── Chunking ──────────────────────────────────────────────
chunk_size = 512,
chunk_overlap = 50,
# ── Retrieval ─────────────────────────────────────────────
top_k = 3,
)
builder = RagBuilder(config=config)
builder.build(
doc_path = "docs/",
op_path = "artifacts/demo.rag",
)This generates a single portable artifact:
artifacts/
└── demo.rag ← your entire RAG pipeline, packaged
Containing:
demo.rag
├── vectors.faiss ← semantic vector index
├── chunks.json ← chunked document memory
└── manifest.json ← embedding config + metadata
Build once. Query anywhere.
from ragbucket import RagRuntime
import os
from dotenv import load_dotenv
load_dotenv()
rag = RagRuntime(
# ── RAG Artifact ──────────────────────────────────────────
rag_path = "artifacts/demo.rag",
# ── Generation Provider ───────────────────────────────────
provider = "groq",
api_key = os.getenv("GROQ_API_KEY"),
model = "llama-3.1-8b-instant",
# ── Embedding Provider Key ────────────────────────────────
embedding_api_key = os.getenv("COHERE_API_KEY"),
# ── System Prompt ─────────────────────────────────────────
system_prompt = "You are a helpful assistant. Keep answers short and crisp.",
)
response = rag.ask("What are Anik's AI/ML skills?")
print(response)That's it. No vector DB to spin up. No pipeline to reconstruct. Just load and ask.
RagBucket cleanly separates retrieval from generation — meaning you can mix and match embedding providers with generation providers freely.
| Provider | Example Model |
|---|---|
groq |
llama-3.1-8b-instant |
openai |
gpt-4o-mini |
gemini |
gemini-1.5-flash |
anthropic |
claude-3-haiku-20240307 |
# Swap providers without touching anything else
rag = RagRuntime(
rag_path = "demo.rag",
provider = "anthropic",
api_key = os.getenv("ANTHROPIC_API_KEY"),
model = "claude-3-haiku-20240307",
embedding_api_key = os.getenv("COHERE_API_KEY"),
)| Provider | Example Model |
|---|---|
local |
BAAI/bge-small-en-v1.5 |
cohere |
embed-english-v3.0 |
openai |
text-embedding-3-small |
gemini |
models/embedding-001 |
voyage |
voyage-large-2 |
# Use any embedding provider at build time
config = RagConfig(
embedding_provider = "openai",
embedding_model = "text-embedding-3-small",
embedding_api_key = os.getenv("OPENAI_API_KEY"),
)Every stage of the retrieval pipeline is configurable. Sane defaults are always applied automatically.
from ragbucket import RagConfig
config = RagConfig(
# Embedding system
embedding_provider = "local",
embedding_model = "sentence-transformers/all-MiniLM-L6-v2",
# Chunking
chunk_size = 1024,
chunk_overlap = 100,
# Retrieval
top_k = 5,
)All missing values are filled using framework defaults. Nothing breaks if you leave something out.
demo.rag
│
├── vectors.faiss ← FAISS vector index (semantic search backbone)
├── chunks.json ← document chunks with source metadata
└── manifest.json ← embedding config, top_k, model info, version
The artifact is entirely self-describing. Anyone who receives a .rag file has everything needed to query it — no external config, no infrastructure dependencies, no guesswork.
| Component | Technology |
|---|---|
| Embeddings | Sentence Transformers |
| Vector Search | FAISS |
| Chunking | LangChain Text Splitters |
| Artifact Packaging | Python zipfile |
| Config Validation | Pydantic |
| Runtime | Pure Python |
RAG systems should be as portable as model files. Not as fragile as microservice stacks.
RagBucket treats RAG systems as portable intelligence artifacts — not fragile infrastructure pipelines. This cleanly separates two concerns that have no business being coupled:
Retrieval memory → what you built → lives in the .rag file
Language generation → how you query it → any provider, any environment
Your retrieval knowledge travels with your code. Swap generation providers without rebuilding anything. Share a .rag file like you'd share a model checkpoint.
The result: reusable semantic memory that is fully decoupled from infrastructure.
| Resource | URL |
|---|---|
| Website | ragbucket.vercel.app |
| PyPI | pypi.org/project/ragbucket |
| GitHub | github.com/anikchand461/ragbucket |
MIT License — see LICENSE for details.
◈ RagBucket
The portable runtime layer for Retrieval-Augmented Generation systems.
Built by Anik Chand · ragbucket.vercel.app

