OpenSearch + RAG Workshop

Build a semantic search & RAG system over Vector Podcast transcripts

What you'll build

A fully working Retrieval-Augmented Generation (RAG) pipeline that lets you ask natural-language questions and get answers grounded in content from 33 podcast episode transcripts about vector search and AI.

Your Question
     │
     ▼
[Embed with sentence-transformers]
     │
     ▼
[k-NN search on OpenSearch]  ←  33 podcast episodes, chunked + embedded
     │
     ▼
[Generate answer with Ollama]
     │
     ▼
Grounded Answer + Episode Citations

Architecture

Layer	Technology	Purpose
Vector store	OpenSearch 2.13 (k-NN plugin)	Store & search 384-dim embeddings
Embeddings	`all-MiniLM-L6-v2` (sentence-transformers)	Convert text → vectors
Index type	HNSW with cosine similarity	Approximate nearest-neighbour search
LLM	Ollama (local, e.g. `gemma4`)	Generate grounded answers
Data	33 Vector Podcast transcripts (Whisper)	Knowledge base

Prerequisites

Docker + Docker Compose
Python 3.10+
uv
Ollama installed and running locally

Setup (5 minutes)

1. Start OpenSearch

cd workshop/
docker compose up -d

Wait ~30 seconds, then verify: http://localhost:9200 You should see {"status": "green" ...}.

OpenSearch Dashboards (optional UI): http://localhost:5601

2. Install Python dependencies

uv sync

3. Pull an Ollama model

ollama pull gemma4

You can use any Ollama model. Set OLLAMA_MODEL to override the default:

export OLLAMA_MODEL=gemma4    # default
export OLLAMA_HOST=http://localhost:11434  # default

4. Launch Jupyter

jupyter notebook notebooks/

Workshop Flow

Part 1: Index Podcasts `01_index_podcasts.ipynb` (~15 min)

Step	What happens
Connect	Verify OpenSearch is running
Create index	k-NN index with HNSW + cosine similarity, 384-dim
Parse	Load 33 `.md` files, extract frontmatter + transcript body
Chunk	Split transcripts into ~400-word windows with 50-word overlap
Embed	Generate sentence embeddings with `all-MiniLM-L6-v2`
Index	Bulk-ingest all chunks into OpenSearch
Verify	Run a test k-NN query

Part 2: RAG Pipeline `02_rag_pipeline.ipynb` (~15 min)

Step	What happens
Semantic search	k-NN query finds meaning, not just keywords
Hybrid search	BM25 + k-NN combined for better term coverage
RAG pipeline	Retrieve → format context → generate with Ollama
Inspect context	See exactly what goes into the LLM prompt
Explore	Ask your own questions!

Advanced Notebooks (go-further, pick any)

03: Cross-Encoder Re-Ranking `03_reranking.ipynb`

Two-stage retrieval: bi-encoder retrieves top-20 candidates, then a cross-encoder (ms-marco-MiniLM-L-6-v2) re-scores them for much higher precision before sending to the LLM. Side-by-side RAG comparison shows the impact on answer quality.

04: Metadata Filtering `04_metadata_filtering.ipynb`

Scope k-NN search by episode title, date range, or any structured field. Covers OpenSearch post-filtering, pre-filtering trade-offs, and the over-fetch pattern.

05: Larger Embeddings `05_larger_embeddings.ipynb`

Swap all-MiniLM-L6-v2 (384d) for all-mpnet-base-v2 (768d) and compare: search quality, encoding speed, and index size side by side on the same data.

06: Aiven Managed OpenSearch `06_aiven_opensearch.ipynb`

Connect to an Aiven for OpenSearch cloud cluster over TLS. Create the same index, bulk-index podcast chunks, and run RAG queries against a fully managed service.

07: Streamlit Chat UI

Launch streamlit_app.py. A streaming chat interface with source citations, sidebar controls (search mode, top-k), and persistent chat history.

cd workshop/
streamlit run streamlit_app.py    # opens http://localhost:8501

Project Structure

workshop/
├── README.md
├── docker-compose.yml              # OpenSearch + Dashboards
├── requirements.txt
├── streamlit_app.py                # Streamlit chat UI
├── notebooks/
│   ├── 01_index_podcasts.ipynb     # Core: parse, embed, index
│   ├── 02_rag_pipeline.ipynb       # Core: search + RAG
│   ├── 03_reranking.ipynb          # Advanced: cross-encoder re-ranking
│   ├── 04_metadata_filtering.ipynb # Advanced: filter by episode/date
│   ├── 05_larger_embeddings.ipynb  # Advanced: 768-dim embeddings
│   └── 06_aiven_opensearch.ipynb   # Advanced: managed cloud cluster
└── src/
    ├── parser.py                   # Markdown → PodcastChunk objects
    ├── opensearch_client.py        # Index management, search, Aiven client
    ├── rag.py                      # RAG pipeline using Ollama SDK
    └── reranker.py                 # Cross-encoder re-ranking

Key Concepts

k-NN Index in OpenSearch

{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "embedding": {
        "type": "knn_vector",
        "dimension": 384,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "nmslib"
        }
      }
    }
  }
}

HNSW: Hierarchical Navigable Small World

HNSW builds a multi-layer graph where each node connects to its nearest neighbours. Search starts at the top (sparse) layer and zooms into the bottom (dense) layer, pruning irrelevant branches early, achieving sub-millisecond search over millions of vectors.

Parameters:

ef_construction: graph quality during indexing (higher = better recall, slower build)
m: max connections per node (higher = better recall, more memory)
ef_search: candidates explored at query time (higher = better recall, slower search)

Chunking Strategy

Long podcast transcripts are split into overlapping windows:

chunk_size = 400 words enough context for a coherent idea
overlap = 50 words ensures ideas spanning chunk boundaries are captured

RAG Prompt Design

The system prompt constrains the LLM to:

Answer only from provided context (no hallucination)
Cite the episode title for specific claims
Acknowledge when context is insufficient

Sample Questions to Try

"What is HNSW and why did Yury Malkov invent it?"
"How do wormhole vectors differ from standard hybrid search?"
"What are the main challenges of running vector search in production?"
"How does Pinecone's architecture differ from Weaviate's?"
"What advice do guests give for evaluating embedding model quality?"
"What is the role of sparse vectors in hybrid search?"

Teardown

docker compose down          # stop containers
docker compose down -v       # stop + delete the index data

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
notebooks		notebooks
src		src
vector-podcast		vector-podcast
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml
streamlit_app.py		streamlit_app.py
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OpenSearch + RAG Workshop

Build a semantic search & RAG system over Vector Podcast transcripts

What you'll build

Architecture

Prerequisites

Setup (5 minutes)

1. Start OpenSearch

2. Install Python dependencies

3. Pull an Ollama model

4. Launch Jupyter

Workshop Flow

Part 1: Index Podcasts `01_index_podcasts.ipynb` (~15 min)

Part 2: RAG Pipeline `02_rag_pipeline.ipynb` (~15 min)

Advanced Notebooks (go-further, pick any)

03: Cross-Encoder Re-Ranking `03_reranking.ipynb`

04: Metadata Filtering `04_metadata_filtering.ipynb`

05: Larger Embeddings `05_larger_embeddings.ipynb`

06: Aiven Managed OpenSearch `06_aiven_opensearch.ipynb`

07: Streamlit Chat UI

Project Structure

Key Concepts

k-NN Index in OpenSearch

HNSW: Hierarchical Navigable Small World

Chunking Strategy

RAG Prompt Design

Sample Questions to Try

Teardown

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

OpenSearch + RAG Workshop

Build a semantic search & RAG system over Vector Podcast transcripts

What you'll build

Architecture

Prerequisites

Setup (5 minutes)

1. Start OpenSearch

2. Install Python dependencies

3. Pull an Ollama model

4. Launch Jupyter

Workshop Flow

Part 1: Index Podcasts 01_index_podcasts.ipynb (~15 min)

Part 2: RAG Pipeline 02_rag_pipeline.ipynb (~15 min)

Advanced Notebooks (go-further, pick any)

03: Cross-Encoder Re-Ranking 03_reranking.ipynb

04: Metadata Filtering 04_metadata_filtering.ipynb

05: Larger Embeddings 05_larger_embeddings.ipynb

06: Aiven Managed OpenSearch 06_aiven_opensearch.ipynb

07: Streamlit Chat UI

Project Structure

Key Concepts

k-NN Index in OpenSearch

HNSW: Hierarchical Navigable Small World

Chunking Strategy

RAG Prompt Design

Sample Questions to Try

Teardown

About

Resources

License

Code of conduct

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Part 1: Index Podcasts `01_index_podcasts.ipynb` (~15 min)

Part 2: RAG Pipeline `02_rag_pipeline.ipynb` (~15 min)

03: Cross-Encoder Re-Ranking `03_reranking.ipynb`

04: Metadata Filtering `04_metadata_filtering.ipynb`

05: Larger Embeddings `05_larger_embeddings.ipynb`

06: Aiven Managed OpenSearch `06_aiven_opensearch.ipynb`

Packages