SGTradeClassificationRagBot

A research prototype implementing a retrieval-augmented generation (RAG) workflow for classifying Singapore trade-related documents. The project contains a lightweight RAG tool, a simple "naive" classification agent, prompt & model utilities, and focused evaluation utilities to measure classifier performance.

Key Features

Modular Architecture: A Python codebase with clear separation of concerns (Agents, Tools, Parsers, Evaluation).
Containerized Workflow: A fully working Dockerfile and docker-compose setup for reproducible testing.
Evaluation-First: Integrated promptfoo configuration for rigorous testing of prompts against trade scenarios.
Auditor’s Log: A trace of the agent’s Chain of Thought (CoT) for the test cases, showing how it handled ambiguity.

Project Structure

src/sg_trade_ragbot/agents/naive_agent.py — A simple classification agent used as a baseline.
src/sg_trade_ragbot/tools/RAGTool.py — RAG tooling: document ingestion, retrieval, and context assembly.
src/sg_trade_ragbot/utils/prompts/prompts.py — Prompt templates used by agents and tools.
src/sg_trade_ragbot/utils/models/models.py — Model wrappers and helpers.
src/sg_trade_ragbot/utils/evals/ — Evaluation configuration (e.g., bare_config.yaml) and utilities.
tests/ — Unit tests for agents, tools, and utils.
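
The three RAG stages named above (ingestion, retrieval, context assembly) can be illustrated with a small standard-library sketch. This is not the code in RAGTool.py: the function names (`tokenize`, `retrieve`, `assemble_context`) are hypothetical, and bag-of-words cosine similarity stands in for whatever embedding model the project actually uses.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (hypothetical helper)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q, Counter(tokenize(c))),
        reverse=True,
    )
    return ranked[:k]


def assemble_context(chunks: list[str]) -> str:
    """Join retrieved chunks into a numbered context block for a prompt."""
    return "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))


# Illustrative sample text only, not data from the repository.
sample_chunks = [
    "Live animals are classified in chapter one.",
    "Coffee, tea and spices fall in chapter nine.",
    "Machinery is in chapter eighty-four.",
]
context = assemble_context(retrieve("coffee and tea", sample_chunks, k=1))
```

A real pipeline would replace `cosine` over word counts with vector search over embeddings; the overall shape (score, rank, truncate to k, format) stays the same.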

Goals

Reproducible Pipeline: Provide a containerized RAG pipeline for classifying trade text.
Baseline Benchmarks: Offer simple agents to evaluate different retrieval and prompting strategies.
Iteration: Make it easy to swap prompts, plug in different LLMs (OpenAI, Groq, etc.), and measure impact.

Requirements

Docker Desktop (Recommended)
uv (Optional, for local dependency management)
Python 3.13+ (If running locally without Docker)

Quick Start (Docker)

This project is containerized to ensure consistent evaluations across different machines. It uses uv for dependency management and mounts configuration files so you can edit test cases without rebuilding the container.

1. Setup Configuration

The container requires API keys to function.

Copy the example environment file:
cp .env.example .env
Open .env and add your keys (e.g., OPENAI_API_KEY, GROQ_API_KEY).
Note: Do not add file paths to .env. The container handles paths automatically.
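
Before starting an evaluation, it can help to confirm the keys are actually set. The check below is a minimal sketch and not part of the repository: `missing_keys` is a hypothetical helper, and which keys are truly required depends on the providers your configuration uses.

```python
import os

# Assumption: both keys are needed. Trim this tuple to the providers you use.
REQUIRED_KEYS: tuple[str, ...] = ("OPENAI_API_KEY", "GROQ_API_KEY")


def missing_keys(
    env: dict[str, str],
    required: tuple[str, ...] = REQUIRED_KEYS,
) -> list[str]:
    """Return the required keys that are absent or blank in env."""
    return [key for key in required if not env.get(key, "").strip()]


# Check the current process environment and report anything missing.
missing = missing_keys(dict(os.environ))
if missing:
    print("Missing API keys:", ", ".join(missing))
```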

2. Running an Evaluation

To run the promptfoo evaluation against the default configuration:
docker compose up --build

This will:

Build the image (installing all dependencies from uv.lock).
Run the evaluation script.
Print the results to your terminal.

3. The "Live Edit" Workflow

You do not need to rebuild the container to modify prompts or test cases.

Open src/sg_trade_ragbot/utils/evals/eval_configs/bare_config.yaml in your local editor.
Modify your prompts, test cases, or variables.
Save the file.
Run docker compose up again. The container sees your changes immediately via Docker volumes.

4. Managing Dependencies

If you add a new library (e.g., spacy), you must rebuild the container for Docker to see it:
# 1. Update lockfile locally
uv add spacy

# 2. Rebuild container
docker compose up --build

Known Issues & Roadmap

The Auditor’s Log (Chain of Thought)

The system generates a trace of the agent’s Chain of Thought (CoT) to show how it handles ambiguity in trade documents.

Current Status: These traces are visible in the promptfoo debug logs and in the container output.
Todo: Implement a structured export or cleaner visualization for the Auditor's Log in the final report.
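
One possible shape for the structured export is JSON Lines, with one record per test case. The sketch below is an assumption about what such an export could look like, not an existing feature: the `TraceRecord` fields (`case_id`, `reasoning`, `label`) are hypothetical and would need to match whatever the agent actually emits.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TraceRecord:
    """One Auditor's Log entry (hypothetical fields)."""
    case_id: str
    reasoning: str
    label: str


def to_jsonl(records: list[TraceRecord]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(
        json.dumps(asdict(record), ensure_ascii=False) for record in records
    )


def from_jsonl(text: str) -> list[TraceRecord]:
    """Parse JSON Lines back into records, skipping blank lines."""
    return [
        TraceRecord(**json.loads(line))
        for line in text.splitlines()
        if line.strip()
    ]
```

JSON Lines keeps each trace independently parseable, so a partially written log from an interrupted run can still be read.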

Local Ollama Support

Current Status: The Docker configuration relies on external APIs (OpenAI, Groq). In this release, the container cannot reach an Ollama instance running on the host machine.
Todo: Add a dedicated Ollama service to docker-compose.yaml for fully offline, local model evaluation.

Development Notes

Prompts: Tweak templates in utils/prompts to change agent behavior.
Models: Wrappers in utils/models abstract the LLM/embedding implementations. You can swap in your preferred LLM client by implementing the required interface.
Testing: Tests live in tests/ and use pytest. Run them frequently during development.
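
The README does not spell out the "required interface" for model wrappers, so the following is only a sketch of how such an interface could be expressed with `typing.Protocol`. `LLMClient`, `complete`, `FixedLabelClient`, and `classify` are all hypothetical names. Check utils/models/models.py for the real signatures.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Assumed minimal interface: turn a prompt into a text completion."""

    def complete(self, prompt: str) -> str: ...


class FixedLabelClient:
    """Stand-in client for tests: returns a fixed label, calls no API."""

    def __init__(self, label: str) -> None:
        self.label = label

    def complete(self, prompt: str) -> str:
        return self.label


# Hypothetical template; the project's real templates live in utils/prompts.
DEFAULT_TEMPLATE = "Classify this trade document:\n{text}\nLabel:"


def classify(text: str, client: LLMClient, template: str = DEFAULT_TEMPLATE) -> str:
    """Fill the prompt template and return the client's trimmed answer."""
    return client.complete(template.format(text=text)).strip()
```

Because `classify` depends only on the protocol, a pytest test can pass `FixedLabelClient` and run without network access or API keys.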

TODO

[ ] Fix retrieval JSON parsing
[ ] Fix chunking to be smaller and more efficient
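
The TODO items do not describe the underlying problems, so the sketch below only shows common approaches to both, under stated assumptions: that JSON parsing fails when the model wraps its JSON in prose or code fences, and that smaller chunks mean fixed-size word windows with overlap. `extract_json` and `chunk_words` are hypothetical helpers, and the default sizes are placeholders rather than tuned values.

```python
import json
from typing import Any


def extract_json(text: str) -> Any:
    """Return the first JSON object or array embedded in text, or None."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char in "{[":
            try:
                # raw_decode parses one JSON value starting at index
                # and ignores any trailing text after it.
                value, _ = decoder.raw_decode(text, index)
                return value
            except json.JSONDecodeError:
                continue
    return None


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words sharing `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```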
