FDA Molecule Intelligence Agent

A local-first AI pipeline that extracts verified medical facts from FDA drug labels (PDFs) and indexes them into a searchable semantic knowledge base. It uses small, efficient LLMs (Gemma-2 2B) running on CPU to parse complex unstructured data into structured, cited intelligence.

🚀 Key Features

Local Privacy: Runs entirely offline via Ollama once the models have been downloaded. No data leaves your machine.
Dual-Engine Storage:
- Relational (SQLite): Stores high-fidelity facts, audit trails, and performance metrics.
- Semantic (ChromaDB): Enables conceptual search (e.g., "Find drugs with renal risks") using vector embeddings.
Hallucination Guardrails: A deterministic QuoteVerifier ensures every extracted fact is backed by an exact substring match in the source text.
Audit Trail: Every LLM thought, prompt, and latency metric is logged for full reproducibility.
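
The quote check described above can be sketched in a few lines. This is a minimal illustration, not the project's actual `QuoteVerifier`: the `ExtractedFact` dataclass, the `verify_quote` function, and the sample label text are all hypothetical, and the only behavior taken from this README is the exact substring match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedFact:
    """Hypothetical shape of one LLM-extracted fact (not the project's schema)."""
    claim: str  # the fact as stated by the LLM
    quote: str  # the passage the LLM cites as evidence


def verify_quote(fact: ExtractedFact, source_text: str) -> bool:
    """Accept a fact only if its quote occurs verbatim in the source text.

    Deliberately strict: no case folding and no whitespace normalization.
    An empty quote is rejected, because "" is a substring of every string.
    """
    return bool(fact.quote) and fact.quote in source_text


# Sample text invented for illustration; it is not taken from a real label.
label_text = "Severe renal impairment: reduce the dose by 50%."

grounded = ExtractedFact(
    claim="Dose is halved in severe renal impairment",
    quote="reduce the dose by 50%",
)
invented = ExtractedFact(
    claim="Dose is halved in hepatic impairment",
    quote="reduce the dose in hepatic impairment",
)

print(verify_quote(grounded, label_text))  # True
print(verify_quote(invented, label_text))  # False
```

A check this strict is deterministic and easy to audit, but it will also reject genuine quotes whose whitespace or hyphenation changed during PDF text extraction, so some normalization step before comparison may be worth considering.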

🏗️ Architecture

The system follows a two-stage RAG (Retrieval-Augmented Generation) pipeline:

Extraction Layer: PDFs ⮕ LLM ⮕ SQLite (Facts + Quotes)
Semantic Layer: SQLite ⮕ Embedding Model (all-MiniLM-L6-v2) ⮕ ChromaDB
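
To make the hand-off between the two layers concrete, here is a rough sketch using only the standard-library `sqlite3` module. The `facts` table, its columns, and the `load_documents` helper are assumptions made for illustration; the real schema may differ. Passing only verified facts forward is likewise an assumption. The function returns three parallel lists, which is the shape that ChromaDB's `collection.add(ids=..., documents=..., metadatas=...)` accepts.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions, not the project's.
SCHEMA = """
CREATE TABLE facts (
    id       INTEGER PRIMARY KEY,
    drug     TEXT NOT NULL,
    claim    TEXT NOT NULL,
    quote    TEXT NOT NULL,    -- verbatim evidence from the label
    verified INTEGER NOT NULL  -- 1 if the quote check passed
);
"""


def load_documents(conn):
    """Return (ids, documents, metadatas) for verified facts only."""
    rows = conn.execute(
        "SELECT id, drug, claim, quote FROM facts WHERE verified = 1 ORDER BY id"
    ).fetchall()
    ids = [f"fact-{row[0]}" for row in rows]
    documents = [row[2] for row in rows]  # the text that would be embedded
    metadatas = [{"drug": row[1], "quote": row[3]} for row in rows]
    return ids, documents, metadatas


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO facts (drug, claim, quote, verified) VALUES (?, ?, ?, ?)",
    [
        ("drug-a", "Placeholder claim one", "placeholder quote one", 1),
        ("drug-b", "Placeholder claim two", "placeholder quote two", 0),
    ],
)

ids, documents, metadatas = load_documents(conn)
print(ids)        # ['fact-1']
print(documents)  # ['Placeholder claim one']
print(metadatas)  # [{'drug': 'drug-a', 'quote': 'placeholder quote one'}]
```

Keeping the quote in the metadata means a semantic search hit can still be traced back to its supporting passage.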

🛠️ Setup

Prerequisites
- Python 3.10+
- Ollama installed and running.
- Poetry (Python dependency manager).

Installation

# Install dependencies via Poetry
poetry install

# Pull the LLM
ollama pull gemma2:2b

Usage Flow

Step 1: Extract Facts

Process your PDFs to populate the SQLite audit store.
```
poetry run python -m src.main data/raw_pdfs/keytruda.pdf
```
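
Step 1 writes to the SQLite audit store. As a sketch of what logging a single call could look like, the snippet below times a stubbed model call and records the prompt, response, and latency. The `llm_audit` table, the `logged_call` wrapper, and the `fake_llm` stub are hypothetical names; the real pipeline calls Gemma-2 through Ollama and may log more fields.

```python
import sqlite3
import time


def fake_llm(prompt: str) -> str:
    """Stand-in for the real model call; returns a canned response."""
    return f"echo: {prompt}"


def logged_call(conn, prompt, llm=fake_llm):
    """Run one LLM call and record its prompt, response, and latency."""
    start = time.perf_counter()
    response = llm(prompt)
    latency_ms = (time.perf_counter() - start) * 1000.0
    conn.execute(
        "INSERT INTO llm_audit (prompt, response, latency_ms) VALUES (?, ?, ?)",
        (prompt, response, latency_ms),
    )
    conn.commit()
    return response


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE llm_audit ("
    "id INTEGER PRIMARY KEY, prompt TEXT, response TEXT, latency_ms REAL)"
)

logged_call(conn, "List the contraindications.")
row = conn.execute(
    "SELECT prompt, response, latency_ms FROM llm_audit"
).fetchone()
print(row[0], "->", row[1])
```

Wrapping the call rather than logging inside it keeps the audit logic independent of which model backend is in use.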
Step 2: Build Knowledge Base (New)

Convert extracted facts into semantic vectors for searching.
```
poetry run python -m src.scripts.build_knowledge_base
```
Step 3: Semantic Query

Ask the agent conceptual questions.
```
poetry run python -m src.scripts.query_agent
```
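
For intuition about what a semantic query does, the toy example below ranks stored vectors by cosine similarity to a query vector. It is a conceptual sketch only. In the actual pipeline ChromaDB performs the search and all-MiniLM-L6-v2 produces 384-dimensional embeddings. The three-dimensional vectors, the document ids, and the choice of cosine similarity as the metric are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hand-written toy vectors standing in for real embeddings.
index = {
    "fact-renal": [0.9, 0.1, 0.0],
    "fact-hepatic": [0.1, 0.9, 0.0],
    "fact-dosing": [0.5, 0.5, 0.7],
}
query = [1.0, 0.0, 0.0]  # imagine this encodes a question about renal risk

print(top_k(query, index))  # ['fact-renal', 'fact-dosing']
```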

🛠️ Tech Stack

Orchestration: Python 3.10+ with Poetry
LLM: Gemma-2 2B (via Ollama)
Databases: SQLite (Audit/Facts), ChromaDB (Vector Store)
Embeddings: Sentence-Transformers (all-MiniLM-L6-v2)
UI: Rich (Terminal formatting)
