A privacy policy and terms-of-service analyzer: a browser extension (Firefox + Chrome) extracts page content and sends it to a Go backend, which uses a RAG pipeline and LLMs to produce multi-dimensional privacy scores.
```
Browser extension click
  -> Content script extracts page HTML
  -> Background worker POSTs to backend API
  -> Backend: check cache (URL + content hash)
  -> If cache miss:
       HTML parse -> privacy policy detection -> text chunking
       -> Embed chunks (OpenAI) -> store in vector store (in-memory default, Qdrant optional)
       -> Retrieve relevant chunks -> LLM analysis (Anthropic Claude)
       -> Structured scoring -> cache result -> return response
```
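The cache lookup keyed on URL plus content hash could be sketched as follows (the key format, hash choice, and `cacheKey` name are illustrative assumptions, not the project's actual implementation):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey derives a stable cache key from the page URL and its HTML
// content, so a changed policy at the same URL misses the cache while
// an unchanged one hits it.
func cacheKey(url, html string) string {
	sum := sha256.Sum256([]byte(html))
	return url + ":" + hex.EncodeToString(sum[:])
}

func main() {
	k1 := cacheKey("https://example.com/privacy", "<html>v1</html>")
	k2 := cacheKey("https://example.com/privacy", "<html>v2</html>")
	fmt.Println(k1 != k2) // same URL, different content -> different keys
}
```

Hashing the content (rather than keying on URL alone) means a revised policy is re-analyzed instead of served stale from cache.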
Five dimensions, equally weighted (20% each), rated 1-10 (higher = better for user privacy):
| Dimension | What It Measures |
|---|---|
| Data Collection | How much data is collected and whether it's minimized |
| Data Sharing | Whether data is shared/sold to third parties |
| User Rights | Access, deletion, portability, opt-out rights |
| Retention | How long data is kept and whether limits are defined |
| Security | Encryption, breach notification, security practices |
Risk Levels: Low (8-10), Moderate (5-7.9), High (3-4.9), Critical (1-2.9)
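The equal weighting and the risk bands above can be expressed as a short sketch (function names here are illustrative, not from the codebase):

```go
package main

import "fmt"

// overallScore averages the five dimension scores with equal weight (20% each).
func overallScore(scores [5]float64) float64 {
	var sum float64
	for _, s := range scores {
		sum += s
	}
	return sum / 5
}

// riskLevel maps an overall 1-10 score to the bands listed above.
func riskLevel(score float64) string {
	switch {
	case score >= 8:
		return "Low"
	case score >= 5:
		return "Moderate"
	case score >= 3:
		return "High"
	default:
		return "Critical"
	}
}

func main() {
	s := overallScore([5]float64{7, 6, 8, 5, 6})
	fmt.Println(s, riskLevel(s)) // 6.4 Moderate
}
```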
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/analyze | Submit HTML content for privacy analysis |
| GET | /api/v1/health | Health check (backend + vector store status) |
- Go 1.22+ (for running the backend directly)
- Docker + Docker Compose or Podman + Podman Compose (for containerized setup)
- Anthropic API key (for LLM analysis)
- OpenAI API key (for text embeddings)
```
smolterms/
├── backend/
│   ├── cmd/server/main.go    # Application entrypoint
│   ├── Dockerfile            # Multi-stage Docker build
│   └── internal/
│       ├── analyzer/         # Full pipeline orchestration, scoring
│       ├── api/              # HTTP handlers, middleware, routing
│       ├── cache/            # Cache interface + in-memory implementation
│       ├── config/           # Environment variable loading
│       ├── embedding/        # EmbeddingClient interface + OpenAI impl
│       ├── extractor/        # HTML parsing, chunking, policy detection
│       ├── integration/      # End-to-end integration tests
│       ├── llm/              # LLMClient interface + Anthropic impl
│       ├── rag/              # RAG pipeline (store + retrieve)
│       ├── types/            # Shared request/response types
│       └── vectorstore/      # VectorStore interface + Qdrant impl
├── extension/                # Browser extension (Firefox + Chrome)
├── docker-compose.yml        # Local dev: backend (Qdrant via --profile qdrant)
├── .env.example              # Environment variable template
├── go.mod
└── go.sum
```
| Component | Technology |
|---|---|
| Backend | Go 1.22+, stdlib net/http |
| LLM | Anthropic Claude Sonnet 4.5 |
| Embeddings | OpenAI text-embedding-3-small (1536 dims) |
| Vector Store | In-memory (default), Qdrant gRPC (optional) |
| Caching | go-cache (in-memory) |
| Configuration | Environment variables (12-factor) |
| Extension | Vanilla JS, Manifest V3 |
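The in-memory default can be pictured as a slice of embedded chunks searched by cosine similarity. This is a self-contained sketch of the idea, not the project's VectorStore interface:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// doc pairs a text chunk with its embedding vector.
type doc struct {
	Text string
	Vec  []float64
}

// cosine computes cosine similarity between two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK returns the k docs most similar to the query vector.
func topK(docs []doc, query []float64, k int) []doc {
	sort.SliceStable(docs, func(i, j int) bool {
		return cosine(docs[i].Vec, query) > cosine(docs[j].Vec, query)
	})
	if k > len(docs) {
		k = len(docs)
	}
	return docs[:k]
}

func main() {
	docs := []doc{
		{"data sharing clause", []float64{1, 0}},
		{"retention clause", []float64{0, 1}},
	}
	best := topK(docs, []float64{0.9, 0.1}, 1)
	fmt.Println(best[0].Text) // data sharing clause
}
```

In production the vectors are 1536-dimensional (text-embedding-3-small), and the optional Qdrant backend performs the same nearest-neighbor search server-side over gRPC.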
TBD