SmolTerms

A privacy policy and terms of service analyzer. A browser extension (Firefox + Chrome) extracts page content and sends it to a Go backend, which runs a RAG pipeline with LLMs to produce multi-dimensional privacy scores.

How It Works

Browser Extension click
  -> Content script extracts page HTML
  -> Background worker POSTs to backend API
  -> Backend: check cache (URL + content hash)
  -> If cache miss:
       HTML parse -> privacy policy detection -> text chunking
       -> Embed chunks (OpenAI) -> store in vector store (in-memory default, Qdrant optional)
       -> Retrieve relevant chunks -> LLM analysis (Anthropic Claude)
       -> Structured scoring -> cache result -> return response

Scoring System

Five dimensions, equally weighted (20% each), rated 1-10 (higher = better for user privacy):

Dimension        What It Measures
Data Collection  How much data is collected and whether it's minimized
Data Sharing     Whether data is shared/sold to third parties
User Rights      Access, deletion, portability, opt-out rights
Retention        How long data is kept and whether limits are defined
Security         Encryption, breach notification, security practices

Risk Levels: Low (8-10), Moderate (5-7.9), High (3-4.9), Critical (1-2.9)

API Endpoints

Method  Path             Description
POST    /api/v1/analyze  Submit HTML content for privacy analysis
GET     /api/v1/health   Health check (backend + vector store status)

Prerequisites

  • Go 1.22+ (for running the backend directly)
  • Docker + Docker Compose or Podman + Podman Compose (for containerized setup)
  • Anthropic API key (for LLM analysis)
  • OpenAI API key (for text embeddings)

Project Structure

smolterms/
├── backend/
│   ├── cmd/server/main.go          # Application entrypoint
│   ├── Dockerfile                   # Multi-stage Docker build
│   └── internal/
│       ├── analyzer/                # Full pipeline orchestration, scoring
│       ├── api/                     # HTTP handlers, middleware, routing
│       ├── cache/                   # Cache interface + in-memory implementation
│       ├── config/                  # Environment variable loading
│       ├── embedding/               # EmbeddingClient interface + OpenAI impl
│       ├── extractor/               # HTML parsing, chunking, policy detection
│       ├── integration/             # End-to-end integration tests
│       ├── llm/                     # LLMClient interface + Anthropic impl
│       ├── rag/                     # RAG pipeline (store + retrieve)
│       ├── types/                   # Shared request/response types
│       └── vectorstore/             # VectorStore interface + Qdrant impl
├── extension/                       # Browser extension (Firefox + Chrome)
├── docker-compose.yml               # Local dev: backend (Qdrant via --profile qdrant)
├── .env.example                     # Environment variable template
├── go.mod
└── go.sum

Tech Stack

Component      Technology
Backend        Go 1.22+, stdlib net/http
LLM            Anthropic Claude Sonnet 4.5
Embeddings     OpenAI text-embedding-3-small (1536 dims)
Vector Store   In-memory (default), Qdrant gRPC (optional)
Caching        go-cache (in-memory)
Configuration  Environment variables (12-factor)
Extension      Vanilla JS, Manifest V3

License

TBD
