This repository is a learning and demonstration project. Treat the findings below as a threat model and hardening checklist, not as certification.
If you discover a security issue in this codebase, please report it privately to the maintainer when sensitive, or open a public issue for non-sensitive items (documentation-only, etc.).
Mappings are qualitative (code inspection). Earlier revisions used OWASP Top 10 (2021) numbering; this document is maintained against OWASP Top 10:2025. Dependency advisories change continuously—run pip-audit (see CI and commands below), Dependabot, or uv.lock + OSV regularly.
Findings
- `tenant_id`, `source_type`, and `metadata_filter` are optional filters on `POST /retrieve`; omitting them returns hits across all rows (`src/rag/main.py`). When `REQUIRE_TENANT_ID=true`, `tenant_id` becomes mandatory (400 if missing).
- Any authenticated caller (when `RAG_API_KEY` / `API_KEY` is set) shares one global secret: no RBAC, no per-tenant API keys, no JWT scopes. With no API key configured, every route other than `GET /health` and `GET /ready` is open to anonymous callers with full access.
- `GET /health` and `GET /ready` are always unauthenticated (for probes); `/ready` does not grant data access but confirms DB/pgvector/table readiness.
- Ingest can overwrite chunks by `(doc_id, chunk_index)` for any declared tenant string; multi-tenancy is client-declared, not cryptographically enforced.
- `PATCH /config/runtime-search`, `POST /tuner/*`, and `POST /telemetry/ingest-backlog/clear` affect global tuning/telemetry state for the process.
- OpenAPI `/docs` and `/redoc` expose the attack surface; disable them with `DISABLE_OPENAPI_UI=true` when exposing the API beyond trusted networks.
Mitigations (production-oriented)
- Put the API behind an API gateway / reverse proxy with mTLS or OAuth2/JWT, per-route scopes, and network policies.
- Enforce tenant isolation (required tenant context + server-side checks, or PostgreSQL row-level security).
- Disable or protect `/docs` on untrusted networks.
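The "required tenant context + server-side checks" mitigation can be sketched as follows. This is a minimal illustration, not the repository's code: `resolve_tenant`, `TenantRequiredError`, and the `authenticated` argument (a tenant derived from a verified credential, for example a gateway-injected claim) are hypothetical names, since the project currently only has a client-declared `tenant_id`.

```python
from typing import Optional


class TenantRequiredError(ValueError):
    """Raised when policy requires a tenant and the request has none."""


def resolve_tenant(
    declared: Optional[str],
    authenticated: Optional[str],
    require_tenant: bool,
) -> Optional[str]:
    """Return the tenant to filter on, preferring the server-side identity.

    declared:       tenant_id sent by the client (untrusted).
    authenticated:  hypothetical tenant bound to a verified credential.
    require_tenant: mirrors the REQUIRE_TENANT_ID switch.
    """
    if authenticated is not None:
        # A client may not declare a tenant other than the one it proved.
        if declared is not None and declared != authenticated:
            raise PermissionError("declared tenant does not match credential")
        return authenticated
    if declared is None and require_tenant:
        raise TenantRequiredError("tenant_id is required")
    # No verified identity: fall back to the client-declared value (demo mode).
    return declared
```

The design point is that the declared value is only ever narrowed by the verified one, never trusted over it. PostgreSQL row-level security would enforce the same rule inside the database instead of in application code.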
Findings
- Postgres is published on host port 5433 in `docker-compose.yml`; avoid unintended exposure beyond localhost.
- Default DB credentials (`rag` / `rag`) are documented for dev, not for the internet.
- Optional CORS allowlist via `CORS_ORIGINS` (comma-separated). When unset, FastAPI defaults apply.
- Optional in-process rate limit via `RATE_LIMIT_PER_MINUTE` (per client IP, single worker only). For real deployments prefer an API gateway or reverse-proxy token bucket (see snippet below).
- Positive control: `GET /ready` checks connectivity, the `vector` extension, and `public.chunks`, which helps orchestrators avoid routing traffic before migrations (does not replace auth).
Mitigations
- Bind services to internal networks in real deployments; do not expose Postgres publicly.
- Add rate limiting (proxy or middleware), security headers, and explicit CORS when browser clients exist. Example nginx zones (adjust zones and TLS separately):

      # http context: one shared-memory zone per route family, keyed by client IP
      limit_req_zone $binary_remote_addr zone=rag_ingest:10m rate=10r/s;
      limit_req_zone $binary_remote_addr zone=rag_retrieve:10m rate=30r/s;

      # server context
      location /ingest/ {
          limit_req zone=rag_ingest burst=20 nodelay;
          proxy_pass http://127.0.0.1:8000;
      }
      location /retrieve {
          limit_req zone=rag_retrieve burst=50 nodelay;
          proxy_pass http://127.0.0.1:8000;
      }

  Caddy (rate limiting needs a plugin or separate layer; example shows TLS reverse proxy to the API):

      rag.example.com {
          encode gzip
          reverse_proxy 127.0.0.1:8000
      }

  Traefik (Docker labels or file provider; example static routes.yml fragment):

      http:
        routers:
          rag:
            rule: PathPrefix(`/`)
            service: rag-api
            entryPoints:
              - websecure
            tls: {}
        services:
          rag-api:
            loadBalancer:
              servers:
                - url: "http://host.docker.internal:8000"

  Tune `entryPoints`, `tls`, and `servers` for your network; add middleware rate limits in Traefik v3 if needed.
- Use secrets management and strong credentials outside demos.
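For comparison with the proxy-level limits, here is a minimal sketch of a per-client token bucket kept in process memory. It is not the repository's `RATE_LIMIT_PER_MINUTE` implementation. The class name, the `burst` parameter, and the injectable `clock` are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-key token bucket. State lives in this process only, so several
    workers or replicas would each enforce their own independent limit."""

    def __init__(
        self,
        per_minute: int,
        burst: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = per_minute / 60.0  # tokens refilled per second
        self.burst = float(burst)      # bucket capacity
        self.clock = clock
        # key (e.g. client IP) -> (tokens remaining, time of last update)
        self._state: Dict[str, Tuple[float, float]] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        tokens, last = self._state.get(key, (self.burst, now))
        # Refill for the elapsed time, never above capacity.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._state[key] = (tokens - 1.0, now)
            return True
        self._state[key] = (tokens, now)
        return False
```

A caller would return HTTP 429 when `allow(client_ip)` is false. The sketch never evicts idle keys, so long-running use would need an expiry policy. Because the state is per process, a gateway or reverse-proxy limit remains the better fit for multi-worker deployments.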
Findings
- Dependencies are pinned in `uv.lock`; disclosed CVEs can appear at any time.
- Docker images should use pinned bases and be scanned (Trivy, Grype, etc.) before production deploy; this is recommended but sits outside this minimal CI.
Mitigations
- CI runs `pip-audit` on exported locked runtime dependencies and Bandit on `src/rag` and `scripts` (see `.github/workflows/ci.yml`).
- Enable Dependabot or Renovate; rebuild images after upgrades.
Example (matches CI; uses a temporary file rather than process substitution, for portability):

    uv sync --group dev
    uv export --frozen --no-dev --no-emit-project --no-hashes -o /tmp/deps-audit.txt
    uv run pip-audit -r /tmp/deps-audit.txt

Findings
- `DATABASE_URL` may contain plain-text credentials (`.env`, Compose env).
- No TLS in the app; terminate HTTPS at the edge if needed.
- No encryption-at-rest configured here (use disk encryption / managed DB).
Mitigations
- Store secrets outside git; rotate passwords.
- Use `sslmode=require` / `verify-full` for Postgres clients where supported.
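Both mitigations can be checked at startup. The sketch below uses only the standard library: one helper reports whether a connection URL requests an encrypting `sslmode`, the other masks the password before the URL is logged. The function names and the example URL are hypothetical, and IPv6 host literals are not handled.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

# libpq sslmode values that refuse an unencrypted connection.
STRICT_SSLMODES = {"require", "verify-ca", "verify-full"}


def has_strict_sslmode(database_url: str) -> bool:
    """True if the URL's query string sets sslmode to an encrypting value."""
    query = parse_qs(urlsplit(database_url).query)
    modes = query.get("sslmode", [])
    return bool(modes) and modes[-1] in STRICT_SSLMODES


def redact_password(database_url: str) -> str:
    """Return the URL with the password replaced by '***' (safe to log)."""
    parts = urlsplit(database_url)
    if parts.password is None:
        return database_url
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username or ''}:***@{host}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    url = "postgresql://rag:rag@localhost:5433/rag?sslmode=require"
    print(has_strict_sslmode(url), redact_password(url))
```

Note that `require` encrypts the connection but does not verify the server certificate; `verify-full` also checks the certificate chain and the host name.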
Findings
- SQL: Ingest uses parameterized `executemany`; retrieve uses a fixed SQL template with bound parameters for filters. Embedding literals come from server-side embedding of user text, not concatenated SQL from raw input.
- YAML: `yaml.safe_load` (`src/rag/config_loader.py`).
- `scripts/migrate.py` does not invoke shells with user-controlled strings.
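The SQL pattern described above (parameterized `executemany` for ingest, fixed statement text with bound filter values for retrieve) can be demonstrated in a self-contained way. This sketch swaps in the standard library's `sqlite3` for PostgreSQL so that it runs anywhere. The table layout is a simplified assumption; only the `(doc_id, chunk_index)` key comes from this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks ("
    " doc_id TEXT, chunk_index INTEGER, tenant_id TEXT, content TEXT,"
    " PRIMARY KEY (doc_id, chunk_index))"
)

rows = [
    ("doc-1", 0, "tenant-a", "hello"),
    # Hostile-looking text is stored as data, never parsed as SQL.
    ("doc-1", 1, "tenant-a", "x'); DROP TABLE chunks; --"),
    ("doc-2", 0, "tenant-b", "world"),
]
# Ingest: one statement text, values supplied separately per row.
conn.executemany(
    "INSERT INTO chunks (doc_id, chunk_index, tenant_id, content)"
    " VALUES (?, ?, ?, ?)",
    rows,
)


def retrieve(connection, tenant_id=None):
    """Fixed SQL text; a NULL parameter disables the tenant filter."""
    sql = (
        "SELECT doc_id, chunk_index, content FROM chunks"
        " WHERE (? IS NULL OR tenant_id = ?)"
        " ORDER BY doc_id, chunk_index"
    )
    return connection.execute(sql, (tenant_id, tenant_id)).fetchall()
```

Calling `retrieve(conn, "tenant-a' OR '1'='1")` returns no rows, because the whole string is compared as a single tenant value. Calling `retrieve(conn)` with no filter returns every row, which is the cross-tenant behaviour the access-control findings describe.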
Residual risks
- `POST /ingest/chunks` batch size is capped by `MAX_INGEST_CHUNKS_PER_REQUEST` (default 500, HTTP 413), which reduces memory/DB abuse compared with unbounded batches; large `content` strings per row can still stress resources unless further bounded.
Mitigations
- Add Pydantic `max_length` on text fields if you need tighter bounds.
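To show the effect of such a bound without depending on Pydantic, here is a plain-Python sketch of the two checks: the batch cap from this document, plus a per-field length limit. `MAX_CONTENT_CHARS`, `PayloadTooLarge`, and `validate_batch` are hypothetical, and the 8,000-character value is an arbitrary placeholder, not a project setting.

```python
from typing import Dict, List

MAX_INGEST_CHUNKS_PER_REQUEST = 500  # default stated in this document
MAX_CONTENT_CHARS = 8_000            # hypothetical per-chunk bound


class PayloadTooLarge(ValueError):
    """Would map to HTTP 413 in an API handler."""


def validate_batch(chunks: List[Dict[str, str]]) -> None:
    """Reject oversized batches and oversized content before any DB work."""
    if len(chunks) > MAX_INGEST_CHUNKS_PER_REQUEST:
        raise PayloadTooLarge(
            f"batch has {len(chunks)} chunks;"
            f" limit is {MAX_INGEST_CHUNKS_PER_REQUEST}"
        )
    for index, chunk in enumerate(chunks):
        size = len(chunk.get("content", ""))
        if size > MAX_CONTENT_CHARS:
            raise PayloadTooLarge(
                f"chunk {index} content has {size} characters;"
                f" limit is {MAX_CONTENT_CHARS}"
            )
```

In a Pydantic model the per-field check would normally be declared on the field itself, so the framework rejects the request during parsing. A body-size limit at the proxy is still needed to stop very large payloads before they are parsed at all.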
Findings
- Global in-memory tuner / telemetry — callers affect shared process state.
- Demo embeddings are deterministic hashes — not a security boundary for semantic secrecy.
- SSRF: The app does not fetch user-supplied URLs today.
Watchouts
- Future features such as “fetch URL and embed” need strict URL validation (scheme/host allowlists, block RFC1918/metadata URLs).
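A minimal sketch of the kind of check such a feature would need, using only the standard library. The allowlist contents and the function name are hypothetical, and the app has no URL-fetching code today. The check covers the URL string only, so a real fetcher would also have to validate the resolved IP address at connect time to resist DNS rebinding, and re-validate every redirect target.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"https"}
ALLOWED_HOSTS = {"docs.example.com"}  # hypothetical allowlist


def is_url_allowed(url: str) -> bool:
    """Scheme and host allowlist; literal non-public IPs are always refused."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL, e.g. an unterminated IPv6 literal
    if parts.scheme not in ALLOWED_SCHEMES or host is None:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # not an IP literal; treat as a host name
    if ip is not None and not ip.is_global:
        # Covers RFC 1918 ranges, loopback, and link-local addresses such as
        # the 169.254.169.254 cloud metadata endpoint.
        return False
    return host in ALLOWED_HOSTS
```

With a strict host allowlist the IP test is redundant, but it keeps the private-range block in place if the allowlist is later relaxed.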
Mitigations
- Separate admin/tuning plane from data plane in production; quotas, auditing, real identity model.
Findings
- With `RAG_API_KEY` / `API_KEY` unset (demo default), there are no sessions or per-user credentials: full anonymous access except probe routes.
- With an API key set, clients may send `Authorization: Bearer <token>` or `X-API-Key` alone. If the header uses the Bearer prefix, only that token is validated (including rejecting an empty token); `X-API-Key` is ignored in that case. There is no OAuth2, no MFA, and no rotation story in-app.
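The header precedence described above can be sketched as follows. This illustrates the stated behaviour and is not the repository's middleware. The function names are hypothetical, header keys are assumed to be lower-cased already, and the constant-time comparison is an assumption about good practice rather than a claim about the existing code.

```python
import hmac
from typing import Mapping, Optional


def presented_token(headers: Mapping[str, str]) -> Optional[str]:
    """Pick the credential to validate from the request headers."""
    auth = headers.get("authorization", "")
    scheme, _, value = auth.partition(" ")
    if scheme.lower() == "bearer":
        # Bearer takes precedence: X-API-Key is not consulted at all,
        # even when the Bearer token turns out to be empty.
        return value.strip()
    return headers.get("x-api-key")


def is_authorized(headers: Mapping[str, str], expected_key: Optional[str]) -> bool:
    if not expected_key:
        return True  # no key configured: demo default, anonymous access
    token = presented_token(headers)
    if not token:
        return False  # missing or empty credential
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(token.encode(), expected_key.encode())
```

For example, a request carrying both `Authorization: Bearer wrong` and a correct `X-API-Key` is rejected, because only the Bearer token is checked.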
Mitigations
- Prefer gateway-managed OAuth2/JWT, mTLS, or short-lived tokens for production; treat the env-based shared secret as a minimal stub only.
Findings
- No unsigned auto-update or arbitrary deserialization paths in core handlers.
Mitigations
- Sign releases / pin image digests; extend CI with container scanners when you publish images.
Findings
- Operational telemetry exists for ingest/retrieve (`src/rag/telemetry.py`).
- 401 responses from the optional API-key middleware give a minimal auth signal, but there is no structured security audit stream or SIEM integration.
Mitigations
- Log identity, policy decisions, and anomalies to your logging stack.
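One way to make such events machine-readable is to emit one JSON object per decision. The sketch below uses only the standard library. The logger name, the field names, and the `audit_event` helper are hypothetical and not part of `src/rag/telemetry.py`.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Optional

audit_logger = logging.getLogger("rag.audit")  # hypothetical logger name


def audit_event(
    action: str,
    outcome: str,
    client_ip: str,
    tenant_id: Optional[str] = None,
) -> str:
    """Log one security-relevant decision as a single JSON line."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "action": action,        # e.g. "retrieve", "ingest", "tuner.update"
        "outcome": outcome,      # e.g. "allowed", "denied"
        "client_ip": client_ip,
        "tenant_id": tenant_id,
    }
    line = json.dumps(record, sort_keys=True)
    audit_logger.info(line)
    return line
```

A log shipper can then forward the `rag.audit` stream to whatever aggregation or SIEM system is in use, separate from the operational telemetry.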
Findings
- `POST /ingest/chunks` maps database failures to `HTTPException(detail=str(exc))`, which may leak internal errors (driver messages, constraint names) to clients (`src/rag/main.py`).
- `GET /ready` returns `detail` strings on failure paths for operators; this is typically lower risk than authenticated multi-tenant APIs, but generic messages are still worthwhile if the endpoint is exposed broadly.
Mitigations
- Return generic messages to clients; log `exc` server-side only.
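A minimal sketch of that split, framework-free so it runs with the standard library alone. The logger name, the response wording, and the `error_id` correlation field are assumptions; an actual handler would pass the returned status and body to its framework's error response.

```python
import logging
import uuid
from typing import Dict, Tuple

logger = logging.getLogger("rag.errors")  # hypothetical logger name


def to_client_error(exc: Exception) -> Tuple[int, Dict[str, str]]:
    """Log full detail server-side; return only a generic body to the client."""
    error_id = uuid.uuid4().hex
    # The exception text and traceback stay in server logs, tagged with an ID
    # that an operator can search for.
    logger.error("ingest failed error_id=%s", error_id, exc_info=exc)
    body = {
        "detail": "Internal error while storing chunks.",
        "error_id": error_id,
    }
    return 500, body
```

The client sees the same message for every database failure, while the `error_id` lets support staff find the matching log entry without exposing driver messages or constraint names.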
- `POST /ingest/chunks` batch length is capped (see `MAX_INGEST_CHUNKS_PER_REQUEST`); `POST /retrieve` caps `k` ≤ 200.
- No global rate limit in-app; add one at the proxy or middleware if the API is reachable by untrusted clients.
- Authentication + authorization stronger than a single optional shared secret; require tenant context where multi-tenant.
- TLS and private networking for API and database.
- Remove or restrict `/docs`; generic error messages to clients (A10).
- Secrets out of git; strong DB credentials; Postgres not on the public internet.
- Rate limits and request/body size limits (proxy + app).
- Dependency scanning (CI `pip-audit`) + static analysis (Bandit) + image scanning on release (Trivy/similar).
- Protect tuning/admin endpoints from untrusted callers.
- `GET /ready` for readiness only; combine with auth and network controls, not instead of them.