NexusRAG is a production-grade, multi-tenant, multi-cloud RAG agent platform. It exposes a single streaming /run endpoint backed by a stateful LangGraph agent, with enterprise-ready primitives layered on top: RBAC + ABAC authorization, SSO/SCIM provisioning, envelope encryption, SLA enforcement, cost governance, multi-region failover, and SOC 2 compliance automation.
| Feature | Status | Key Config Flag |
|---|---|---|
| Streaming RAG (/v1/run, SSE) | ✅ | LLM_PROVIDER |
| Multi-cloud retrieval routing | ✅ | per-corpus provider_config_json |
| Postgres + pgvector (local) | ✅ | DATABASE_URL |
| AWS Bedrock Knowledge Bases | ✅ | corpus provider: aws_bedrock_kb |
| GCP Vertex AI Search | ✅ | corpus provider: gcp_vertex |
| API key auth + RBAC | ✅ | AUTH_ENABLED |
| ABAC policy engine | ✅ | AUTHZ_ABAC_ENABLED |
| Document ACLs | ✅ | AUTHZ_DEFAULT_DENY |
| Enterprise SSO (OIDC) | ✅ | SSO_ENABLED |
| SCIM 2.0 provisioning | ✅ | SCIM_ENABLED |
| Redis token-bucket rate limiting | ✅ | RATE_LIMIT_ENABLED |
| Idempotency keys (write endpoints) | ✅ | IDEMPOTENCY_ENABLED |
| Async document ingestion (ARQ) | ✅ | INGEST_EXECUTION_MODE |
| Cost governance + chargeback | ✅ | COST_GOVERNANCE_ENABLED |
| SLA engine + load shedding | ✅ | SLA_ENGINE_ENABLED |
| Circuit breakers (external calls) | ✅ | CB_FAILURE_THRESHOLD |
| Kill switches (per feature) | ✅ | KILL_RUN, KILL_INGEST, … |
| Envelope encryption (AES-256-GCM) | ✅ | CRYPTO_ENABLED |
| Key rotation + KMS | ✅ | CRYPTO_PROVIDER |
| Multi-region failover | ✅ | FAILOVER_ENABLED |
| Encrypted + signed backups | ✅ | BACKUP_ENABLED |
| SOC 2 compliance automation | ✅ | COMPLIANCE_ENABLED |
| DSAR / data governance | ✅ | GOVERNANCE_POLICY_ENGINE_ENABLED |
| Audit log (tamper-evident) | ✅ | always on |
| Operability alerts + incidents | ✅ | ALERTING_ENABLED |
| Autoscaling recommendations | ✅ | AUTOSCALING_ENABLED |
| Prometheus metrics (/v1/metrics) | ✅ | always on |
| TTS audio output (OpenAI) | ✅ | TTS_PROVIDER=openai |
| Python + TypeScript SDKs | ✅ | make sdk-generate |
| BFF endpoints (/v1/ui/*) | ✅ | always on |
┌──────────────────────────────────────────────┐
│ NexusRAG API (FastAPI) │
│ │
Client ──Bearer──▶ Auth │ Rate Limit ─▶ RBAC ─▶ ABAC ─▶ Doc ACL │
│ │ │
│ LangGraph Agent (/v1/run, SSE) │
│ / │ \ │
│ pgvector Bedrock KB Vertex AI Search │
│ \ │ / │
│ Gemini / Vertex AI │
└──────────────────────────────────────────────┘
│
┌───────────────┴───────────────┐
│ │
Postgres (pgvector) Redis
Alembic migrations Rate limits
Audit log Idempotency
Compliance evidence Circuit breakers
Encrypted blobs ARQ job queues
- Docker + Docker Compose
- Python 3.11+ (optional for local dev)
- Copy env file:
cp .env.example .env
- Start services:
docker compose up --build
- Run migrations:
docker compose exec api alembic upgrade head
Run the demo seed script inside the container:
docker compose exec api python scripts/seed_demo.py
- corpus_id: `c1`
- tenant_id: `t1`
- chunks: 10 demo chunks across 3 documents
- idempotency: if chunks for `c1` already exist, the script no-ops and prints "already seeded"
Each corpus specifies its retrieval provider in corpora.provider_config_json:
Local pgvector:
{
"retrieval": {
"provider": "local_pgvector",
"top_k_default": 5
}
}
AWS Bedrock Knowledge Bases:
{
"retrieval": {
"provider": "aws_bedrock_kb",
"knowledge_base_id": "KB123",
"region": "us-east-1",
"top_k_default": 5
}
}
Vertex AI Search (Discovery Engine):
{
"retrieval": {
"provider": "gcp_vertex",
"project": "my-gcp-project",
"location": "us-central1",
"resource_id": "your-datastore-id",
"top_k_default": 5
}
}
Switching a corpus:
- Update `corpora.provider_config_json` for the target corpus. `{}` is accepted and normalized to `local_pgvector` with a default `top_k_default` of 5.
- Ensure cloud credentials exist at runtime (AWS/Vertex); tests do not require live creds.
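The normalization rule above (empty config falls back to local pgvector with a default `top_k_default` of 5) can be sketched as follows. The function name is ours, not the platform's:

```python
def normalize_retrieval_config(provider_config_json: dict) -> dict:
    """Illustrative sketch of the documented normalization:
    {} -> local_pgvector with top_k_default=5; explicit settings win."""
    retrieval = dict(provider_config_json.get("retrieval") or {})
    retrieval.setdefault("provider", "local_pgvector")
    retrieval.setdefault("top_k_default", 5)
    return {"retrieval": retrieval}
```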
All protected endpoints require API keys via:
Authorization: Bearer <api_key>
Create a key (example):
docker compose exec api python scripts/create_api_key.py --tenant t1 --role admin --name local-admin
Export the key for curl examples:
export API_KEY=<api_key_from_script>
export ADMIN_API_KEY=$API_KEY
Revoke a key:
docker compose exec api python scripts/revoke_api_key.py <key_id>
Role matrix:
| Endpoint | reader | editor | admin |
|---|---|---|---|
| /v1/run | ✅ | ✅ | ✅ |
| GET /v1/documents | ✅ | ✅ | ✅ |
| POST/DELETE /v1/documents, /v1/documents/*/reindex | ❌ | ✅ | ✅ |
| GET /v1/corpora | ✅ | ✅ | ✅ |
| PATCH /v1/corpora | ❌ | ✅ | ✅ |
| /v1/ops/* | ❌ | ❌ | ✅ |
Dev-only bypass:
- Set `AUTH_DEV_BYPASS=true` to allow `X-Tenant-Id` + optional `X-Role` (defaults to `admin`).
- This is intended for local development only; keep it disabled in production.
The platform layers ABAC on top of existing RBAC and document ACLs. Decision order for document actions:
1. Tenant boundary (non-bypassable).
2. Kill switches / maintenance gates.
3. RBAC role gate.
4. Document ACL evaluation (explicit grants; expired grants ignored).
5. ABAC policy evaluation (deny first, then allow; priority-aware).
6. Default deny (configurable via `AUTHZ_DEFAULT_DENY`).
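The decision order can be sketched as a short-circuiting chain. This is an illustration of the documented ordering, not the platform's code; the context field names (`principal_tenant`, `acl_grant`, `abac_effect`, …) are our own:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    allowed: bool
    reason: str

def authorize_document_action(ctx: dict, default_deny: bool = True) -> Decision:
    """Hypothetical sketch of the documented decision order for document actions."""
    if ctx["principal_tenant"] != ctx["resource_tenant"]:
        return Decision(False, "tenant_boundary")        # 1. non-bypassable
    if ctx.get("kill_switch_active"):
        return Decision(False, "kill_switch")            # 2. maintenance gates
    if not ctx.get("rbac_role_allows"):
        return Decision(False, "rbac")                   # 3. role gate
    if ctx.get("acl_grant") is not None:                 # 4. explicit, unexpired ACL grant
        return Decision(bool(ctx["acl_grant"]), "document_acl")
    abac = ctx.get("abac_effect")                        # 5. deny first, then allow
    if abac == "deny":
        return Decision(False, "abac_deny")
    if abac == "allow":
        return Decision(True, "abac_allow")
    return Decision(not default_deny, "default")         # 6. AUTHZ_DEFAULT_DENY
```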
Notes:
- Document creators receive an `owner` ACL entry on create.
- Admin role does not bypass document ACLs unless `AUTHZ_ADMIN_BYPASS_DOCUMENT_ACL=true`.
- Wildcard policies on both `resource_type` and `action` require `AUTHZ_ALLOW_WILDCARDS=true`.
Policy DSL example (deny high sensitivity docs):
{
"name": "deny-high-sensitivity",
"effect": "deny",
"resource_type": "document",
"action": "read",
"priority": 200,
"condition_json": {
"eq": [{ "var": "resource.labels.sensitivity" }, "high"]
}
}

Policy DSL example (allow editors during business hours):
{
"name": "allow-editors-hours",
"effect": "allow",
"resource_type": "document",
"action": "write",
"priority": 100,
"condition_json": {
"all": [
{ "eq": [{ "var": "principal.role" }, "editor"] },
{ "time_between": [{ "var": "request.time" }, { "start": "09:00", "end": "18:00" }] }
]
}
}

Simulate a policy before enabling it:
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
http://localhost:8000/v1/admin/authz/policies/<policy_id>/simulate \
-d '{
"resource_type": "document",
"action": "read",
"principal": { "role": "reader" },
"resource": { "labels": { "sensitivity": "high" } }
}'

Grant a document permission:

curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
http://localhost:8000/v1/admin/authz/documents/<document_id>/permissions \
-d '{"principal_type": "user", "principal_id": "<user_id>", "permission": "read"}'

Stable API routes are versioned under /v1 (recommended for all new clients).
Legacy unversioned routes remain as deprecated aliases and will sunset on:
- Sun, 10 May 2026 00:00:00 +0000
Legacy responses include:
- `Deprecation: true`
- `Sunset: <RFC 1123 date>`
- `Link: </v1/docs>; rel="successor-version"`
Migration guidance:
- Prefix all routes with `/v1`.
- Update clients to parse `data`/`meta` and `error`/`meta` envelopes.
- Use `Idempotency-Key` for write endpoints.
{
"data": ...,
"meta": {
"request_id": "...",
"api_version": "v1"
}
}
{
"error": {
"code": "STRING_CODE",
"message": "Human readable",
"details": { ... }
},
"meta": {
"request_id": "...",
"api_version": "v1"
}
}
Notes:
- SSE streams (`/v1/run` streaming) keep the existing event payloads and are not wrapped.
- Legacy routes keep pre-v1 response shapes (no envelope).
Write endpoints accept Idempotency-Key (max 128 chars). Behavior:
- First request stores the response for 24h.
- Same key + same payload returns the stored response.
- Same key + different payload returns `409 IDEMPOTENCY_KEY_CONFLICT`.
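The server-side behavior can be sketched with an in-memory store (the real implementation persists responses with a 24h TTL; the class, status codes on first write, and hashing scheme here are illustrative assumptions):

```python
import hashlib
import json

class IdempotencyStore:
    """Sketch of the documented Idempotency-Key semantics (TTL handling elided)."""
    def __init__(self):
        self._store = {}  # key -> (payload_hash, stored_response)

    @staticmethod
    def _digest(payload: dict) -> str:
        # Canonical JSON so semantically equal payloads hash identically.
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

    def handle(self, key: str, payload: dict, compute_response):
        if len(key) > 128:
            raise ValueError("Idempotency-Key too long (max 128 chars)")
        digest = self._digest(payload)
        if key in self._store:
            stored_digest, response = self._store[key]
            if stored_digest != digest:
                return 409, {"code": "IDEMPOTENCY_KEY_CONFLICT"}
            return 200, response                 # replay the stored response
        response = compute_response(payload)     # first request: do the work
        self._store[key] = (digest, response)
        return 201, response
```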
Audit events are stored in the audit_events table and exposed via admin-only endpoints for investigations.
Event taxonomy:
| Category | event_type | Description |
|---|---|---|
| Auth/security | auth.api_key.created | API key created via script |
| Auth/security | auth.api_key.revoked | API key revoked via script |
| Auth/security | auth.access.success | API key authenticated successfully |
| Auth/security | auth.access.failure | API key authentication failed |
| Auth/security | rbac.forbidden | Authenticated principal lacked required role |
| Data operations | documents.ingest.enqueued | Document ingestion enqueued |
| Data operations | documents.reindex.enqueued | Document reindex enqueued |
| Data operations | documents.deleted | Document deleted |
| Data operations | corpora.updated | Corpus fields updated |
| Data operations | run.invoked | /run invocation accepted |
| Data operations | ops.viewed | Ops endpoints accessed |
| Security | security.rate_limited | Request throttled by rate limiting |
| System | system.rate_limit.degraded | Rate limiting degraded due to Redis error |
| Quota | quota.soft_cap_reached | Soft cap threshold reached for a tenant period |
| Quota | quota.hard_cap_blocked | Request blocked due to hard cap enforcement |
| Quota | quota.overage_observed | Overage observed when hard cap is disabled |
| Billing | billing.usage_recorded | Sampled usage snapshot for metering |
| Billing | billing.webhook.failure | Billing webhook delivery failed |
| Self-serve | selfserve.api_key.created | Tenant admin created an API key |
| Self-serve | selfserve.api_key.revoked | Tenant admin revoked an API key |
| Self-serve | selfserve.api_key.listed | Tenant admin listed API keys |
| Self-serve | selfserve.usage.viewed | Tenant admin viewed usage dashboards |
| Self-serve | selfserve.plan.viewed | Tenant admin viewed plan details |
| Self-serve | selfserve.billing_webhook_tested | Tenant admin tested billing webhook delivery |
| Plans | plan.upgrade_requested | Tenant requested a plan upgrade |
| System | system.worker.heartbeat.missed | Optional: worker heartbeat missing |
| System | system.error | Optional: handled internal error |
Redaction policy:
- Never store plaintext API keys, Authorization headers, full user message content, or raw document text.
- Keys matching `api_key`, `authorization`, `token`, `secret`, `password`, `text`, or `content` are stored as `[REDACTED]`.
- Store only identifiers, counts, and high-level metadata.
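A recursive redaction helper matching the policy above might look like this (a sketch of the stated rules, not the platform's code):

```python
SENSITIVE_KEYS = {"api_key", "authorization", "token", "secret", "password", "text", "content"}

def redact(obj):
    """Recursively replace values under sensitive keys with "[REDACTED]",
    preserving identifiers, counts, and other metadata."""
    if isinstance(obj, dict):
        return {k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else redact(v)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [redact(v) for v in obj]
    return obj
```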
List events:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/audit/events?limit=50"
Filter events:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/audit/events?event_type=rbac.forbidden&outcome=failure&limit=20"
UI-focused endpoints live under /v1/ui/* and return normalized shapes for web apps:
- GET /v1/ui/bootstrap
- GET /v1/ui/dashboard/summary
- GET /v1/ui/documents
- GET /v1/ui/activity
- POST /v1/ui/actions/reindex-document
Query conventions:
- `q`: full-text filter
- `sort`: comma list (e.g., `-created_at,name`)
- `limit`: 1..100 (default 25)
- `cursor`: opaque token (preferred)
- filters: `status`, `corpus_id`, `created_from`, `created_to`, `actor_type`, `event_type`
Pagination response shape:
{
"data": {
"items": [...],
"page": { "next_cursor": "...", "has_more": true },
"facets": { "status": [{"value":"succeeded","count":12}] }
},
"meta": { "request_id": "...", "api_version": "v1" }
}
Invalid cursors return 400 with INVALID_CURSOR.
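A client can drain a cursor-paginated listing by following `next_cursor` until `has_more` is false. Here `fetch_page` is any callable returning the envelope above (an assumption for illustration, not an SDK API):

```python
def iter_all_items(fetch_page, limit=100):
    """Yield every item from a cursor-paginated /v1/ui listing.

    fetch_page(cursor, limit) must return the documented envelope:
    {"data": {"items": [...], "page": {"next_cursor": ..., "has_more": ...}}, ...}
    """
    cursor = None
    while True:
        envelope = fetch_page(cursor=cursor, limit=limit)
        page = envelope["data"]["page"]
        yield from envelope["data"]["items"]
        if not page.get("has_more"):
            return
        cursor = page["next_cursor"]   # opaque token; never construct manually
```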
Optimistic action response shape:
{
"data": {
"action_id": "...",
"status": "accepted",
"accepted_at": "...",
"optimistic": { "entity": "document", "id": "...", "patch": { "status": "queued" } },
"poll_url": "/v1/documents/<id>"
},
"meta": { "request_id": "...", "api_version": "v1" }
}
SSE protocol for /v1/run:
- Order: `request.accepted` → `token.delta`* → `message.final` → `audio.ready`|`audio.error` → `done`
- Every event payload includes `seq`.
- Heartbeat:

event: heartbeat
data: {"ts":"...","request_id":"...","seq":7}

- Reconnects send `Last-Event-ID`; the server responds with `resume.unsupported` (restart required).
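A minimal parser for this stream can be sketched as follows; real clients should prefer an SSE library, and this simplified version handles only the `event:`/`data:` fields shown above:

```python
def parse_sse(stream_lines):
    """Yield (event, data) tuples from an iterable of SSE lines.
    A blank line terminates one event, per the SSE wire format."""
    event, data_lines = None, []
    for line in stream_lines:
        if line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            data_lines.append(line.split(":", 1)[1].strip())
        elif line == "":
            if event or data_lines:
                yield event, "\n".join(data_lines)
            event, data_lines = None, []
```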
Rate limits use a Redis-backed token bucket with sustained rate + burst capacity. Limits are enforced per API key and per tenant; requests are allowed only when both buckets have capacity.
Clients should respect Retry-After and X-RateLimit-Retry-After-Ms headers and apply exponential backoff on 429/503 responses (SDKs include retry helpers).
Route classes:
| Class | Scope | Paths |
|---|---|---|
| run | strict | POST /run |
| mutation | write | POST/PATCH/DELETE /documents, PATCH /corpora, /admin/quotas/*, audit write endpoints (if added) |
| read | read | GET endpoints (except ops/audit events) |
| ops | ops | /ops/* and /audit/events* |
Default thresholds:
| Route class | Key RPS | Key burst | Tenant RPS | Tenant burst |
|---|---|---|---|---|
| run | 1 | 5 | 3 | 15 |
| mutation | 2 | 10 | 5 | 25 |
| read | 5 | 20 | 15 | 60 |
| ops | 2 | 10 | 4 | 20 |
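The sustained-rate-plus-burst semantics behind these thresholds can be sketched with a single-process token bucket (the production version runs atomically in Redis; this class is an illustration only):

```python
class TokenBucket:
    """Token bucket with a sustained refill rate and a burst capacity."""
    def __init__(self, rate: float, burst: int):
        self.rate, self.burst = rate, burst
        self.tokens = float(burst)   # start full: burst is immediately available
        self.updated = 0.0

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at burst capacity.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In the platform a request must pass both the per-key and the per-tenant bucket; this sketch models just one.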
Fail behavior:
- `RL_FAIL_MODE=open` (default): allow traffic if Redis is unavailable and set `X-RateLimit-Status: degraded`.
- `RL_FAIL_MODE=closed`: return `503 RATE_LIMIT_UNAVAILABLE`.
Example 429 response:
HTTP/1.1 429 Too Many Requests
Retry-After: 2
X-RateLimit-Scope: api_key
X-RateLimit-Route-Class: run
X-RateLimit-Retry-After-Ms: 1200
{
"detail": {
"code": "RATE_LIMITED",
"message": "Rate limit exceeded",
"scope": "api_key",
"route_class": "run",
"retry_after_ms": 1200
}
}
List corpora:
curl -s -H "Authorization: Bearer $API_KEY" http://localhost:8000/v1/corpora
Get a corpus:
curl -s -H "Authorization: Bearer $API_KEY" http://localhost:8000/v1/corpora/c1
Patch provider_config_json:
curl -s -X PATCH -H "Content-Type: application/json" -H "Authorization: Bearer $API_KEY" \
http://localhost:8000/v1/corpora/c1 \
-d '{
"provider_config_json": {
"retrieval": {
"provider": "aws_bedrock_kb",
"knowledge_base_id": "KB123",
"region": "us-east-1",
"top_k_default": 5
}
}
}'
Enable TTS with environment variables:
- `TTS_PROVIDER=openai|fake|none` (default `none`)
- `OPENAI_API_KEY` (required for OpenAI)
- `OPENAI_TTS_MODEL` (default `gpt-4o-mini-tts`)
- `OPENAI_TTS_VOICE` (default `alloy`)
- `AUDIO_BASE_URL` (default `http://localhost:8000`); set to `http://localhost:8000/v1` to emit versioned audio URLs.
Audio files are stored locally under var/audio/ (dev-only).
Example /run with audio enabled:
curl -N -H "Content-Type: application/json" -H "Authorization: Bearer $API_KEY" \
-X POST http://localhost:8000/v1/run \
-d '{
"session_id":"s-audio-1",
"corpus_id":"c1",
"message":"Summarize the demo corpus.",
"top_k":5,
"audio":true
}'
SSE events:
- `audio.ready` → {"type":"audio.ready","request_id":"...","data":{"audio_url":"http://localhost:8000/v1/audio/<id>.mp3","audio_id":"<id>","mime":"audio/mpeg"}}
- `audio.error` → {"type":"audio.error","request_id":"...","data":{"code":"TTS_ERROR","message":"..."}}
Supported types: text/plain, text/markdown (JSON with {"text": "..."} is accepted as a file upload).
Ingestion is async: the API enqueues a Redis-backed job and returns 202 Accepted.
Lifecycle: queued → processing → succeeded|failed (see failure_reason on failures).
Raw text ingestion is deterministic and idempotent when a document_id is supplied.
Upload a document (returns 202 with job_id + status_url):
curl -s -X POST -H "Authorization: Bearer $API_KEY" \
-F "corpus_id=c1" \
-F "file=@./example.txt;type=text/plain" \
http://localhost:8000/v1/documents
Ingest raw text (returns 202 with job_id + status_url):
Set `overwrite: true` to requeue a failed or succeeded document with the same `document_id`.
curl -s -X POST -H "Content-Type: application/json" -H "Authorization: Bearer $API_KEY" \
http://localhost:8000/v1/documents/text \
-d '{
"corpus_id": "c1",
"text": "Some raw text to ingest.",
"document_id": "doc-123",
"filename": "notes.txt",
"overwrite": false
}'
Check status / poll:
curl -s -H "Authorization: Bearer $API_KEY" http://localhost:8000/v1/documents/<document_id>
List documents:
curl -s -H "Authorization: Bearer $API_KEY" http://localhost:8000/v1/documents
Reindex a document (returns 202 with job_id + status_url):
curl -s -X POST -H "Content-Type: application/json" -H "Authorization: Bearer $API_KEY" \
http://localhost:8000/v1/documents/<document_id>/reindex \
-d '{
"chunk_size_chars": 1200,
"chunk_overlap_chars": 150
}'
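The `chunk_size_chars`/`chunk_overlap_chars` parameters imply a sliding-window splitter. A fixed-size sketch of that idea (the platform's actual splitter may be smarter about sentence boundaries):

```python
def chunk_text(text: str, chunk_size_chars: int = 1200, chunk_overlap_chars: int = 150):
    """Split text into windows of chunk_size_chars, each starting
    chunk_overlap_chars before the previous window ends."""
    if chunk_overlap_chars >= chunk_size_chars:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size_chars - chunk_overlap_chars
    return [text[i:i + chunk_size_chars] for i in range(0, len(text), step)]
```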
Delete a document:
curl -s -X DELETE -H "Authorization: Bearer $API_KEY" http://localhost:8000/v1/documents/<document_id>
Troubleshooting:
- If status stays `queued`, ensure the ingestion worker is running and Redis is reachable.
- If status stays `processing`, check for queue backlog or long-running ingestions.
- If status is `failed`, inspect `failure_reason` and worker logs, then reindex or re-upload.
- Delete returns `409` for in-flight documents (queued/processing).
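Since ingestion is async, clients typically poll the status URL until a terminal state. A sketch of that loop, where `get_status` is any callable returning the current document status string (an assumption, not an SDK API):

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed"}

def wait_for_document(get_status, timeout_s=300, poll_interval_s=2.0, sleep=time.sleep):
    """Poll until the document reaches succeeded/failed, or raise on timeout.
    queued/processing are transient states per the documented lifecycle."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_interval_s)
    raise TimeoutError("document did not reach a terminal status")
```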
Ops endpoints return 200 even when dependencies are degraded, surfacing the degraded field instead of failing.
Ops endpoints require an admin API key.
Health summary:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" http://localhost:8000/v1/ops/health
Ingestion stats (last 24 hours by default):
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" "http://localhost:8000/v1/ops/ingestion?hours=24"
Metrics snapshot (JSON):
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" http://localhost:8000/v1/ops/metrics
Heartbeat interpretation:
- `worker_heartbeat_age_s` shows seconds since the last worker heartbeat.
- If the heartbeat is missing or stale, `/ops/health` reports `status: degraded`.
Queue depth:
- `queue_depth` reports pending jobs in the Redis ingestion queue.
- If Redis is unavailable, `queue_depth` is `null` and `/ops/health` reports `redis: degraded`.
Reliability controls are centralized and configurable:
- `EXT_CALL_TIMEOUT_MS` / `EXT_RETRY_*` for external integrations (retrieval, TTS, billing webhooks)
- Circuit breakers per integration (shared via Redis)
- Bulkheads: `RUN_MAX_CONCURRENCY`, `INGEST_MAX_CONCURRENCY`
On saturation, /v1/run and ingestion endpoints return 503 with SERVICE_BUSY.
Fetch current SLO status:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/ops/slo"
Example response:
{
"data":{
"window":"1h",
"availability":99.95,
"p95":{"run":2500,"api":620,"documents_text":480},
"error_budget":{"remaining_pct":87.2,"burn_rate_5m":1.4},
"status":"healthy"
},
"meta":{...}
}
Kill switches and canary percentages are managed under /v1/admin/rollouts/*.
Kill switch patch:
curl -s -X PATCH -H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{"kill_switches":{"kill.run":true}}' \
"http://localhost:8000/v1/admin/rollouts/killswitches"
Canary patch:
curl -s -X PATCH -H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{"canary_percentages":{"rollout.tts":5}}' \
"http://localhost:8000/v1/admin/rollouts/canary"
Admin maintenance endpoint:
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/maintenance/run?task=prune_idempotency"
Available maintenance tasks:
- prune_idempotency
- prune_audit
- cleanup_actions
- prune_usage
- backup_create_scheduled
- backup_prune_retention
- restore_drill_scheduled
- compliance_evaluate_scheduled
- compliance_bundle_periodic
- compliance_prune_old_evidence
Retention knobs:
- `AUDIT_RETENTION_DAYS`
- `UI_ACTION_RETENTION_DAYS`
- `USAGE_COUNTER_RETENTION_DAYS`
- `BACKUP_RETENTION_DAYS`
Runbooks live under docs/runbooks/:
- incident-response.md
- breaker-playbook.md
- rollout-playbook.md
- dr-backup-restore.md
- restore-drill-checklist.md
- key-rotation-for-backups.md
- failover-execution.md
- failover-rollback.md
- split-brain-mitigation.md
- dsar-handling.md
- legal-hold-procedure.md
- retention-and-anonymization.md
- audit-evidence-export.md
- key-rotation-execution.md
- key-compromise-response.md
- kms-outage-procedure.md
- encrypted-artifact-access.md
- soc2-audit-prep.md
- compliance-control-failure-response.md
- evidence-bundle-verification.md
- compliance-scheduling-and-retention.md
Backups include a database logical dump, schema-only dump, and a metadata snapshot.
Backup configuration:
- `BACKUP_ENABLED`, `BACKUP_LOCAL_DIR`
- `BACKUP_ENCRYPTION_ENABLED`, `BACKUP_ENCRYPTION_KEY`
- `BACKUP_SIGNING_ENABLED`, `BACKUP_SIGNING_KEY`
- `BACKUP_RETENTION_DAYS`, `BACKUP_SCHEDULE_CRON`
- `RESTORE_REQUIRE_SIGNATURE`
Create a backup:
docker compose exec api python scripts/backup_create.py --type all
Restore (dry-run):
docker compose exec api python scripts/backup_restore.py \
--manifest ./backups/<job>/manifest.json \
--components all \
--dry-run
Restore (destructive requires explicit confirmation):
docker compose exec api python scripts/backup_restore.py \
--manifest ./backups/<job>/manifest.json \
--components db,schema \
--allow-destructive
Check DR readiness:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/ops/dr/readiness"
Failover control plane is region-aware and token-gated to reduce accidental promotions.
Key settings:
- `REGION_ID`, `REGION_ROLE`
- `FAILOVER_ENABLED`, `FAILOVER_MODE`
- `REPLICATION_LAG_MAX_SECONDS`, `REPLICATION_HEALTH_REQUIRED`
- `WRITE_FREEZE_ON_UNHEALTHY_REPLICA`
- `FAILOVER_COOLDOWN_SECONDS`, `FAILOVER_TOKEN_TTL_SECONDS`
- `PEER_REGIONS_JSON`
Failover states:
- idle
- freeze_writes
- precheck
- promoting
- verification
- completed
- failed
- rollback_pending
- rolled_back
Safety invariants:
- one failover at a time (Redis lock + DB row lock)
- cooldown enforced between transitions
- promotion/rollback requires one-time short-lived token
- writes freeze when region is not active primary or freeze flag is enabled
Get failover status:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/ops/failover/status"
Check failover readiness:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/ops/failover/readiness"
Request promotion token:
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{"purpose":"promote","reason":"primary unavailable"}' \
"http://localhost:8000/v1/ops/failover/request-token"
Promote with token:
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{"target_region":"ap-southeast-1","token":"<TOKEN>","reason":"incident failover","force":false}' \
"http://localhost:8000/v1/ops/failover/promote"
Toggle write freeze:
curl -s -X PATCH -H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{"freeze":true,"reason":"replication degraded"}' \
"http://localhost:8000/v1/ops/failover/freeze-writes"
Governance controls add tenant-scoped retention, legal hold, DSAR execution, and policy-as-code decisions.
Retention policy fields:
- `messages_ttl_days`
- `checkpoints_ttl_days`
- `audit_ttl_days`
- `documents_ttl_days`
- `backups_ttl_days`
- `hard_delete_enabled`
- `anonymize_instead_of_delete`
Read/update retention policy:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/governance/retention/policy"
curl -s -X PATCH -H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{"documents_ttl_days":30,"hard_delete_enabled":false,"anonymize_instead_of_delete":true}' \
"http://localhost:8000/v1/admin/governance/retention/policy"
Run retention and fetch report:
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/governance/retention/run?tenant_id=t1"
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/governance/retention/report?tenant_id=t1&run_id=1"
- Active legal holds supersede retention deletion and DSAR destructive requests.
- Holds can be scoped to `tenant`, `document`, `session`, `user_key`, or `backup_set`.
Create/release legal hold:
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{"scope_type":"document","scope_id":"doc_123","reason":"Litigation case #123"}' \
"http://localhost:8000/v1/admin/governance/legal-holds"
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/governance/legal-holds/1/release"
- Create request: `POST /v1/admin/governance/dsar`
- Poll request: `GET /v1/admin/governance/dsar/{id}`
- Download export artifact: `GET /v1/admin/governance/dsar/{id}/artifact`
DSAR export example:
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{"request_type":"export","subject_type":"session","subject_id":"s1","reason":"Data subject request"}' \
"http://localhost:8000/v1/admin/governance/dsar"
Policies evaluate by rule_key with descending priority and deterministic tie-break by rule id.
Policy rule example:
{
"rule_key": "documents.delete",
"enabled": true,
"priority": 1000,
"condition_json": {"method": "DELETE", "endpoint_prefix": "/v1/documents/"},
"action_json": {"type": "deny", "code": "POLICY_DENIED", "message": "Delete disabled by policy"}
}
Create/list policies:
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{"rule_key":"documents.delete","priority":1000,"condition_json":{"method":"DELETE"},"action_json":{"type":"deny","code":"POLICY_DENIED"}}' \
"http://localhost:8000/v1/admin/governance/policies"
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/governance/policies"
Status:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/ops/governance/status"
Evidence bundle metadata:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/ops/governance/evidence?window_days=30"
Governance error codes:
- `POLICY_DENIED` (403)
- `LEGAL_HOLD_ACTIVE` (409)
- `DSAR_REQUIRES_APPROVAL` (409)
- `DSAR_NOT_FOUND` (404)
- `GOVERNANCE_REPORT_UNAVAILABLE` (503)
Sensitive artifacts are protected with envelope encryption (AES-256-GCM) using per-tenant key versions.
Key settings:
- `CRYPTO_ENABLED`, `CRYPTO_PROVIDER`
- `CRYPTO_REQUIRE_ENCRYPTION_FOR_SENSITIVE`
- `CRYPTO_DEFAULT_KEY_ALIAS`, `CRYPTO_ROTATION_INTERVAL_DAYS`
- `CRYPTO_FAIL_MODE` (`open`|`closed`)
Local provider:
- `CRYPTO_PROVIDER=local_kms`
- `CRYPTO_LOCAL_MASTER_KEY` (base64/hex)
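The envelope pattern (a fresh data key encrypts each artifact; the master key only wraps data keys) can be sketched with the third-party `cryptography` package. This is an illustration of AES-256-GCM envelope encryption under our own function names and blob layout, not the platform's implementation:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def envelope_encrypt(master_key: bytes, plaintext: bytes, aad: bytes = b"") -> dict:
    """Encrypt plaintext with a fresh DEK, then wrap the DEK with the master key.
    Both layers are AES-256-GCM with random 96-bit nonces."""
    dek = AESGCM.generate_key(bit_length=256)
    data_nonce, key_nonce = os.urandom(12), os.urandom(12)
    ciphertext = AESGCM(dek).encrypt(data_nonce, plaintext, aad)
    wrapped_dek = AESGCM(master_key).encrypt(key_nonce, dek, b"dek")
    return {"ciphertext": ciphertext, "data_nonce": data_nonce,
            "wrapped_dek": wrapped_dek, "key_nonce": key_nonce, "aad": aad}

def envelope_decrypt(master_key: bytes, blob: dict) -> bytes:
    """Unwrap the DEK with the master key, then decrypt the payload."""
    dek = AESGCM(master_key).decrypt(blob["key_nonce"], blob["wrapped_dek"], b"dek")
    return AESGCM(dek).decrypt(blob["data_nonce"], blob["ciphertext"], blob["aad"])
```

Rotating the master key then only requires re-wrapping stored DEKs, not re-encrypting every artifact.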
List keys:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/crypto/keys/{tenant_id}"
Rotate and re-encrypt:
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{"reason":"scheduled rotation","reencrypt":true,"force":false}' \
"http://localhost:8000/v1/admin/crypto/keys/{tenant_id}/rotate"
Rotation job status:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/crypto/rotation-jobs/{job_id}"
Crypto error codes:
- `ENCRYPTION_REQUIRED` (503)
- `KMS_UNAVAILABLE` (503)
- `KEY_ROTATION_IN_PROGRESS` (409)
- `KEY_ROTATION_FAILED` (500)
- `KEY_NOT_ACTIVE` (409)
- `DECRYPTION_FAILED` (500)
- `CRYPTO_POLICY_DENIED` (403)
SOC 2 controls are evaluated continuously and bundled as signed evidence for auditors.
Baseline controls (automated/hybrid):
- `CC6.1` Access control enforcement
- `CC6.2` API key governance
- `CC7.1` Change management evidence
- `CC7.2` Monitoring and incident response readiness
- `CC8.1` Vulnerability/patch cadence (manual artifact upload)
- `A1.1` Backup + restore drill compliance
- `C1.1` Encryption posture
Evaluate controls:
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{"window_days":30}' \
"http://localhost:8000/v1/admin/compliance/evaluate"
Generate bundle:
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{"bundle_type":"soc2_on_demand","period_start":"2026-01-01T00:00:00Z","period_end":"2026-01-31T23:59:59Z"}' \
"http://localhost:8000/v1/admin/compliance/bundles"
CLI bundle generation:
docker compose exec api python scripts/compliance_generate_bundle.py \
--bundle-type soc2_on_demand \
--period-start 2026-01-01T00:00:00Z \
--period-end 2026-01-31T23:59:59Z
Verify bundle:
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/compliance/bundles/{id}/verify"
Upload dependency scan artifact:
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
-H "Content-Type: application/json" \
-d '{"artifact_type":"dependency_scan","artifact_uri":"s3://compliance/scans/scan.json"}' \
"http://localhost:8000/v1/admin/compliance/artifacts"
Ops posture:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/ops/compliance/status"
Compliance settings:
- `COMPLIANCE_ENABLED`
- `COMPLIANCE_DEFAULT_WINDOW_DAYS`
- `COMPLIANCE_EVAL_CRON`, `COMPLIANCE_BUNDLE_CRON`
- `COMPLIANCE_EVIDENCE_RETENTION_DAYS`
- `COMPLIANCE_SIGNATURE_REQUIRED`
- `COMPLIANCE_EVIDENCE_DIR`
Compliance error codes:
- `COMPLIANCE_CONTROL_NOT_FOUND` (404)
- `COMPLIANCE_EVALUATION_FAILED` (500)
- `COMPLIANCE_BUNDLE_BUILD_FAILED` (500)
- `COMPLIANCE_BUNDLE_VERIFY_FAILED` (400)
- `COMPLIANCE_DISABLED` (503)
Use the smoke script to validate retrieval routing without calling the LLM:
docker compose exec api python scripts/provider_smoke.py \
--tenant t1 --corpus c1 --query "test query" --top-k 5
Required environment (examples):
- `AWS_REGION` (or `AWS_DEFAULT_REGION`)
- `AWS_PROFILE` (or `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` + optional `AWS_SESSION_TOKEN`)
Permissions (high level):
- `bedrock:Retrieve` on the target knowledge base
Smoke validation:
docker compose exec api python scripts/provider_smoke.py \
--tenant t1 --corpus c1 --query "bedrock test" --top-k 5
Required environment (examples):
- `GOOGLE_CLOUD_PROJECT`
- `GOOGLE_CLOUD_LOCATION` (or `VERTEX_LOCATION`)
- Application Default Credentials (ADC), e.g. `gcloud auth application-default login`
Smoke validation:
docker compose exec api python scripts/provider_smoke.py \
--tenant t1 --corpus c1 --query "vertex test" --top-k 5
Provider error codes:
- `AWS_CONFIG_MISSING`: required AWS env vars or KB config missing
- `AWS_AUTH_ERROR`: missing/invalid AWS credentials
- `AWS_RETRIEVAL_ERROR`: Bedrock retrieval failed (check permissions or KB id)
- `VERTEX_RETRIEVAL_CONFIG_MISSING`: missing Vertex config in corpus or env
- `VERTEX_RETRIEVAL_AUTH_ERROR`: missing/invalid Google ADC credentials
- `VERTEX_RETRIEVAL_ERROR`: Vertex retrieval failed (check resource id)
curl -N -H "Content-Type: application/json" -H "Authorization: Bearer $API_KEY" \
-X POST http://localhost:8000/v1/run \
-d '{
"session_id":"s1",
"corpus_id":"c1",
"message":"What is the testing strategy of agent 2.0?",
"top_k":5,
"audio":false
}'
If you receive 429 RATE_LIMITED, back off using the Retry-After header and retry.
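A retry helper honoring `Retry-After` with exponential backoff and jitter might look like this (the SDKs ship their own helpers; here `send()` is any callable returning `(status_code, headers, body)`, an assumption for illustration):

```python
import random
import time

def call_with_backoff(send, max_attempts=5, base_delay_s=0.5, sleep=time.sleep):
    """Retry on 429/503, preferring the server's Retry-After hint when present,
    otherwise exponential backoff with a little jitter."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in (429, 503):
            return status, body
        retry_after = headers.get("Retry-After")
        delay = float(retry_after) if retry_after else base_delay_s * (2 ** attempt)
        sleep(delay + random.uniform(0, 0.1))
    raise RuntimeError("still rate limited after retries")
```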
Usage quotas enforce daily and monthly request caps per tenant. Soft caps emit warnings; hard caps can block or observe overages.
Quota behavior:
- Soft cap (default 80% of limit): request allowed + warning header + event emitted once per period.
- Hard cap: block with `402 QUOTA_EXCEEDED` (when `hard_cap_enabled=true`), or allow with an overage event (when `hard_cap_enabled=false`).
- `/run` counts as 3 request units to reflect higher cost.
Quota headers (always included on successful requests):
- `X-Quota-Day-Limit`, `X-Quota-Day-Used`, `X-Quota-Day-Remaining`
- `X-Quota-Month-Limit`, `X-Quota-Month-Used`, `X-Quota-Month-Remaining`
- `X-Quota-SoftCap-Reached`: `true|false`
- `X-Quota-HardCap-Mode`: `enforce|observe`
Example 402 response:
HTTP/1.1 402 Payment Required
{
"detail": {
"code": "QUOTA_EXCEEDED",
"message": "Monthly request quota exceeded",
"period": "month",
"limit": 10000,
"used": 10000,
"remaining": 0
}
}
Admin quota endpoints (admin role only, tenant-scoped):
# Get limits
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/admin/quotas/t1
# Update limits
curl -s -X PATCH -H "Content-Type: application/json" -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/admin/quotas/t1 \
-d '{
"daily_requests_limit": 500,
"monthly_requests_limit": 10000,
"soft_cap_ratio": 0.8,
"hard_cap_enabled": true
}'
# Usage summary
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/usage/t1?period=month&start=2026-02-01"
Billing webhook configuration:
- `BILLING_WEBHOOK_ENABLED=true`
- `BILLING_WEBHOOK_URL=https://billing.example.com/hooks`
- `BILLING_WEBHOOK_SECRET=...`
- `BILLING_WEBHOOK_TIMEOUT_MS=2000`
Webhook signature:
- Header `X-Billing-Signature` is `hex(HMAC_SHA256(secret, raw_body))`.
- Header `X-Billing-Event` includes the event type (e.g., `quota.soft_cap_reached`).
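A receiver can verify the signature scheme above with the standard library. Note the constant-time comparison; the function name is ours:

```python
import hashlib
import hmac

def verify_billing_webhook(secret: bytes, raw_body: bytes, signature_header: str) -> bool:
    """Check X-Billing-Signature == hex(HMAC_SHA256(secret, raw_body)).
    Compute over the raw bytes, before any JSON parsing, and compare
    with hmac.compare_digest to avoid timing side channels."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```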
Plans assign feature entitlements per tenant. Entitlements are enforced server-side for retrieval providers, TTS, ops/audit access, and provider configuration changes.
Plan matrix:
| Feature | Free | Pro | Enterprise |
|---|---|---|---|
| Local pgvector retrieval | yes | yes | yes |
| AWS Bedrock KB retrieval | no | no | yes |
| GCP Vertex retrieval | no | yes | yes |
| Text-to-speech (TTS) | no | yes | yes |
| Ops admin access | no | yes | yes |
| Audit access | no | yes | yes |
| Corpora provider config patch | no | yes | yes |
| Billing webhook test | no | yes | yes |
| High quota tier | no | no | yes |
| Enterprise SSO (OIDC) | no | yes | yes |
| SCIM 2.0 provisioning | no | no | yes |
| JIT provisioning | no | no | yes |
| Cost visibility | no | yes | yes |
| Cost controls | no | yes | yes |
| Chargeback reports | no | no | yes |
Entitlement enforcement:
- Retrieval provider selection is validated against `feature.retrieval.*` flags.
- `/run` with `audio=true` requires `feature.tts`.
- `/ops/*` and `/audit/events*` require `feature.ops_admin_access` and `feature.audit_access`.
- `PATCH /corpora/{id}` with provider changes requires `feature.corpora_patch_provider_config`.
- SSO endpoints require `feature.identity.sso` and `SSO_ENABLED=true`.
- SCIM endpoints require `feature.identity.scim` and `SCIM_ENABLED=true`.
- JIT provisioning requires `feature.identity.jit` and `jit_enabled=true` on the provider.
- Cost endpoints require `feature.cost_visibility`, `feature.cost_controls`, and `feature.chargeback_reports` as applicable.
Feature disabled error:
HTTP/1.1 403 Forbidden
{
"error": {
"code": "FEATURE_NOT_ENABLED",
"message": "Feature not enabled for tenant plan",
"details": {
"feature_key": "feature.tts"
}
},
"meta": {
"request_id": "req_example",
"api_version": "v1"
}
}
Admin plan endpoints (admin role only, tenant-scoped):
# List plans
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/admin/plans
# Get tenant plan
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/admin/plans/t1
# Assign plan
curl -s -X PATCH -H "Content-Type: application/json" -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/admin/plans/t1 \
-d '{"plan_id":"pro"}'
# Override a feature
curl -s -X PATCH -H "Content-Type: application/json" -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/admin/plans/t1/overrides \
-d '{"feature_key":"feature.tts","enabled":true,"config_json":{"voices":["nova"]}}'
Enterprise identity features add OIDC-based SSO, SCIM 2.0 provisioning, and JIT user creation.
Key settings:
SSO_ENABLED=true
SSO_ALLOWED_REDIRECT_HOSTS=app.example.com,admin.example.com
SSO_STATE_TTL_SECONDS=600
SSO_NONCE_TTL_SECONDS=600
SSO_CLOCK_SKEW_SECONDS=120
SSO_SESSION_TTL_HOURS=8
SSO_PUBLIC_DISCOVERY_ENABLED=false
SCIM_ENABLED=true
SCIM_TOKEN_TTL_DAYS=365
SCIM_DEFAULT_PAGE_SIZE=50
SCIM_MAX_PAGE_SIZE=200
OIDC provider setup (admin, tenant-scoped):
curl -s -X POST -H "Content-Type: application/json" -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/admin/identity/providers \
-d '{
"type": "oidc",
"name": "Okta",
"issuer": "https://example.okta.com/oauth2/default",
"client_id": "0oa123",
"client_secret_ref": "OKTA_OIDC_CLIENT_SECRET",
"auth_url": "https://example.okta.com/oauth2/default/v1/authorize",
"token_url": "https://example.okta.com/oauth2/default/v1/token",
"jwks_url": "https://example.okta.com/oauth2/default/v1/keys",
"scopes_json": ["openid", "profile", "email", "groups"],
"default_role": "reader",
"role_mapping_json": {"groups": {"Admins": "admin", "Editors": "editor"}},
"jit_enabled": true
}'
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/admin/identity/providers/$PROVIDER_ID/enable
SSO flow (dev mode returns JSON, prod typically uses redirects):
# Start the OIDC flow (returns authorize_url or redirects if response_mode=redirect)
curl -s "http://localhost:8000/v1/auth/sso/oidc/$PROVIDER_ID/start"
# Callback (normally invoked by IdP)
curl -s "http://localhost:8000/v1/auth/sso/oidc/$PROVIDER_ID/callback?code=AUTH_CODE&state=STATE"
SCIM provisioning (token-based):
# Create a SCIM token
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/admin/identity/scim/token/create
# Provision a user
curl -s -X POST -H "Authorization: Bearer $SCIM_TOKEN" \
http://localhost:8000/v1/scim/v2/Users \
-d '{
"schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
"userName": "user@example.com",
"displayName": "User Example",
"emails": [{"value": "user@example.com", "primary": true}],
"active": true
}'
Role mapping examples:
- Claim-based: `{"groups": {"Admins": "admin", "Editors": "editor"}}`
- Single claim: `{"claim": "roles", "mapping": {"staff": "reader"}}`
Token rotation guidance:
- Create a new SCIM token with `/v1/admin/identity/scim/token/create`.
- Update the IdP provisioning connector to use the new token.
- Revoke the old token with `/v1/admin/identity/scim/token/revoke`.
Security notes:
- Client secrets are referenced via `client_secret_ref`; plaintext secrets are never stored.
- ID tokens, access tokens, and SCIM bearer tokens are never logged or persisted.
- Callback URLs must be HTTPS in non-dev environments.
- State and nonce values are stored in Redis with TTLs to prevent replay.
Cost governance provides request-level metering, tenant budgets, and chargeback reporting with runtime guardrails.
Key settings:
COST_GOVERNANCE_ENABLED=true
COST_DEFAULT_WARN_RATIO=0.8
COST_DEFAULT_HARD_CAP_MODE=block
COST_DEGRADE_ENABLE_TTS_DISABLE=true
COST_DEGRADE_MIN_TOP_K=3
COST_DEGRADE_MAX_OUTPUT_TOKENS=512
COST_ESTIMATOR_ENABLED=true
COST_ESTIMATOR_TOKEN_CHARS_RATIO=4.0
COST_TIMESERIES_DEFAULT_DAYS=30
Budget model and modes:
- `warn_ratio` triggers `X-Cost-Status: warn` headers and `cost.warn` SSE events without blocking.
- `enforce_hard_cap=true` plus `hard_cap_mode=block|degrade` controls whether requests are blocked or downgraded.
- Degrade mode disables audio first, reduces `top_k`, forces local retrieval, and shortens max output tokens.
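The degrade path can be pictured as a pure transform over the incoming request. A minimal sketch mirroring the documented knobs (the request field names are assumptions for illustration, not the internal schema):

```python
MIN_TOP_K = 3            # mirrors COST_DEGRADE_MIN_TOP_K
MAX_OUTPUT_TOKENS = 512  # mirrors COST_DEGRADE_MAX_OUTPUT_TOKENS

def apply_cost_degrade(request: dict) -> dict:
    """Apply the documented mitigations: audio off, smaller top_k,
    local retrieval, capped output tokens."""
    degraded = dict(request)
    degraded["audio"] = False
    degraded["top_k"] = min(request.get("top_k", MIN_TOP_K), MIN_TOP_K)
    degraded["retrieval_provider"] = "local_pgvector"
    degraded["max_output_tokens"] = min(
        request.get("max_output_tokens", MAX_OUTPUT_TOKENS), MAX_OUTPUT_TOKENS
    )
    return degraded

req = {"audio": True, "top_k": 10,
       "retrieval_provider": "aws_bedrock_kb", "max_output_tokens": 2048}
out = apply_cost_degrade(req)
assert out == {"audio": False, "top_k": 3,
               "retrieval_provider": "local_pgvector", "max_output_tokens": 512}
```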
Cost headers on successful responses:
- `X-Cost-Month-Budget-Usd`
- `X-Cost-Month-Spend-Usd`
- `X-Cost-Month-Remaining-Usd`
- `X-Cost-Status` (ok|warn|capped|degraded)
- `X-Cost-Estimated` (true|false)
Pricing catalog (admin, tenant-scoped):
curl -s -X POST -H "Content-Type: application/json" -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/admin/costs/pricing/catalog \
-d '{
"version": "v2026.02",
"provider": "internal",
"component": "llm",
"rate_type": "per_1k_tokens",
"rate_value_usd": 0.0025,
"effective_from": "2026-02-01T00:00:00Z",
"active": true
}'
Budget configuration (admin):
curl -s -X PATCH -H "Content-Type: application/json" -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/admin/costs/budget \
-d '{"monthly_budget_usd":500,"warn_ratio":0.8,"enforce_hard_cap":true,"hard_cap_mode":"degrade"}'
Chargeback reports:
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/costs/chargeback/generate?period_start=2026-02-01T00:00:00Z&period_end=2026-03-01T00:00:00Z"
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/admin/costs/chargeback/reports
Self-serve dashboards:
- `GET /v1/self-serve/costs/budget`
- `GET /v1/self-serve/costs/spend/summary`
- `GET /v1/self-serve/costs/spend/timeseries`
- `GET /v1/self-serve/costs/spend/breakdown`
- `GET /v1/self-serve/costs/chargeback/latest`
Security notes:
- Cost metadata is redacted to counts/identifiers; raw prompts and document content are never stored.
- Metering is best-effort and never blocks requests if cost writes fail.
Phase 34 adds a tenant-scoped SLA control plane with policy evaluation, runtime enforcement, incidents, and autoscaling recommendations.
Key settings:
SLA_ENGINE_ENABLED=true
SLA_DEFAULT_ENFORCEMENT_MODE=observe
SLA_DEFAULT_BREACH_WINDOWS=3
SLA_MEASUREMENT_WINDOW_SECONDS=60
SLA_ERROR_BUDGET_WINDOW_MINUTES=60
SLA_SHED_ENABLED=true
SLA_DEGRADE_TTS_DISABLE=true
SLA_DEGRADE_TOP_K_FLOOR=3
SLA_DEGRADE_MAX_OUTPUT_TOKENS=512
AUTOSCALING_ENABLED=true
AUTOSCALING_DRY_RUN=true
AUTOSCALING_HYSTERESIS_PCT=10
AUTOSCALING_EXECUTOR=noop
Decision order:
measurements/signals -> policy evaluator -> status (healthy|warning|breached) -> enforcement decision (allow|warn|degrade|shed) -> runtime headers/SSE/audit
Runtime effects:
- `observe`: evaluate only; request continues.
- `warn`: request continues with `X-SLA-*` headers and `sla.warn` SSE.
- `degrade`: request continues with mitigations (disable audio, lower `top_k`, cap output tokens, provider fallback) and `sla.degrade.applied`.
- `shed`: request is rejected with `503` and `SLA_SHED_LOAD`, plus `sla.shed` SSE.
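A minimal sketch of these decision semantics (the `sla_decision` function is an illustration of the documented behavior, not the platform's evaluator):

```python
def sla_decision(status: str, enforcement_mode: str, allow_degrade: bool) -> str:
    """Map a policy status to a runtime decision per the documented semantics."""
    if enforcement_mode == "observe" or status == "healthy":
        return "allow"
    if status == "warning":
        return "warn"
    # status == "breached" under enforce: degrade if the policy permits it, else shed
    return "degrade" if allow_degrade else "shed"

assert sla_decision("breached", "observe", True) == "allow"   # observe never blocks
assert sla_decision("warning", "enforce", True) == "warn"
assert sla_decision("breached", "enforce", True) == "degrade"
assert sla_decision("breached", "enforce", False) == "shed"   # surfaces as 503 SLA_SHED_LOAD
```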
SLA headers:
- `X-SLA-Status` (healthy|warning|breached)
- `X-SLA-Policy-Id`
- `X-SLA-Decision` (allow|warn|degrade|shed)
- `X-SLA-Route-Class`
- `X-SLA-Window-End`
Create policy example:
curl -s -X POST -H "Content-Type: application/json" -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/admin/sla/policies \
-d '{
"name":"enterprise-run-sla",
"tier":"enterprise",
"enabled":true,
"version":1,
"config_json":{
"objectives":{
"availability_min_pct":99.9,
"p95_ms_max":{"run":1200},
"p99_ms_max":{"run":2000},
"max_error_budget_burn_5m":2.0
},
"enforcement":{
"mode":"enforce",
"breach_window_minutes":5,
"consecutive_windows_to_trigger":3
},
"mitigation":{
"allow_degrade":true,
"disable_tts_first":true,
"reduce_top_k_floor":3,
"cap_output_tokens":512,
"provider_fallback_order":["local_pgvector"]
},
"autoscaling_link":{"profile_id":null}
}
}'
Dry-run autoscaling recommendation:
curl -s -X POST -H "Content-Type: application/json" -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/admin/sla/autoscaling/evaluate \
-d '{
"profile_id":"profile_123",
"route_class":"run",
"current_replicas":2,
"p95_ms":1650,
"queue_depth":24,
"signal_quality":"ok"
}'
Phase 37 adds deterministic alert-rule evaluation, incident automation, and operator response workflows.
Key settings:
ALERTING_ENABLED=true
OPERABILITY_BACKGROUND_EVALUATOR_ENABLED=true
OPERABILITY_EVAL_INTERVAL_S=30
INCIDENT_AUTOMATION_ENABLED=true
INCIDENT_AUTO_OPEN_MIN_SEVERITY=high
NOTIFY_WEBHOOK_URLS_JSON=["https://.../hook"] (global fallback only)
NOTIFY_MAX_ATTEMPTS=5
NOTIFY_BACKOFF_MS=500
NOTIFY_BACKOFF_MAX_MS=15000
NOTIFY_MAX_AGE_SECONDS=86400
NOTIFY_DEDUPE_WINDOW_S=300
NOTIFY_QUEUE_NAME=notifications
OPS_FORCED_FLAG_TTL_S=900
OPS_FORCED_WRITER_LEASE_TTL_S=30
Worker services:
docker compose up -d operability_worker notification_worker notify_receiver
docker compose logs --tail=200 operability_worker
docker compose logs --tail=200 notification_worker
docker compose logs --tail=200 notify_receiver
List and tune alert rules:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/admin/alerts/rules
curl -s -X PATCH -H "Authorization: Bearer $ADMIN_API_KEY" -H "Content-Type: application/json" \
http://localhost:8000/v1/admin/alerts/rules/$RULE_ID \
-d '{"enabled":true,"thresholds_json":{"value":2.0}}'
Evaluate alerts and inspect triggered rows:
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/alerts/evaluate?window=5m"
Incident lifecycle:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/incidents?status=open"
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" -H "Content-Type: application/json" \
http://localhost:8000/v1/admin/incidents/$INCIDENT_ID/ack \
-d '{"note":"Acknowledged by on-call"}'
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/admin/incidents/$INCIDENT_ID/timeline
Notification queue triage:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/notifications/jobs?status=retrying"
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/admin/notifications/jobs/$JOB_ID/retry-now
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/notifications/jobs/$JOB_ID/attempts"
Tenant notification destinations:
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" -H "Content-Type: application/json" \
http://localhost:8000/v1/admin/notifications/destinations \
-d '{
"tenant_id":"t1",
"url":"https://example.com/webhook",
"headers_json":{"X-Env":"prod"},
"secret":"replace-with-shared-secret"
}'
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/notifications/destinations?tenant_id=t1"
Notification routing policies:
- Route matching fields: `event_type`, `severity`, optional `source`, optional `category`.
- Matching supports exact values, arrays, and wildcard `*` for `event_type`.
- Matching routes are evaluated by `priority` (lower value first), then route destination order.
- If no route matches, fallback is tenant enabled destinations, then `NOTIFY_WEBHOOK_URLS_JSON`.
- Delivery is at-least-once; receivers should dedupe using `X-Notification-Id`.
Notification delivery contract:
- `X-Notification-Id`: stable per-job id.
- `X-Notification-Attempt`: 1-based delivery attempt number.
- `X-Notification-Event-Type`: event type (for example `incident.opened`).
- `X-Notification-Tenant-Id`: tenant context for receiver-side routing.
- `X-Notification-Signature`: optional `sha256=<hex>` HMAC over the raw JSON body when a destination secret is configured.
- `X-Notification-Id` is regenerated on DLQ replay because replay creates a new job id.
- Payloads are serialized deterministically, and each attempt stores `payload_sha256` for forensic verification.
- Retry semantics: exponential backoff + deterministic jitter, capped by `NOTIFY_MAX_ATTEMPTS`.
- Max-age policy: jobs older than `NOTIFY_MAX_AGE_SECONDS` move to the DLQ with reason `expired`.
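A backoff schedule matching "exponential backoff + deterministic jitter" can be sketched under the documented knobs. The jitter formula below (a hash of job id and attempt) is an assumption for illustration; only the env-var names come from this document:

```python
import hashlib

NOTIFY_BACKOFF_MS = 500       # documented default
NOTIFY_BACKOFF_MAX_MS = 15000 # documented cap

def backoff_ms(job_id: str, attempt: int) -> int:
    """Exponential base delay, capped, plus deterministic jitter (hypothetical formula)."""
    base = min(NOTIFY_BACKOFF_MS * (2 ** (attempt - 1)), NOTIFY_BACKOFF_MAX_MS)
    # Deterministic jitter: same job + attempt always yields the same delay,
    # so retries are reproducible while still spreading load across jobs.
    seed = hashlib.sha256(f"{job_id}:{attempt}".encode()).digest()
    jitter = int.from_bytes(seed[:2], "big") % (base // 4 + 1)
    return base + jitter

d1 = backoff_ms("job_1", 1)
assert d1 == backoff_ms("job_1", 1)   # deterministic for the same job/attempt
assert 500 <= d1 <= 625               # base 500 ms plus up to 25% jitter
assert backoff_ms("job_1", 10) <= NOTIFY_BACKOFF_MAX_MS + NOTIFY_BACKOFF_MAX_MS // 4
```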
Receiver Quickstart:
- Contract spec: `docs/notification-receiver-contract.md` (v1.0).
- Operations runbook: `docs/runbooks/notification-receiver-contract.md`.
- Start receiver:
make receiver-up
curl -s http://localhost:9001/health
- Configure a tenant destination pointing at the local receiver:
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" -H "Content-Type: application/json" \
http://localhost:8000/v1/admin/notifications/destinations \
-d '{
"tenant_id":"t1",
"url":"http://notify_receiver:9001/webhook",
"headers_json":{"X-Receiver-Profile":"strict_signed"},
"secret":"receiver-shared-secret"
}'
- Trigger a notification through the normal incident path:
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/alerts/evaluate?window=5m"
- Inspect receiver receipts and stats:
curl -s "http://localhost:9001/received?limit=50"
make receiver-stats
curl -s http://localhost:9001/ops
- Run deterministic sender↔receiver validation:
docker compose exec api make notify-e2e
Receiver env knobs:
RECEIVER_SHARED_SECRET
RECEIVER_REQUIRE_SIGNATURE
RECEIVER_REQUIRE_TIMESTAMP
RECEIVER_MAX_TIMESTAMP_SKEW_SECONDS
RECEIVER_FAIL_MODE=never|always|first_n
RECEIVER_FAIL_N
RECEIVER_STORE_PATH
Compatibility notes:
- Receivers in other languages must verify the `sha256=<hex>` HMAC over the raw bytes.
- Use a constant-time compare and dedupe by `X-Notification-Id`.
- The fixture matrix for expected modes lives in `nexusrag/tests/fixtures/notification_receiver_compatibility.json`.
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" -H "Content-Type: application/json" \
http://localhost:8000/v1/admin/notifications/routes \
-d '{
"tenant_id":"t1",
"name":"critical-incidents",
"enabled":true,
"priority":10,
"match_json":{"event_type":["incident.opened"],"severity":["high","critical"]},
"destinations_json":[{"destination_id":"DESTINATION_ID_1"},{"destination_id":"DESTINATION_ID_2"}]
}'
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/notifications/routes?tenant_id=t1"
DLQ and replay:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/notifications/jobs?tenant_id=t1&status=dlq"
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/admin/notifications/dlq?tenant_id=t1"
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/admin/notifications/dlq/$DLQ_ID/replay
Operability summary:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/ops/operability
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/ops/notifications
Operator actions (Idempotency-Key required):
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" -H "Idempotency-Key: op-1" \
http://localhost:8000/v1/admin/ops/actions/disable-tts
Git publish diagnostics:
./scripts/git_network_diag.sh
See docs/runbooks/github-push-troubleshooting.md for HTTPS/SSH fallback workflows.
Run deploy preflight checks:
make preflight
Generate GA readiness artifacts:
make ga-checklist
Artifacts are written under var/ops/ by default:
- preflight.json
- ga-checklist-<timestamp>.json
- ga-checklist-<timestamp>.md
Tenant self-serve endpoints let admins manage API keys, view usage, and request plan upgrades without platform intervention.
Self-serve API key lifecycle:
# Create key (plaintext returned once)
curl -s -X POST -H "Content-Type: application/json" -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/self-serve/api-keys \
-d '{"name":"ci-bot","role":"editor"}'
# List keys (no secrets)
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/self-serve/api-keys
# Revoke key (idempotent)
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/self-serve/api-keys/$KEY_ID/revoke
Usage dashboard:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/self-serve/usage/summary?window_days=30"
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
"http://localhost:8000/v1/self-serve/usage/timeseries?metric=requests&granularity=day&days=30"
Plan visibility and upgrades:
curl -s -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/self-serve/plan
curl -s -X POST -H "Content-Type: application/json" -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/self-serve/plan/upgrade-request \
-d '{"target_plan":"pro","reason":"Need TTS and Bedrock"}'
Billing webhook test (feature-gated):
curl -s -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
http://localhost:8000/v1/self-serve/billing/webhook-test
Note: plaintext API keys are returned once on creation and must be stored securely by the client.
Generated SDKs live under:
- `sdk/typescript/`
- `sdk/python/`
- `sdk/frontend/` (frontend BFF + SSE helpers)
Regenerate from OpenAPI:
make sdk-generate
TypeScript (fetch) example:
import { createClient } from "./sdk/typescript/client";
const api = await createClient({
apiKey: process.env.NEXUSRAG_API_KEY!,
basePath: "http://localhost:8000",
});
const health = await api.healthHealthGet();
Python example:
import sys
from pathlib import Path
sys.path.append(str(Path("sdk/python/generated")))
sys.path.append(str(Path("sdk/python")))
from client import create_client
api = create_client(api_key="your_api_key", base_url="http://localhost:8000")
health = api.health_health_get()
Frontend SDK example:
import { UiClient, buildQuery, connectRunStream } from "./sdk/frontend/src";
const client = new UiClient({
baseUrl: "http://localhost:8000",
apiKey: process.env.NEXUSRAG_API_KEY!,
});
const docs = await client.listDocuments(buildQuery({ limit: 25, sort: "-created_at" }));
connectRunStream({
baseUrl: "http://localhost:8000",
apiKey: process.env.NEXUSRAG_API_KEY!,
body: { session_id: "s1", corpus_id: "c1", message: "Hello", top_k: 5 },
onEvent: ({ data }) => console.log(data),
});
Phase 35 adds a reproducible performance suite under tests/perf/ with deterministic and integration execution modes.
Deterministic perf mode:
PERF_MODE_ENABLED=true
PERF_FAKE_PROVIDER_MODE=true
LLM_PROVIDER=fake
INGEST_EXECUTION_MODE=inline
Integration perf mode:
- uses real runtime components (API + Redis + Postgres + worker)
- keeps workload shapes deterministic while exercising full queue/worker paths
Quickstart:
# deterministic gates + scenario artifacts
docker compose exec api make perf-test
# markdown summary from latest JSON artifacts
docker compose exec api make perf-report
# short soak (15 minutes)
docker compose exec api python tests/perf/scenarios/run_mixed.py --duration 900 --deterministic
Perf artifacts:
- JSON/markdown reports: `tests/perf/reports/`
- Gate output: `tests/perf/reports/perf-gates-latest.json`
- Rollup summary: `tests/perf/reports/perf-summary-latest.md`
Interpreting reports:
- `route_class.p95_ms` and `route_class.p99_ms` capture end-to-end latency.
- `sse.first_token_p95_ms` tracks time-to-first-token for `/v1/run`.
- `extra.ingest_queue_wait_p95_ms` tracks ingestion backlog pressure.
- `extra.noisy_neighbor_success_ratio` captures multi-tenant fairness.
Capacity model:
python3 scripts/capacity_estimate.py --headroom 0.30
- Outputs: `docs/capacity-model.md` and `tests/perf/reports/capacity-model.json`
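As a back-of-the-envelope illustration of what a 30% headroom factor means, assuming a simple peak-rate model (both the formula and the per-replica throughput figure are assumptions for illustration, not the actual model in `scripts/capacity_estimate.py`):

```python
import math

def replicas_needed(peak_rps: float, per_replica_rps: float, headroom: float = 0.30) -> int:
    """Size the fleet for peak load plus a headroom margin, rounding up."""
    return math.ceil(peak_rps * (1 + headroom) / per_replica_rps)

# 100 rps peak with 30% headroom -> provision for 130 rps.
# At a hypothetical 25 rps per replica, that is 6 replicas.
assert replicas_needed(100, 25, 0.30) == 6
```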
- `/run` emits `request.accepted`, `token.delta`, `message.final`, optional `audio.*`, and `done` events with a monotonic `seq`.
- Heartbeat events are emitted for long streams (`event: heartbeat`).
- If Vertex credentials/config are missing, `/run` emits an SSE `error` event with a clear message.
- Retrieval uses a deterministic fake embedding (no external embedding APIs).
- Set `DEBUG_EVENTS=true` to emit `debug.retrieval` SSE events after retrieval for validation.
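The event contract above can be exercised with a tiny SSE frame parser. The frames below are fabricated examples shaped like the documented events, and the parser is a sketch, not the SDK's stream helper:

```python
import json

def parse_sse(raw: str) -> list[dict]:
    """Split an SSE stream into {event, data} records (blank line separates frames)."""
    events = []
    for frame in raw.strip().split("\n\n"):
        event, data = None, None
        for line in frame.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = json.loads(line[len("data:"):].strip())
        events.append({"event": event, "data": data})
    return events

raw = (
    'event: request.accepted\ndata: {"seq": 0}\n\n'
    'event: token.delta\ndata: {"seq": 1, "text": "Hello"}\n\n'
    'event: done\ndata: {"seq": 2}\n\n'
)
events = parse_sse(raw)
assert [e["event"] for e in events] == ["request.accepted", "token.delta", "done"]
seqs = [e["data"]["seq"] for e in events]
assert seqs == sorted(seqs)  # seq is monotonic per the contract
```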
- Copy env file:
cp .env.example .env
- Start services:
docker compose up --build -d
- Run migrations:
docker compose exec api alembic upgrade head
- Create an API key:
docker compose exec api python scripts/create_api_key.py --tenant t1 --role admin --name local-admin
- Export the key:
export API_KEY=<api_key_from_script>
- Health check:
curl -s http://localhost:8000/v1/health
- Seed demo data:
docker compose exec api python scripts/seed_demo.py
- SSE run (expect `request.accepted`, `token.delta`, `message.final`, optional `audio.*`, and `done`):
curl -N -H "Content-Type: application/json" -H "Authorization: Bearer $API_KEY" \
-X POST http://localhost:8000/v1/run \
-d '{
"session_id":"s1",
"corpus_id":"c1",
"message":"What is the testing strategy of agent 2.0?",
"top_k":5,
"audio":false
}'
- Run tests in container:
docker compose exec api pytest -q
This query should match seeded content about testing strategy and release gates:
curl -N -H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-X POST http://localhost:8000/v1/run \
-d '{
"session_id":"s1",
"corpus_id":"c1",
"message":"What are the release gates for Agent 2.0?",
"top_k":5,
"audio":false
}'
Expected behavior:
- If Vertex is configured, you will see `token.delta` events followed by `message.final`.
- If Vertex is missing, you will see an `error` event, but retrieval and persistence still run before the LLM call.
pytest
make up
make migrate
make seed
make test
make perf-test
make perf-report
make sdk-generate
make security-audit
make security-lint
make security-secrets-scan
make compliance-snapshot
make lint
make typecheck
make secrets-scan
make sca
Security/compliance checks and evidence automation are available locally and in CI:
make security-audit
make security-lint
make security-secrets-scan
make compliance-snapshot TENANT_ID=t1 ACTOR_ID=admin
make lint
make typecheck
make secrets-scan
make sca
Compliance snapshot/bundle APIs:
- `POST /v1/admin/compliance/snapshot`
- `POST /v1/admin/compliance/snapshots`
- `GET /v1/admin/compliance/snapshots?limit=20`
- `GET /v1/admin/compliance/snapshots/{id}`
- `GET /v1/admin/compliance/bundle/{snapshot_id}.zip`
- `GET /v1/admin/compliance/snapshots/{snapshot_id}/download`
API key lifecycle admin APIs:
- `GET /v1/admin/api-keys?tenant_id=t1`
- `PATCH /v1/admin/api-keys/{id}`
Keyring admin APIs:
- `GET /v1/admin/keyring`
- `POST /v1/admin/keyring/rotate?purpose=signing|encryption|backup_signing|backup_encryption|webhook_signing`
- `POST /v1/admin/keyring/{key_id}/retire`
Keyring enforcement behavior:
- `KEYRING_MASTER_KEY_REQUIRED=true` fails closed for key rotation with `KEYRING_NOT_CONFIGURED` when `KEYRING_MASTER_KEY` is missing.
- `KEYRING_MASTER_KEY_REQUIRED=false` keeps the keyring optional for local/dev; rotation returns `KEYRING_DISABLED` if no key source is configured.
- Evidence bundles use redacted config only and never decrypt key material.
API key inactivity enforcement:
- `AUTH_API_KEY_INACTIVE_ENFORCED=true` and `AUTH_API_KEY_INACTIVE_DAYS=<N>` enforce stale-key denial in auth (`AUTH_INACTIVE_KEY`).
- Revoked keys return `AUTH_REVOKED_KEY`; expired keys return `AUTH_EXPIRED_KEY`.
- `PATCH /v1/admin/api-keys/{id}` with `{"active": true}` reactivates a key and resets `last_used_at` as the inactivity anchor.
Retention proof APIs:
- `POST /v1/admin/governance/retention/run?task=prune_all`
- `GET /v1/admin/governance/retention/status`
Bundle contents:
- `snapshot.json`
- `controls.json`
- `config_sanitized.json` (secrets redacted)
- `runbooks_index.json`
- `changelog_excerpt.md`
- `capacity_model_excerpt.md`
- `perf_gates_excerpt.json`
- `perf_report_summary.md`
- `ops_metrics_24h_summary.json`
Compliance snapshot canonical fields:
- `captured_at` is the canonical capture timestamp (the legacy `created_at` is still returned for compatibility).
- `results_json` contains normalized evaluation output.
- `artifact_paths_json` persists generated bundle metadata (`bundle_path`, `bundle_download_path`, timestamps).
CLI helpers:
- `python scripts/rotate_api_key.py <old_key_id> [--keep-old-active]`
- `python scripts/list_api_keys.py --tenant t1 --inactive-only`
Before going live, ensure all of the following environment variables are explicitly set. The application will refuse to start if the marked items are missing when AUTH_DEV_BYPASS=false.
| Variable | Purpose |
|---|---|
| UI_CURSOR_SECRET | Signs pagination cursor tokens. Must not be the default value. |
| BACKUP_ENCRYPTION_KEY | Encrypts backup artifacts (required when BACKUP_ENCRYPTION_ENABLED=true). |
| BACKUP_SIGNING_KEY | HMAC key for backup manifest signatures (required when BACKUP_SIGNING_ENABLED=true). |
| BILLING_WEBHOOK_SECRET | HMAC for webhook payloads (required when BILLING_WEBHOOK_ENABLED=true). |
| KEYRING_MASTER_KEY | Master KEK for the platform keyring (required when KEYRING_MASTER_KEY_REQUIRED=true). |
| Variable | Example | Notes |
|---|---|---|
| DATABASE_URL | postgresql+asyncpg://user:pass@host/db | Postgres 16+ with pgvector extension |
| REDIS_URL | redis://host:6379/0 | Redis 7+ |
| AUTH_ENABLED | true | Must be true in production |
| AUTH_DEV_BYPASS | false | Must be false in production |
| Variable | Notes |
|---|---|
| GOOGLE_CLOUD_PROJECT | Required for Vertex AI LLM and GCP retrieval |
| GOOGLE_CLOUD_LOCATION | GCP region (e.g. us-central1) |
| OPENAI_API_KEY | Required when TTS_PROVIDER=openai |
python scripts/preflight.py --output-json var/ops/preflight.json
The Prometheus scrape endpoint is available at /v1/metrics with no authentication required.
Recommended scrape config:
scrape_configs:
- job_name: nexusrag
static_configs:
- targets: ["your-api-host:8000"]
  metrics_path: /v1/metrics
- Branch naming: `feat/<short-scope>` or `fix/<short-scope>`
- Bump version: update `pyproject.toml` and add a new entry in `CHANGELOG.md`
- Tag release: `git tag vX.Y.Z`
- Use the repo's default branch name (e.g., `main`); do not assume a specific remote.