feat: add OTel + Prometheus instrumentation for Guard/RAG observability #903

Merged
SdSarthak merged 1 commit into SdSarthak:main from Pcmhacker-piro:fix/otel-prometheus-observability on Jun 2, 2026

@Pcmhacker-piro
Contributor

Summary
Adds OpenTelemetry + Prometheus instrumentation across the backend for comprehensive observability of Guard inference, RAG retrieval, and database queries.
Closes #319
Type of Change

  • New feature
Checklist
  • I have read CONTRIBUTING.md
  • My code follows the project style
  • I have added/updated tests where relevant
  • Tests/lint pass locally (if available)
  • I have not committed .env or any secrets
  • I have updated documentation if needed
What was added
  • FastAPI middleware instrumentation -- request latency histograms, active request gauges, and request count via prometheus-fastapi-instrumentator
  • Guard inference pipeline -- aegis_guard_inference_duration_seconds histogram and aegis_guard_scans_total counter, labelled by decision (allow/block/sanitize)
  • RAG retrieval chain -- aegis_rag_retrieval_duration_seconds histogram and aegis_rag_queries_total counter
  • SQLAlchemy query instrumentation -- aegis_db_query_duration_seconds histogram via cursor event listeners, labelled by SQL operation type
  • /metrics -- Prometheus scrape endpoint (auto-exposed by instrumentator)
  • /ready -- Readiness probe endpoint
  • OTel configuration settings -- OTEL_SERVICE_NAME, OTEL_METRICS_EXPORTER, OTEL_TRACES_EXPORTER
  • Dependencies -- added opentelemetry-api, opentelemetry-sdk, opentelemetry-exporter-prometheus, prometheus-fastapi-instrumentator
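As a rough illustration of the Guard metrics listed above, here is a minimal sketch of what a decorator like @instrument_guard could look like with prometheus_client. The metric names come from this PR. Everything else is an assumption made for illustration: the GuardResult type, the toy LLMGuard.guard() body, the extra "error" label value, and the choice to label the histogram (not only the counter) by decision. The actual code in telemetry.py may differ.

```python
import functools
import time
from dataclasses import dataclass

from prometheus_client import Counter, Histogram

# Metric names are taken from the PR description; the help strings are assumed.
GUARD_DURATION = Histogram(
    "aegis_guard_inference_duration_seconds",
    "Wall-clock time of one Guard scan",
    ["decision"],
)
GUARD_SCANS = Counter(
    "aegis_guard_scans_total",
    "Number of Guard scans",
    ["decision"],
)


@dataclass
class GuardResult:
    """Hypothetical result type; the real guard may return something else."""
    decision: str  # "allow", "block", or "sanitize"


def instrument_guard(func):
    """Time the wrapped call and count it, labelled by its decision."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        decision = "error"  # assumed label for calls that raise
        try:
            result = func(*args, **kwargs)
            decision = getattr(result, "decision", "unknown")
            return result
        finally:
            elapsed = time.perf_counter() - start
            GUARD_DURATION.labels(decision=decision).observe(elapsed)
            GUARD_SCANS.labels(decision=decision).inc()
    return wrapper


class LLMGuard:
    """Toy stand-in for the real guard, only to show the decorator in use."""

    @instrument_guard
    def guard(self, prompt: str) -> GuardResult:
        blocked = "ignore previous" in prompt.lower()
        return GuardResult("block" if blocked else "allow")
```

With this sketch, each call to LLMGuard().guard(...) adds one observation to the histogram and increments the counter under the matching decision label. The same pattern would apply to @instrument_rag, with the RAG metric names and without the decision label.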
Changed Files
  • backend/app/core/telemetry.py (new) — Core OTel + Prometheus module with metric definitions, decorators, and instrumentator setup
  • backend/app/core/config.py — Added OTEL_* settings
  • backend/app/core/database.py — SQLAlchemy before/after cursor execute event listeners for query latency
  • backend/app/main.py — Wired up setup_telemetry(), added /ready endpoint
  • backend/app/modules/guard/llm_guard.py — Added @instrument_guard decorator to LLMGuard.guard()
  • backend/app/modules/rag/retrieval_chain.py — Added @instrument_rag decorator to GroundedRetrievalQA.call()
  • backend/requirements.txt — Added OTel + prometheus-fastapi-instrumentator deps; removed duplicate entries
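For reviewers unfamiliar with the cursor-event approach used in database.py, the sketch below shows the general technique with SQLAlchemy's before_cursor_execute and after_cursor_execute events. The metric name is from this PR. The in-memory SQLite engine, the listener names, the demo() helper, and the way the operation label is derived (first keyword of the statement) are assumptions for illustration, not necessarily what the PR does.

```python
import time

from prometheus_client import Histogram
from sqlalchemy import create_engine, event, text

# Metric name is from the PR description; the help string is assumed.
DB_QUERY_DURATION = Histogram(
    "aegis_db_query_duration_seconds",
    "Database query latency",
    ["operation"],
)

# Assumption: an in-memory SQLite engine stands in for the project's database.
engine = create_engine("sqlite:///:memory:")


@event.listens_for(engine, "before_cursor_execute")
def _start_timer(conn, cursor, statement, parameters, context, executemany):
    # Keep a stack on the connection so nested executions pair up correctly.
    conn.info.setdefault("query_start", []).append(time.perf_counter())


@event.listens_for(engine, "after_cursor_execute")
def _record_duration(conn, cursor, statement, parameters, context, executemany):
    elapsed = time.perf_counter() - conn.info["query_start"].pop()
    # Use the first SQL keyword (SELECT, INSERT, ...) as the operation label.
    words = statement.split(None, 1)
    operation = words[0].upper() if words else "UNKNOWN"
    DB_QUERY_DURATION.labels(operation=operation).observe(elapsed)


def demo() -> int:
    """Run one query so the listeners record a SELECT observation."""
    with engine.connect() as conn:
        return conn.execute(text("SELECT 1")).scalar_one()
```

Labelling by the first keyword keeps label cardinality small, since the label never contains table names or literal values.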
Commits
72e9547 - feat: add OTel + Prometheus instrumentation for Guard/RAG observability
Testing Performed
N/A (dependencies not installed locally; CI will run tests with updated requirements.txt)
Final Status
  • Branch Name: fix/otel-prometheus-observability
  • Commit Hash: 72e9547
  • PR Created: Compare URL ready — see link above (token lacked createPullRequest scope)
  • Ready for Review: Yes

Implements OpenTelemetry and Prometheus instrumentation across the
backend:
- FastAPI middleware (via prometheus-fastapi-instrumentator) for
  request latency histograms and active request gauges
- Guard inference pipeline timing and decision counters
- RAG retrieval latency and query counters
- SQLAlchemy query duration histograms via cursor event listeners
- /metrics (Prometheus scrape), /health, and /ready endpoints
- OTel config settings with Prometheus exporter

Closes SdSarthak#319
@Pcmhacker-piro
Contributor Author

@SdSarthak

Hi, the checks have passed. Could you please review and approve the pending workflows when you have a chance? Thank you!

@SdSarthak merged commit 0945a99 into SdSarthak:main on Jun 2, 2026
@SdSarthak added labels on Jun 2, 2026:
  • gssoc:approved (GSSoC approved contribution, required for points to count)
  • level:advanced (Advanced difficulty task)
  • type:feature (New feature)
  • type:devops (DevOps, CI/CD, infrastructure)
  • quality:exceptional (Exceptional quality contribution)

Development

Successfully merging this pull request may close these issues.

OTel + Prometheus Instrumentation for Guard/RAG Observability

2 participants