Problem Statement
No metrics instrumentation exists: there is no Prometheus /metrics endpoint, no prometheus_client dependency, and no OpenTelemetry integration. As a result, there is no way to monitor request rates, latencies, error rates, or business metrics (positions created, vault deposits).
Evidence
- No metrics endpoint in quantara/web_app/api/
- No prometheus_client in pyproject.toml
- No middleware instrumenting request counts or latencies
Impact
Medium: a monitoring blind spot. Without request-rate data, capacity planning is guesswork, and production incidents are reported by users instead of caught by alerts. SLOs such as "99.9% of requests complete in under 500 ms" cannot be set or tracked without latency histograms.
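To make the SLO point concrete: a target like "99.9% of requests complete in under 500 ms" is checked by dividing a latency histogram's cumulative le="0.5" bucket by its _count. The sketch below does that arithmetic against an in-process registry. The fraction_within helper is hypothetical, and the example relies on 0.5 s being one of prometheus_client's default histogram bucket bounds.

```python
from typing import Mapping, Optional

from prometheus_client import CollectorRegistry


def fraction_within(registry: CollectorRegistry, name: str, le: str,
                    labels: Optional[Mapping[str, str]] = None) -> Optional[float]:
    """Share of observations at or below bucket bound `le`, read from a histogram's cumulative buckets."""
    base = dict(labels or {})
    # Prometheus histogram buckets are cumulative: le="0.5" counts every observation <= 0.5 s.
    within = registry.get_sample_value(f"{name}_bucket", {**base, "le": le})
    total = registry.get_sample_value(f"{name}_count", base)
    if within is None or not total:
        return None  # unknown histogram or bucket, or no observations yet
    return within / total
```

In production the same ratio would normally be computed by Prometheus over a time window, for example sum(rate(http_request_duration_seconds_bucket{le="0.5"}[5m])) / sum(rate(http_request_duration_seconds_count[5m])), and compared against 0.999.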
Proposed Solution
Add prometheus_client and a FastAPI middleware that records http_requests_total (counter), http_request_duration_seconds (histogram), and http_requests_in_flight (gauge). Expose a /metrics endpoint that either requires authentication or is reachable only from the internal network. Start with request metrics and add business metrics later.
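A minimal sketch of the proposed middleware, assuming prometheus_client is the only new dependency. It is written as a plain ASGI middleware so that FastAPI's app.add_middleware can wrap it. The dedicated METRICS_REGISTRY, the label names (method, path, status), and the default_path_label helper that collapses numeric path segments are illustrative assumptions, not an existing Quantara API.

```python
import time
from typing import Any, Awaitable, Callable, MutableMapping

from prometheus_client import CollectorRegistry, Counter, Gauge, Histogram

Scope = MutableMapping[str, Any]
Message = MutableMapping[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]
ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]

# Assumption: a dedicated registry instead of prometheus_client's global default, to keep tests isolated.
METRICS_REGISTRY = CollectorRegistry()

REQUESTS_TOTAL = Counter(
    "http_requests_total", "Total HTTP requests.",
    ["method", "path", "status"], registry=METRICS_REGISTRY,
)
REQUEST_DURATION = Histogram(
    "http_request_duration_seconds", "HTTP request latency in seconds.",
    ["method", "path"], registry=METRICS_REGISTRY,
)
IN_FLIGHT = Gauge(
    "http_requests_in_flight", "HTTP requests currently being served.",
    registry=METRICS_REGISTRY,
)


def default_path_label(scope: Scope) -> str:
    """Hypothetical normalizer: /positions/42 and /positions/43 share the series /positions/{id}."""
    parts = scope["path"].split("/")
    return "/".join("{id}" if part.isdigit() else part for part in parts)


class PrometheusMiddleware:
    def __init__(self, app: ASGIApp, path_label: Callable[[Scope], str] = default_path_label) -> None:
        self.app = app
        self.path_label = path_label

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] != "http":  # pass lifespan/websocket traffic through untouched
            await self.app(scope, receive, send)
            return
        method = scope["method"]
        path = self.path_label(scope)
        status = "500"  # reported if the app raises before sending a response

        async def send_wrapper(message: Message) -> None:
            nonlocal status
            if message["type"] == "http.response.start":
                status = str(message["status"])
            await send(message)

        IN_FLIGHT.inc()
        start = time.perf_counter()
        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            REQUEST_DURATION.labels(method, path).observe(time.perf_counter() - start)
            REQUESTS_TOTAL.labels(method, path, status).inc()
            IN_FLIGHT.dec()
```

In main.py this would be registered with app.add_middleware(PrometheusMiddleware), and the /metrics route would render METRICS_REGISTRY with prometheus_client's generate_latest. Labelling by a normalized path rather than the raw URL keeps cardinality bounded; using FastAPI's matched route template instead is an alternative worth evaluating.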
Acceptance Criteria
- /metrics endpoint exposed (configurable auth)
File Map
- quantara/web_app/api/metrics.py (new): Prometheus metrics endpoint and middleware
- quantara/web_app/api/main.py: add metrics route and middleware
- quantara/pyproject.toml: add prometheus_client dependency
Dependencies
- Related: REPO-040 (health check should be monitored via Prometheus)
Testing Strategy
- Unit: verify that the metrics middleware increments counters correctly
- Integration: send requests, query /metrics, and verify that counters and histograms are populated
Security Considerations
The metrics endpoint exposes request patterns, so protect it with authentication or restrict it to the internal network. Don't expose business-sensitive metrics publicly. Avoid high-cardinality labels: using wallet_id as a label would create a separate time series for every wallet.
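One way to satisfy the authentication option is sketched below: a small ASGI app that serves prometheus_client's text exposition (generate_latest) only when the request carries a matching bearer token. The ProtectedMetricsApp name, the bearer-token scheme, and passing the token in at construction time are assumptions; restricting the endpoint at the network level is an equally valid alternative.

```python
import hmac
from typing import Any, Awaitable, Callable, MutableMapping

from prometheus_client import CONTENT_TYPE_LATEST, CollectorRegistry, generate_latest

Scope = MutableMapping[str, Any]
Message = MutableMapping[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]


class ProtectedMetricsApp:
    """Hypothetical ASGI app: Prometheus text format for callers with a valid bearer token, 401 otherwise."""

    def __init__(self, registry: CollectorRegistry, token: str) -> None:
        if not token:
            raise ValueError("refusing to serve /metrics without a token")
        self.registry = registry
        self.expected = b"Bearer " + token.encode()

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        # ASGI header names arrive lowercased as bytes.
        headers = dict(scope.get("headers", []))
        supplied = headers.get(b"authorization", b"")
        # compare_digest avoids leaking the token through timing differences.
        if not hmac.compare_digest(supplied, self.expected):
            await send({"type": "http.response.start", "status": 401,
                        "headers": [(b"www-authenticate", b"Bearer")]})
            await send({"type": "http.response.body", "body": b""})
            return
        body = generate_latest(self.registry)
        await send({"type": "http.response.start", "status": 200,
                    "headers": [(b"content-type", CONTENT_TYPE_LATEST.encode())]})
        await send({"type": "http.response.body", "body": body})
```

The token would come from configuration (for example a hypothetical METRICS_TOKEN environment variable), and the app could be attached in main.py with app.mount("/metrics", ProtectedMetricsApp(registry, token)). Prometheus can supply the token through the authorization section of its scrape config.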
Definition of Done
Labels: observability
Priority: Medium
Difficulty: Intermediate
Estimated Effort: 3h