[FEAT] Dashboard metrics system #1724
base: main
Conversation
Summary by CodeRabbit
Walkthrough
Adds a Dashboard Metrics feature: new Django app with models, migrations, Celery tasks and worker, Redis-backed bucket cache, a core metrics API and queue backend, services and management commands, extensive tests, frontend dashboard UI/components/hooks/styles, and runtime tooling (docker, poe, dependencies).
Sequence Diagram(s)
sequenceDiagram
participant Client as Frontend Client
participant API as DashboardMetricsViewSet
participant Cache as Redis Cache
participant Service as MetricsQueryService
participant DB as Metrics DB
Client->>API: GET /metrics/summary?org=X&start=Y&end=Z
API->>API: _select_source(params)
alt hourly source & bucket-cache enabled
API->>Cache: mget_metrics_buckets(org, range)
Cache-->>API: cached buckets + missing list
API->>Service: query missing buckets
Service->>DB: fetch missing rows
DB-->>Service: rows
Service-->>API: rows
API->>Cache: mset_metrics_buckets(new buckets)
else daily/monthly or full cache miss
API->>Service: query aggregated data
Service->>DB: fetch aggregates
DB-->>Service: aggregates
Service-->>API: aggregates
end
API->>Client: HTTP 200 MetricsResponse
sequenceDiagram
participant Producer as Integration Point
participant Capture as MetricsCapture
participant Core as core.metrics.record
participant Backend as QueueBackend
participant Broker as RabbitMQ
participant Worker as worker-metrics
participant Task as process_dashboard_metric_events
participant DB as Metrics DB
Producer->>Capture: record_*(...)
Capture->>Core: record(org_id, metric_name, value,...)
Core->>Backend: backend.record(MetricEvent)
Backend->>Broker: publish Celery-style message to queue
Broker->>Worker: deliver to dashboard_metric_events
Worker->>Task: process batch of events
Task->>DB: bulk upsert EventMetricsHourly/Daily/Monthly
DB-->>Task: upsert results
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 8
Fix all issues with AI Agents 🤖
In @backend/dashboard_metrics/capture.py:
- Around line 51-338: MetricsCapture methods call record() with the metric enum
as the first positional arg and keyword names like org_id= and value=, but core
record signature expects org_id first, metric_name second and metric_value
third, so change every call in record_api_request, record_etl_execution,
record_document_processed (both DOCUMENTS_PROCESSED and PAGES_PROCESSED),
record_llm_call (LLM_CALLS and LLM_USAGE), record_challenge,
record_summarization, and record_prompt_execution to match the core signature:
call record(org_id, MetricName.<X>, <metric_value>, labels=metric_labels,
project=project, tag=None) (use 1 for single-count events, pages for pages, cost
for LLM_USAGE as float), and remove the invalid value= keyword and duplicate
org_id keyword usage.
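A minimal sketch of one corrected call site under the signature described above (`record(org_id, metric_name, metric_value, ...)`); the label contents are illustrative only, not taken from the PR:

```python
from unstract.core.metrics import record
from unstract.core.metrics.types import MetricName

def record_document_processed(org_id: str, pages: int, project: str | None = None) -> bool:
    """Hypothetical capture helper using the argument order the prompt describes."""
    metric_labels = {"source": "workflow"}  # illustrative labels only
    # One count event for the document itself
    ok = record(org_id, MetricName.DOCUMENTS_PROCESSED, 1, labels=metric_labels, project=project, tag=None)
    # Page count recorded as the metric value
    ok = record(org_id, MetricName.PAGES_PROCESSED, pages, labels=metric_labels, project=project, tag=None) and ok
    return ok
```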
In @backend/dashboard_metrics/urls.py:
- Around line 9-15: The URL configuration is missing a mapping for the
DashboardMetricsViewSet.health action; add a new view mapping similar to the
others (e.g., create metrics_health = DashboardMetricsViewSet.as_view({"get":
"health"})) and include it in the URL patterns so GET /api/v2/metrics/health/ is
exposed; update any import/urlpatterns list references where other metrics_*
views are registered to include metrics_health.
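A sketch of the missing mapping, following the pattern the prompt describes for the other `metrics_*` views (the exact path and name strings are assumptions):

```python
from django.urls import path

from dashboard_metrics.views import DashboardMetricsViewSet

metrics_health = DashboardMetricsViewSet.as_view({"get": "health"})

urlpatterns = [
    # ... existing metrics_* mappings ...
    path("health/", metrics_health, name="dashboard-metrics-health"),
]
```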
In @backend/dashboard_metrics/views.py:
- Around line 267-273: Both live_summary and live_series call
UserContext.get_organization() and access .id directly; add the same guard used
in get_queryset(): if organization is None (or falsy) log an explanatory message
and raise PermissionDenied before accessing organization.id, then pass org_id to
MetricsQueryService.get_all_metrics_summary / get_all_metrics_series; apply the
same check for the other occurrence around lines referenced (the block using
UserContext.get_organization() / organization.id at ~310-312).
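A sketch of the guard, mirroring the `get_queryset()` pattern quoted later in this review; the log wording and the service call's keyword are assumptions, and `UserContext`, `MetricsQueryService`, and `logger` are assumed to be the module-level imports already present in `views.py`:

```python
from rest_framework.exceptions import PermissionDenied
from rest_framework.response import Response

# inside DashboardMetricsViewSet
def live_summary(self, request):
    organization = UserContext.get_organization()
    if not organization:
        logger.warning("No organization context for live metrics request")
        raise PermissionDenied("No organization context")
    summary = MetricsQueryService.get_all_metrics_summary(org_id=organization.id)
    return Response(summary)
```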
In @unstract/core/src/unstract/core/metrics/types.py:
- Around line 35-48: Add a unit test that ensures every value of the MetricName
enum has a corresponding key in METRIC_TYPE_MAP to prevent KeyError in
get_metric_type(); specifically, iterate over MetricName (or
MetricName.__members__.values()/MetricName) and assert that each .value is
present in METRIC_TYPE_MAP, referencing METRIC_TYPE_MAP and MetricName; this
complements validate_metric() and prevents missing entries that would cause
get_metric_type() to fail.
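A minimal test along those lines, assuming the module path from this PR's layout:

```python
from unstract.core.metrics.types import METRIC_TYPE_MAP, MetricName

def test_metric_type_map_covers_all_metric_names():
    """Every MetricName value must have an entry in METRIC_TYPE_MAP."""
    missing = [m.value for m in MetricName if m.value not in METRIC_TYPE_MAP]
    assert not missing, f"MetricName values missing from METRIC_TYPE_MAP: {missing}"
```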
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/pyproject.toml (1)
15-21: celery-batches version too old; use >=0.9.0 for Celery 5.3.4 and Python 3.12 support
The constraint `celery-batches>=0.8.0` pins to an outdated release (Jun 2023). For Celery 5.3.4 and Python 3.12, use `celery-batches>=0.9.0`, which is documented to support Celery ~5.0 and Python 3.9-3.13. Version 0.8.0 predates Python 3.12 and lacks testing against current Celery 5.x releases.
🧹 Nitpick comments (16)
backend/dashboard_metrics/tasks.py (2)
37-75: Time bucketing helpers look correct; naive datetimes are assumed UTC
The truncation helpers correctly normalize numeric timestamps to UTC and bucket to hour/day/month. Note that naive datetimes are implicitly treated as UTC; ensure all producers send either UTC timestamps or explicitly timezone-aware datetimes so you don't silently mis-bucket mixed timezones.
77-206: Batched aggregation logic is solid; consider tightening event validation
The batching and in-memory aggregation into hourly/daily/monthly buckets looks good and aligns with celery-batches' request shape. Two minor points to consider:
- `event = request.args[0] if request.args else request.kwargs.get("event", {})` assumes the first arg is always a dict; if other producers appear later, a simple `isinstance(..., dict)` guard could avoid noisy warnings.
- `timestamp` is expected as float/datetime; if there's any chance of ISO strings coming through, you may want to normalize them explicitly rather than relying on a generic `except Exception` to skip them.
Nothing blocking here; more of a future-hardening note.
backend/backend/urls_v2.py (1)
63-67: Metrics URL include is consistent with existing routing
Mounting `dashboard_metrics.urls` at `"metrics/"` follows the same pattern as other feature areas (e.g., `"usage/"`, `"notifications/"`). Just ensure any external docs or clients expecting `/api/v2/metrics/...` are updated to match this base path if needed.
docker/docker-compose.yaml (1)
120-138: worker-metrics service wiring looks consistent with existing workers
The new `worker-metrics` service mirrors the existing Celery worker patterns (image, entrypoint, queue, autoscale, env) and targets the `dashboard_metric_events` queue as intended. If you expect metrics tasks to need shared workflow data or tool configs, consider whether it should also mount the same volumes as other workers; otherwise this setup looks good.
backend/pyproject.toml (1)
145-148: worker-metrics Poe task aligns with docker-compose configuration
The `worker-metrics` Poe task mirrors the docker-compose command (same app, log level, queue, autoscale), which is helpful for running the metrics worker locally outside Docker. Keep this in sync with any future queue or autoscale changes to avoid drift between CLI and compose.
backend/dashboard_metrics/tests/test_tasks.py (1)
149-164: Consider renaming unused loop variable.
The loop variable `i` is not used within the loop body. Following Python convention, consider renaming it to `_` to indicate it's intentionally unused.
🔎 Proposed refactor
- for i in range(5):
+ for _ in range(5):
      mock_request = MagicMock()
unstract/core/src/unstract/core/metrics/__init__.py (1)
113-115: Use `logging.exception` to preserve the traceback.
When catching exceptions during event creation, using `logger.exception()` instead of `logger.error()` will include the full traceback, which aids debugging.
🔎 Suggested fix
  except Exception as e:
-     logger.error(f"Failed to create metric event: {e}")
+     logger.exception(f"Failed to create metric event: {e}")
      return False
unstract/core/src/unstract/core/metrics/registry.py (1)
42-48: Non-deterministic ordering in `get_all_metrics()`.
The function returns `list(_REGISTERED_METRICS)` where `_REGISTERED_METRICS` is a set. While this works, the order of returned metrics may vary between calls. If consistent ordering is needed for API responses or testing, consider sorting the result.
🔎 Optional fix for deterministic ordering
  def get_all_metrics() -> list[str]:
      """Get all registered metric names.

      Returns:
          List of all registered metric names
      """
-     return list(_REGISTERED_METRICS)
+     return sorted(_REGISTERED_METRICS)
backend/dashboard_metrics/migrations/0004_update_cleanup_periodic_tasks.py (1)
38-57: Clarify naming: task cleans "daily" aggregates on a weekly schedule.
The comment says "weekly cleanup task for daily metrics" but the task is named `dashboard_metrics_cleanup_daily`. This is technically correct (it cleans data from the daily aggregation table on a weekly schedule), but the naming could cause confusion. Consider renaming to `dashboard_metrics_cleanup_daily_weekly` or updating the description to clarify it runs weekly to clean daily-aggregated data.
unstract/core/src/unstract/core/metrics/types.py (1)
85-96: Consider defensive error handling in `from_dict`.
If required keys (`org_id`, `metric_name`, `metric_value`, `metric_type`) are missing from `data`, a `KeyError` will be raised. While the caller should ensure valid data, adding a brief check or using `.get()` with explicit validation could provide clearer error messages during deserialization failures.
unstract/core/src/unstract/core/metrics/backends/queue.py (1)
151-156: Use `logger.exception` for error logging with traceback.
Replace `logger.error` with `logger.exception` in the exception handlers to include stack traces, which aids debugging production issues.
🔎 Suggested fix
  except KombuError as e:
-     logger.error(f"Failed to queue metric {event.metric_name}: {e}")
+     logger.exception(f"Failed to queue metric {event.metric_name}: {e}")
      return False
  except Exception as e:
-     logger.error(f"Unexpected error queuing metric: {e}")
+     logger.exception(f"Unexpected error queuing metric: {e}")
      return False
backend/dashboard_metrics/views.py (1)
335-369: Improve error handling and observability in `live_series` metric loop.
Catching a blanket `Exception` for each metric is reasonable to keep the endpoint partially available, but the current logging (`logger.error`) loses the traceback and makes debugging failures harder. Switching to `logger.exception` here will preserve the stack trace while keeping behavior the same.
Proposed logging tweak
- except Exception as e:
-     logger.error(f"Failed to fetch metric {metric_name}: {e}")
+ except Exception:
+     logger.exception("Failed to fetch metric %s", metric_name)
      errors.append(metric_name)
backend/dashboard_metrics/cache.py (1)
38-69: Time-aware TTL helper is well-designed but currently unused.
`get_time_aware_cache_ttl()` correctly distinguishes queries that include the current hour from purely historical ones, which would be useful to avoid over-caching near-real-time data. Right now it isn't wired into `cache_metrics_response`, so all endpoints rely on static TTLs.
Consider integrating `get_time_aware_cache_ttl()` into the decorator (e.g., deriving TTL from a parsed `end_date` param) for summary/series endpoints that query recent data. A sketch of one possible wiring follows.
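One possible wiring, assuming `get_time_aware_cache_ttl()` accepts the parsed end date; `_build_cache_key` is a hypothetical helper standing in for the decorator's existing key logic:

```python
import functools

from django.core.cache import cache
from rest_framework.response import Response

def cache_metrics_response(endpoint_name: str):
    def decorator(view_func):
        @functools.wraps(view_func)
        def wrapper(self, request, *args, **kwargs):
            key = _build_cache_key(endpoint_name, request)  # hypothetical helper
            cached = cache.get(key)
            if cached is not None:
                return Response(cached)
            response = view_func(self, request, *args, **kwargs)
            if response.status_code == 200:
                # Derive TTL from the requested range instead of a static value
                ttl = get_time_aware_cache_ttl(request.query_params.get("end_date"))
                cache.set(key, response.data, ttl)
            return response
        return wrapper
    return decorator
```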
backend/dashboard_metrics/models.py (3)
14-19: Avoid duplicating `MetricType` that already exists in core metrics
There's already a `MetricType` enum in `unstract/core/src/unstract/core/metrics/types.py` with the same values (`"counter"`, `"histogram"`). Having two different `MetricType` definitions (one `Enum`, one `TextChoices`) increases the chance of divergence or confusion.
Consider either:
- Reusing the core `MetricType` as the single source of truth and mapping its values into a `TextChoices`, or
- Renaming this one (e.g., `DashboardMetricType`) to make the separation explicit.
A sketch of the first option follows.
DefaultOrganizationManagerMixinalready subclassesmodels.Manager(perbackend/utils/models/organization_mixin.py), so inheriting from both is redundant:class EventMetricsHourlyManager(DefaultOrganizationManagerMixin): ...Same applies to
EventMetricsDailyManagerandEventMetricsMonthlyManager.
113-130: Address Ruff RUF012 on mutable Meta attributes (indexes,constraints)Ruff is flagging
indexesandconstraintsas mutable class attributes (RUF012). In DjangoMetaclasses this is common, but if you want to satisfy the linter and improve type hints you can annotate them asClassVar:from typing import ClassVar from django.db import models class Meta: indexes: ClassVar[list[models.Index]] = [ ... ] constraints: ClassVar[list[models.Constraint]] = [ ... ]You can apply the same pattern to the
Metaclasses of all three models, or alternatively disableRUF012for DjangoMetaif you prefer the standard Django style.Also applies to: 219-236, 325-342
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
⛔ Files ignored due to path filters (1)
`backend/uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (30)
- backend/backend/celery_config.py
- backend/backend/settings/base.py
- backend/backend/urls_v2.py
- backend/dashboard_metrics/__init__.py
- backend/dashboard_metrics/apps.py
- backend/dashboard_metrics/cache.py
- backend/dashboard_metrics/capture.py
- backend/dashboard_metrics/migrations/0001_initial.py
- backend/dashboard_metrics/migrations/0002_setup_cleanup_periodic_task.py
- backend/dashboard_metrics/migrations/0003_add_daily_monthly_tables.py
- backend/dashboard_metrics/migrations/0004_update_cleanup_periodic_tasks.py
- backend/dashboard_metrics/migrations/__init__.py
- backend/dashboard_metrics/models.py
- backend/dashboard_metrics/serializers.py
- backend/dashboard_metrics/services.py
- backend/dashboard_metrics/tasks.py
- backend/dashboard_metrics/tests/__init__.py
- backend/dashboard_metrics/tests/test_tasks.py
- backend/dashboard_metrics/urls.py
- backend/dashboard_metrics/views.py
- backend/pyproject.toml
- docker/docker-compose.yaml
- unstract/core/src/unstract/core/metrics/__init__.py
- unstract/core/src/unstract/core/metrics/backends/__init__.py
- unstract/core/src/unstract/core/metrics/backends/base.py
- unstract/core/src/unstract/core/metrics/backends/noop.py
- unstract/core/src/unstract/core/metrics/backends/queue.py
- unstract/core/src/unstract/core/metrics/config.py
- unstract/core/src/unstract/core/metrics/registry.py
- unstract/core/src/unstract/core/metrics/types.py
🧰 Additional context used
🧬 Code graph analysis (16)
unstract/core/src/unstract/core/metrics/backends/noop.py (3)
  unstract/core/src/unstract/core/metrics/types.py (1)
    - MetricEvent (52-96)
  unstract/core/src/unstract/core/metrics/backends/base.py (3)
    - AbstractMetricBackend (8-40)
    - record (15-24)
    - flush (27-32)
  unstract/core/src/unstract/core/metrics/__init__.py (1)
    - record (58-119)
unstract/core/src/unstract/core/metrics/backends/base.py (4)
  unstract/core/src/unstract/core/metrics/types.py (1)
    - MetricEvent (52-96)
  unstract/core/src/unstract/core/metrics/__init__.py (1)
    - record (58-119)
  unstract/core/src/unstract/core/metrics/backends/noop.py (2)
    - record (25-40)
    - flush (42-44)
  unstract/core/src/unstract/core/metrics/backends/queue.py (3)
    - record (117-156)
    - flush (200-205)
    - close (207-215)
unstract/core/src/unstract/core/metrics/registry.py (2)
  unstract/core/src/unstract/core/metrics/types.py (2)
    - MetricName (15-32)
    - MetricType (8-12)
  backend/dashboard_metrics/models.py (1)
    - MetricType (14-18)
backend/dashboard_metrics/tasks.py (1)
  backend/dashboard_metrics/models.py (4)
    - EventMetricsDaily (145-248)
    - EventMetricsHourly (39-142)
    - EventMetricsMonthly (251-354)
    - MetricType (14-18)
unstract/core/src/unstract/core/metrics/types.py (1)
  backend/dashboard_metrics/models.py (1)
    - MetricType (14-18)
backend/dashboard_metrics/urls.py (1)
  backend/dashboard_metrics/views.py (1)
    - DashboardMetricsViewSet (40-451)
backend/dashboard_metrics/views.py (4)
  backend/permissions/permission.py (1)
    - IsOrganizationMember (17-22)
  backend/utils/user_context.py (1)
    - UserContext (8-32)
  backend/dashboard_metrics/cache.py (1)
    - cache_metrics_response (88-145)
  backend/dashboard_metrics/services.py (2)
    - MetricsQueryService (24-447)
    - get_documents_processed (54-86)
backend/dashboard_metrics/models.py (3)
  backend/utils/models/organization_mixin.py (2)
    - DefaultOrganizationManagerMixin (26-29)
    - DefaultOrganizationMixin (7-23)
  unstract/core/src/unstract/core/metrics/types.py (1)
    - MetricType (8-12)
  backend/dashboard_metrics/serializers.py (1)
    - Meta (110-124)
backend/dashboard_metrics/capture.py (5)
  unstract/core/src/unstract/core/metrics/types.py (2)
    - MetricName (15-32)
    - MetricType (8-12)
  unstract/core/src/unstract/core/metrics/__init__.py (1)
    - record (58-119)
  unstract/core/src/unstract/core/metrics/backends/base.py (1)
    - record (15-24)
  unstract/core/src/unstract/core/metrics/backends/noop.py (1)
    - record (25-40)
  unstract/core/src/unstract/core/metrics/backends/queue.py (1)
    - record (117-156)
unstract/core/src/unstract/core/metrics/backends/queue.py (3)
  unstract/core/src/unstract/core/metrics/types.py (1)
    - MetricEvent (52-96)
  unstract/core/src/unstract/core/metrics/backends/base.py (4)
    - AbstractMetricBackend (8-40)
    - close (34-40)
    - record (15-24)
    - flush (27-32)
  unstract/core/src/unstract/core/metrics/__init__.py (1)
    - record (58-119)
unstract/core/src/unstract/core/metrics/config.py (3)
  unstract/core/src/unstract/core/metrics/backends/base.py (1)
    - AbstractMetricBackend (8-40)
  unstract/core/src/unstract/core/metrics/backends/noop.py (1)
    - NoopBackend (14-44)
  unstract/core/src/unstract/core/metrics/backends/queue.py (1)
    - QueueBackend (28-215)
backend/dashboard_metrics/serializers.py (2)
  backend/dashboard_metrics/models.py (4)
    - EventMetricsHourly (39-142)
    - Meta (109-142)
    - Meta (215-248)
    - Meta (321-354)
  backend/dashboard_metrics/views.py (2)
    - series (113-189)
    - summary (70-109)
backend/dashboard_metrics/migrations/0001_initial.py (1)
  backend/dashboard_metrics/migrations/0003_add_daily_monthly_tables.py (1)
    - Migration (10-261)
unstract/core/src/unstract/core/metrics/backends/__init__.py (3)
  unstract/core/src/unstract/core/metrics/backends/base.py (1)
    - AbstractMetricBackend (8-40)
  unstract/core/src/unstract/core/metrics/backends/noop.py (1)
    - NoopBackend (14-44)
  unstract/core/src/unstract/core/metrics/backends/queue.py (1)
    - QueueBackend (28-215)
backend/dashboard_metrics/migrations/0004_update_cleanup_periodic_tasks.py (2)
  backend/dashboard_metrics/migrations/0002_setup_cleanup_periodic_task.py (1)
    - Migration (40-51)
  backend/dashboard_metrics/migrations/0003_add_daily_monthly_tables.py (1)
    - Migration (10-261)
backend/dashboard_metrics/cache.py (3)
  backend/utils/user_context.py (1)
    - UserContext (8-32)
  backend/utils/cache_service.py (1)
    - clear_cache (41-43)
  backend/dashboard_metrics/views.py (1)
    - overview (193-245)
🪛 Ruff (0.14.10)
unstract/core/src/unstract/core/metrics/backends/noop.py
25-25: Unused method argument: event
(ARG002)
unstract/core/src/unstract/core/metrics/__init__.py
39-53: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
113-113: Do not catch blind exception: Exception
(BLE001)
114-114: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
backend/backend/celery_config.py
42-49: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
56-66: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
69-73: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
unstract/core/src/unstract/core/metrics/registry.py
38-38: Avoid specifying long messages outside the exception class
(TRY003)
backend/dashboard_metrics/migrations/0002_setup_cleanup_periodic_task.py
6-6: Unused function argument: schema_editor
(ARG001)
33-33: Unused function argument: schema_editor
(ARG001)
41-44: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
46-51: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/tasks.py
184-184: Do not catch blind exception: Exception
(BLE001)
257-257: Do not catch blind exception: Exception
(BLE001)
258-258: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
311-311: Do not catch blind exception: Exception
(BLE001)
312-312: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
365-365: Do not catch blind exception: Exception
(BLE001)
366-366: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
backend/dashboard_metrics/tests/test_tasks.py
149-149: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
backend/dashboard_metrics/views.py
47-47: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
48-48: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
50-50: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
51-51: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
52-52: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
60-60: Abstract raise to an inner function
(TRY301)
60-60: Avoid specifying long messages outside the exception class
(TRY003)
65-65: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
66-66: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
66-66: Avoid specifying long messages outside the exception class
(TRY003)
193-193: Unused method argument: request
(ARG002)
359-359: Do not catch blind exception: Exception
(BLE001)
360-360: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
403-403: Unused method argument: request
(ARG002)
436-436: Consider moving this statement to an else block
(TRY300)
437-437: Do not catch blind exception: Exception
(BLE001)
438-438: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
448-448: Consider moving this statement to an else block
(TRY300)
449-449: Do not catch blind exception: Exception
(BLE001)
450-450: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
backend/dashboard_metrics/models.py
113-130: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
131-142: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
219-236: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
237-248: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
325-342: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
343-354: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/capture.py
45-45: Consider moving this statement to an else block
(TRY300)
unstract/core/src/unstract/core/metrics/backends/queue.py
95-96: try-except-pass detected, consider logging the exception
(S110)
95-95: Do not catch blind exception: Exception
(BLE001)
101-101: Consider moving this statement to an else block
(TRY300)
103-103: Do not catch blind exception: Exception
(BLE001)
113-113: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
115-115: Avoid specifying long messages outside the exception class
(TRY003)
149-149: Consider moving this statement to an else block
(TRY300)
152-152: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
154-154: Do not catch blind exception: Exception
(BLE001)
155-155: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
212-212: Do not catch blind exception: Exception
(BLE001)
backend/dashboard_metrics/migrations/0003_add_daily_monthly_tables.py
11-14: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
16-261: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/serializers.py
112-123: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/migrations/0001_initial.py
13-15: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
17-136: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/migrations/0004_update_cleanup_periodic_tasks.py
9-9: Unused function argument: schema_editor
(ARG001)
60-60: Unused function argument: schema_editor
(ARG001)
69-72: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
74-79: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/cache.py
84-84: Probable use of insecure hash functions in hashlib: md5
(S324)
163-163: Consider moving this statement to an else block
(TRY300)
164-164: Do not catch blind exception: Exception
(BLE001)
165-165: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
252-252: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (28)
backend/dashboard_metrics/__init__.py (1)
1-4: LGTM!
Clean package initializer with clear documentation describing the Dashboard Metrics app's purpose.
backend/dashboard_metrics/migrations/__init__.py (1)
1-1: LGTM!
Standard migrations package initializer following Django conventions.
backend/backend/settings/base.py (1)
345-345: LGTM!
The dashboard_metrics app is correctly registered in SHARED_APPS, making it available across all organizations in the multi-tenant setup. This is appropriate for organization-scoped metrics collection and querying.
unstract/core/src/unstract/core/metrics/backends/__init__.py (1)
1-11: LGTM!
Clean public API definition for the metrics backends package. The `__all__` declaration correctly matches the imported backends, providing clear control over the public interface.
backend/dashboard_metrics/apps.py (1)
6-11: LGTM!
Standard Django app configuration following Django 4.2+ best practices. The BigAutoField default is appropriate, and the app naming is correct.
backend/dashboard_metrics/tests/__init__.py (1)
1-1: Tests package init is fine
Docstring-only `__init__` is harmless and documents the test package clearly.
unstract/core/src/unstract/core/metrics/backends/base.py (1)
1-40: LGTM! Clean abstract interface design.
The abstract base class follows SOLID principles with a minimal, focused interface. The concrete `close()` method with a default implementation calling `flush()` is a good design choice that simplifies backend implementations while allowing customization when needed.
unstract/core/src/unstract/core/metrics/backends/noop.py (1)
21-44: LGTM! Appropriate no-op implementation.
The NoopBackend correctly implements the abstract interface and provides helpful one-time logging to inform users that metrics are disabled. The `event` parameter in `record()` is required by the `AbstractMetricBackend` interface, so the static analysis hint (ARG002) is a false positive.
backend/dashboard_metrics/tests/test_tasks.py (1)
1-553: Excellent test coverage!
The test suite is comprehensive and well-organized, covering time truncation helpers, batch processing, bulk upsert operations, cleanup tasks, and full integration flows. Good use of fixtures, edge cases, and assertions to verify data integrity across all metrics tables.
backend/dashboard_metrics/migrations/0002_setup_cleanup_periodic_task.py (1)
1-51: LGTM! Migration follows Django conventions.
The migration correctly sets up a periodic cleanup task with proper forward and rollback operations. The static analysis hints (ARG001, RUF012) are false positives: the `schema_editor` parameter and migration class attributes follow Django's required conventions for data migrations.
unstract/core/src/unstract/core/metrics/config.py (1)
1-68: LGTM! Well-designed configuration module.
The environment-driven backend selection with singleton caching is clean and robust. The fallback to `NoopBackend` when metrics are disabled or misconfigured ensures safe defaults, and the logging at each decision point aids troubleshooting. The `reset_backend()` function is a good addition for testing scenarios.
unstract/core/src/unstract/core/metrics/__init__.py (1)
58-119: Well-structured public API.
The `record()` function provides a clean entry point with proper validation, enum-to-string conversion, and graceful error handling that returns `False` on failures rather than propagating exceptions. The design appropriately decouples the public API from backend implementation details.
unstract/core/src/unstract/core/metrics/registry.py (1)
1-60: Clean registry implementation.
The registry provides efficient O(1) validation via set lookup, clear separation between validation and type retrieval, and appropriate error handling with `ValueError` for unknown metrics.
backend/backend/celery_config.py (2)
36-53: Good production-ready resilience settings.
The broker connection retry configuration with exponential backoff (`interval_step`, `interval_max`), socket keepalive, and worker limits (`max_tasks_per_child`) will help maintain stability under broker disruptions and prevent memory leaks from long-running workers.
68-73: Task names in `task_routes` are correctly defined and match registered Celery tasks.
The task names (`dashboard_metrics.process_events`, `dashboard_metrics.cleanup_hourly_data`, `dashboard_metrics.cleanup_daily_data`) match their corresponding `@shared_task` definitions in `backend/dashboard_metrics/tasks.py`. Routing is correctly configured.
unstract/core/src/unstract/core/metrics/types.py (1)
8-32: Well-structured metric type definitions.
The dual inheritance from `str` and `Enum` enables both type safety and string comparisons. The separation of counters (summed values) and histograms (distributions) is clearly documented and follows established metrics patterns.
backend/dashboard_metrics/urls.py (1)
17-33: Clean URL configuration with proper ordering.
The URL patterns are well-organized with specific paths before the UUID capture pattern, preventing routing conflicts. Using `format_suffix_patterns` enables content negotiation for API responses.
backend/dashboard_metrics/migrations/0004_update_cleanup_periodic_tasks.py (1)
60-65: Reverse migration is correct as written.
Migration 0002 creates the `dashboard_metrics_cleanup_hourly` task. Migration 0004's forward operation uses `update_or_create` on that existing task (not creating a new one), adds the daily task, and deletes a legacy task. The reverse should return to the state after 0002, which means retaining the hourly task (still managed by 0002's reverse function) and removing only what 0004 added: the daily task. The current implementation is correct.
Likely an incorrect or invalid review comment.
unstract/core/src/unstract/core/metrics/backends/queue.py (1)
178-198: Celery protocol v2 format is compatible with the project's Celery version (5.3.4+).
The message construction follows the standard kombu protocol v2 format `[args, kwargs, embed]`, which remains compatible with Celery 5.3.4+ (and 5.5.3 in the workers package). The headers correctly specify the task name and ID for Celery task routing. No compatibility concerns.
backend/dashboard_metrics/views.py (1)
402-450: Health checks look sound and side-effect free.
`health`, `_check_database`, and `_check_cache` do minimal, safe operations (an `exists()` call and a short-lived cache set/get) and return structured status with appropriate HTTP 200/503 responses; this is suitable for liveness/readiness probes.
backend/dashboard_metrics/migrations/0003_add_daily_monthly_tables.py (1)
16-260: Schema and indexing for daily/monthly tables look consistent and appropriate.
The daily/monthly models mirror the hourly schema (including `labels` default, project/tag, organization FK, unique constraints, and GIN/index coverage), which should give predictable semantics and good query performance across all granularities.
backend/dashboard_metrics/services.py (2)
383-447: `get_all_metrics_summary` aggregates are consistent with individual query methods.
The summary method correctly reuses the per-metric query functions and normalizes potential `NULL` sums (e.g., pages and LLM usage) with `or 0`, producing a clean dict of totals for the live summary endpoint.
230-269: The join logic in `get_deployed_api_requests` is correct.
`WorkflowExecution.pipeline_id` is a UUIDField designed to hold IDs of either `Pipeline` or `APIDeployment` entities (as evidenced by the conditional logic in `backend/workflow_manager/internal_serializers.py`). The subquery filtering `pipeline_id__in=Subquery(APIDeployment.objects.filter(...).values("id"))` correctly extracts deployed API request counts by matching executions where the pipeline_id refers to an API deployment.
backend/dashboard_metrics/migrations/0001_initial.py (1)
17-135: Hourly metrics migration aligns with model definition and indexing strategy.
Field definitions, indexes, and the `unique_hourly_metric` constraint all match the `EventMetricsHourly` model description, so the initial schema for hourly metrics looks correct and production-ready.
backend/dashboard_metrics/serializers.py (1)
11-124: Query and response serializers are consistent with view usage and enforce sane bounds.
`MetricsQuerySerializer.validate()` correctly defaults the date range, enforces `start_date <= end_date`, and caps queries at 90 days, while the nested series/summary serializers match the shapes constructed in `DashboardMetricsViewSet`. The `EventMetricsHourlySerializer` is read-only and exposes the expected fields.
backend/dashboard_metrics/cache.py (2)
71-145: Caching decorator behavior and keying strategy look solid.
The decorator scopes cache entries by organization, endpoint name, and a stable hash of sorted query params, and only caches 200 responses. It also cleanly bypasses caching when no organization context is present, which is important for correctness with per-org data.
148-167: No changes needed; code is correct.
`CacheService` is a class in `backend/utils/cache_service.py`, and `clear_cache` is a static method of that class. The import and call in `invalidate_metrics_cache` are correct. The implementation will properly invalidate cached metrics for the organization.
Likely an incorrect or invalid review comment.
backend/dashboard_metrics/models.py (1)
57-85: Confirm `BaseModel` doesn't define `id`, and that `metric_count`'s default is intentional
Two small checks:
- If `BaseModel` already defines an `id` field, redefining `id` here as a primary key can cause confusion or conflicts. If `BaseModel` only adds timestamps and no `id`, you're fine; otherwise, consider centralizing the PK definition in one place.
- `metric_count` defaults to `1`. That makes sense if every row is always created from at least one event, but if you ever create rows with explicit `metric_count` or rely on DB-side aggregation, this default could overcount silently.
Please double-check that `BaseModel` does not define `id`, and that you never rely on the default `metric_count` for anything other than "one event just created this row".
def _bulk_upsert_hourly(aggregations: dict) -> tuple[int, int]:
    """Bulk upsert hourly aggregations.

    Args:
        aggregations: Dict of aggregated metric data keyed by
            (org_id, hour_ts_str, metric_name, project, tag)

    Returns:
        Tuple of (created_count, updated_count)
    """
    created, updated = 0, 0
    with transaction.atomic():
        for key, agg in aggregations.items():
            org_id, hour_ts_str, metric_name, project, tag = key
            hour_ts = datetime.fromisoformat(hour_ts_str)

            try:
                obj, was_created = (
                    EventMetricsHourly.objects.select_for_update().get_or_create(
                        organization_id=org_id,
                        timestamp=hour_ts,
                        metric_name=metric_name,
                        project=project,
                        tag=tag,
                        defaults={
                            "metric_type": agg["metric_type"],
                            "metric_value": agg["value"],
                            "metric_count": agg["count"],
                            "labels": agg["labels"],
                        },
                    )
                )
                if was_created:
                    created += 1
                else:
                    obj.metric_value = F("metric_value") + agg["value"]
                    obj.metric_count = F("metric_count") + agg["count"]
                    if agg["labels"]:
                        obj.labels = {**obj.labels, **agg["labels"]}
                    obj.save(
                        update_fields=[
                            "metric_value",
                            "metric_count",
                            "labels",
                            "modified_at",
                        ]
                    )
                    updated += 1
            except Exception as e:
                logger.error(f"Error upserting hourly metric {key}: {e}")

    return created, updated
DB errors are swallowed inside transactions, defeating Celery autoretry and risking silent data loss
In _bulk_upsert_hourly/_daily/_monthly and in the cleanup tasks:
- Each function wraps DB work in `transaction.atomic()` but also catches a broad `Exception` inside the block and only logs it.
- For the Celery tasks (`process_dashboard_metric_events`, `cleanup_hourly_metrics`, `cleanup_daily_metrics`) you've configured `autoretry_for=(DatabaseError, OperationalError)`, but those DB exceptions are caught and converted into logs/return dicts, so Celery never sees them and won't retry.
- Catching DB errors inside a single `atomic()` block without calling `transaction.set_rollback(True)` or re-raising can also leave the transaction marked for rollback and cause subsequent operations in the same block to fail, leading to partial or fully rolled-back writes while the task still reports success.
This combination can make metrics silently incomplete under DB issues (locks, timeouts, transient failures) and is at odds with the production-readiness goal.
I’d recommend:
- In the upsert helpers: let `DatabaseError`/`OperationalError` bubble up so Celery can apply its retry/backoff policy. If you want per-key resilience, move try/except outside the `atomic()` block or drastically narrow the except clause and re-raise DB errors.
- In the cleanup tasks: don't catch `DatabaseError`/`OperationalError` at all (or catch, log with `logger.exception(...)`, then `raise`) so `autoretry_for` actually takes effect. Keep a separate `except Exception` handler only for truly unexpected non-DB errors if you still want to return a failure dict.
Example pattern for the hourly upsert (same idea applies to daily/monthly and cleanup):
    with transaction.atomic():
        for key, agg in aggregations.items():
            org_id, hour_ts_str, metric_name, project, tag = key
            hour_ts = datetime.fromisoformat(hour_ts_str)
            obj, was_created = (
                EventMetricsHourly.objects.select_for_update().get_or_create(
                    organization_id=org_id,
                    timestamp=hour_ts,
                    metric_name=metric_name,
                    project=project,
                    tag=tag,
                    defaults={
                        "metric_type": agg["metric_type"],
                        "metric_value": agg["value"],
                        "metric_count": agg["count"],
                        "labels": agg["labels"],
                    },
                )
            )
            # same update logic, no broad except here

and for cleanup:
    try:
        deleted_count, _ = EventMetricsHourly.objects.filter(
            timestamp__lt=cutoff
        ).delete()
    except (DatabaseError, OperationalError):
        logger.exception("Error during hourly cleanup")
        raise
    except Exception:
        logger.exception("Unexpected error during hourly cleanup")
        return {"success": False, "retention_days": retention_days}

This keeps the resilience you want while letting Celery handle transient DB failures correctly.
Also applies to: 263-315, 317-369, 371-414, 424-458
def _get_connection(self) -> Connection:
    """Get or create a connection to the broker with retry logic."""
    if self._connection is not None and self._connection.connected:
        return self._connection
    return self._get_connection_with_retry()
Connection connected property may return stale state.
The _connection.connected check at line 65 may return True even if the broker has disconnected (e.g., broker restart). Consider wrapping the connection reuse in a try-except that falls back to _get_connection_with_retry() on first publish failure, or use Kombu's ensure_connection() pattern.
🔎 Suggested improvement
def _get_connection(self) -> Connection:
"""Get or create a connection to the broker with retry logic."""
- if self._connection is not None and self._connection.connected:
- return self._connection
- return self._get_connection_with_retry()
+ if self._connection is not None:
+ try:
+ # Verify connection is actually usable
+ self._connection.ensure_connection(max_retries=1)
+ return self._connection
+ except Exception:
+ pass # Fall through to reconnect
+ return self._get_connection_with_retry()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _get_connection(self) -> Connection: | |
| """Get or create a connection to the broker with retry logic.""" | |
| if self._connection is not None and self._connection.connected: | |
| return self._connection | |
| return self._get_connection_with_retry() | |
| def _get_connection(self) -> Connection: | |
| """Get or create a connection to the broker with retry logic.""" | |
| if self._connection is not None: | |
| try: | |
| # Verify connection is actually usable | |
| self._connection.ensure_connection(max_retries=1) | |
| return self._connection | |
| except Exception: | |
| pass # Fall through to reconnect | |
| return self._get_connection_with_retry() |
def record(self, event: MetricEvent) -> bool:
    """Send a metric event as a Celery task.

    Args:
        event: The MetricEvent to queue

    Returns:
        True if the event was queued successfully, False otherwise
    """
    try:
        event_data = self._create_event_data(event)
        task_message = self._create_celery_message(event_data)
        conn = self._get_connection()

        with conn.Producer() as producer:
            producer.publish(
                task_message,
                exchange=self.exchange,
                routing_key=self.queue_name,
                declare=[self.queue],
                delivery_mode=2,  # Persistent
                content_type="application/json",
                content_encoding="utf-8",
                headers={
                    "task": self.task_name,
                    "id": str(uuid.uuid4()),
                },
            )

        logger.debug(
            f"Queued metric task: {event.metric_name} for org {event.org_id}"
        )
        return True

    except KombuError as e:
        logger.error(f"Failed to queue metric {event.metric_name}: {e}")
        return False
    except Exception as e:
        logger.error(f"Unexpected error queuing metric: {e}")
        return False
🧩 Analysis chain
🌐 Web query:
Kombu Connection thread-safe publishing messages concurrent
💡 Result:
Short answer: Kombu (and the underlying py-amqp transport) is not safe for concurrent publishing from multiple threads using the same Connection/Channel objects. Share a Connection only if you ensure each thread uses its own Channel (or a thread-safe wrapper/pool) or protect access with a lock; better: give each thread its own Connection/Channel or use a connection pool or a thread-safe transport implementation. [1][2][3][4]
Recommendations
- Do not share Channel objects across threads — channels are channel-bound and can interleave AMQP frames if used concurrently. Use one channel per thread. [3][4]
- Prefer one producer Connection/Channel per thread, or use a Connection/Channel pool to reuse resources safely. [1][4]
- If you must share a Connection, serialize all operations on it with a threading.Lock (but this reduces concurrency). [3]
- Consider using a project that provides a thread-safe kombu transport/wrapper (e.g., kombu-pyamqp-threadsafe) if you need shared connections with multithreading. [1]
Sources
- kombu-pyamqp-threadsafe project (thread-safe wrapper for kombu/pyamqp). [1]
- StackOverflow discussion on publishing from consumer/threads with Kombu (issues when mixing consuming/publishing on same thread). [2]
- Kombu/py-amqp mailing-list / discussion about thread-safety (channels bound to threads; beware interleaved writes). [3]
- rabbitpy documentation on multi-threaded use (recommend one channel per thread). [4]
🏁 Script executed:
# Find and examine the _get_connection method and class initialization
cd unstract/core/src/unstract/core/metrics/backends && grep -n "_get_connection\|__init__\|_connection" queue.py | head -20

Repository: Zipstack/unstract
Length of output: 828
🏁 Script executed:
cd unstract/core/src/unstract/core/metrics/backends && sed -n '35,110p' queue.py

Repository: Zipstack/unstract
Length of output: 2738
Kombu Connection is not thread-safe—add locking or use connection pooling.
The code caches self._connection as an instance variable without synchronization. Concurrent calls to record() from multiple threads will race on self._connection state (the check on line 65 is not atomic), and Kombu does not support concurrent publishing from the same Connection. For multi-threaded metrics recording, protect the connection with a threading.Lock, use thread-local connections, or implement a connection pool.
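A minimal sketch of the locking option; attribute names mirror the snippet above, and for full safety the same lock (or a per-thread connection) would also need to cover the publish itself:

```python
import threading

from kombu import Connection

class QueueBackend:
    def __init__(self, broker_url: str) -> None:
        self.broker_url = broker_url
        self._connection: Connection | None = None
        self._connection_lock = threading.Lock()

    def _get_connection(self) -> Connection:
        """Serialize the check-and-create so threads can't race on state."""
        with self._connection_lock:
            if self._connection is not None and self._connection.connected:
                return self._connection
            self._connection = Connection(self.broker_url)
            self._connection.connect()
            return self._connection
```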
# Mapping of metric names to their types
METRIC_TYPE_MAP: dict[str, MetricType] = {
    # Counters
    MetricName.DOCUMENTS_PROCESSED.value: MetricType.COUNTER,
    MetricName.PAGES_PROCESSED.value: MetricType.COUNTER,
    MetricName.PROMPT_EXECUTIONS.value: MetricType.COUNTER,
    MetricName.LLM_CALLS.value: MetricType.COUNTER,
    MetricName.CHALLENGES.value: MetricType.COUNTER,
    MetricName.SUMMARIZATION_CALLS.value: MetricType.COUNTER,
    # Histograms
    MetricName.DEPLOYED_API_REQUESTS.value: MetricType.HISTOGRAM,
    MetricName.ETL_PIPELINE_EXECUTIONS.value: MetricType.HISTOGRAM,
    MetricName.LLM_USAGE.value: MetricType.HISTOGRAM,
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if tests exist to verify METRIC_TYPE_MAP completeness
rg -n 'METRIC_TYPE_MAP' --type=py -g '**/test*'

Repository: Zipstack/unstract
Length of output: 43
🏁 Script executed:
#!/bin/bash
# Find MetricName enum definition
rg -n 'class MetricName|enum MetricName' --type=py -A 20

Repository: Zipstack/unstract
Length of output: 1805
🏁 Script executed:
#!/bin/bash
# Find get_metric_type() implementation
rg -n 'def get_metric_type' --type=py -A 10

Repository: Zipstack/unstract
Length of output: 933
🏁 Script executed:
#!/bin/bash
# Check the full types.py file to understand the structure
cat -n unstract/core/src/unstract/core/metrics/types.py

Repository: Zipstack/unstract
Length of output: 3933
🏁 Script executed:
#!/bin/bash
# Get the full get_metric_type() implementation
cat -n unstract/core/src/unstract/core/metrics/registry.py | head -50

Repository: Zipstack/unstract
Length of output: 1566
🏁 Script executed:
#!/bin/bash
# Search for test files in metrics module
fd '.*test.*\.py$' --type f | grep -i metric

Repository: Zipstack/unstract
Length of output: 106
🏁 Script executed:
#!/bin/bash
# Search for any validation or assertions related to METRIC_TYPE_MAP
rg -n 'assert.*METRIC_TYPE_MAP|METRIC_TYPE_MAP.*in|in.*METRIC_TYPE_MAP' --type=py

Repository: Zipstack/unstract
Length of output: 200
METRIC_TYPE_MAP must stay in sync with MetricName.
While validate_metric() checks if a metric exists in the enum, it does not ensure the metric is in METRIC_TYPE_MAP. If a new metric is added to MetricName without updating METRIC_TYPE_MAP, get_metric_type() will raise a KeyError when accessing the map on line 39 of registry.py. Add a unit test to verify all MetricName values have corresponding entries in METRIC_TYPE_MAP.
🤖 Prompt for AI Agents
In @unstract/core/src/unstract/core/metrics/types.py around lines 35-48, Add a
unit test that ensures every value of the MetricName enum has a corresponding
key in METRIC_TYPE_MAP to prevent KeyError in get_metric_type(); specifically,
iterate over MetricName (or MetricName.__members__.values()/MetricName) and
assert that each .value is present in METRIC_TYPE_MAP, referencing
METRIC_TYPE_MAP and MetricName; this complements validate_metric() and prevents
missing entries that would cause get_metric_type() to fail.
chandrasekharan-zipstack
left a comment
Changes mostly LGTM. Left some minor comments.
One main question I have is around cache invalidation of the cached, aggregated data - does it happen only on the respective TTLs? I hope possibly receiving stale data for that duration is okay in this use case
    worker_prefetch_multiplier = 4
    worker_max_tasks_per_child = 1000
@athul-rs any reason why this or other config was added? I hope it doesn't have any side effects on other workers
@athul-rs do these tests run on every PR as part of the existing unit tests? If not, please ensure that it's integrated there as well; otherwise your effort on these tests won't be realised
backend/dashboard_metrics/cache.py
Outdated
    CACHE_TTL_CURRENT_HOUR = 30  # 30 seconds for current hour (updating frequently)
    CACHE_TTL_HISTORICAL = 8 * 60 * 60  # 8 hours for historical data (stable)

    # Legacy TTLs for compatibility with existing endpoints
@athul-rs what existing endpoints do we have that require such legacy TTLs? Check and correct this comment
backend/dashboard_metrics/cache.py
Outdated
    # Cache TTL values (in seconds)
    # Time-aware TTLs per documentation
    CACHE_TTL_CURRENT_HOUR = 30  # 30 seconds for current hour (updating frequently)
NIT: @athul-rs I would suggest defining these values in settings/base.py so that they can optionally be configured via env vars with some sane defaults. Make sure the names are updated / prefixed with DASHBOARD_ to group them separately
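A sketch of that suggestion, with assumed DASHBOARD_-prefixed names and env overrides defaulting to the current values:

```python
# backend/backend/settings/base.py (sketch)
import os

DASHBOARD_CACHE_TTL_CURRENT_HOUR = int(
    os.environ.get("DASHBOARD_CACHE_TTL_CURRENT_HOUR", "30")
)
DASHBOARD_CACHE_TTL_HISTORICAL = int(
    os.environ.get("DASHBOARD_CACHE_TTL_HISTORICAL", str(8 * 60 * 60))
)
```

`cache.py` would then read these via `django.conf.settings` instead of defining module-level constants.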
| """ | ||
| # Sort params for consistent hashing | ||
| sorted_params = json.dumps(params, sort_keys=True, default=str) | ||
| params_hash = hashlib.md5(sorted_params.encode()).hexdigest()[:12] |
NIT: Possibility of cache collision since we only take the first 12 hex digits, but this should mostly suffice since keys are separated per endpoint. Do check if a change is needed here
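If a change is made, one low-risk option is sha256 with a longer slice; collision odds drop sharply with digest length:

```python
import hashlib
import json

def _params_hash(params: dict) -> str:
    # Hypothetical helper name; mirrors the hashing shown above
    sorted_params = json.dumps(params, sort_keys=True, default=str)
    return hashlib.sha256(sorted_params.encode()).hexdigest()[:24]
```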
    except KeyError as e:
        logger.warning(f"Skipping event with missing required field: {e}")
        errors += 1
    except Exception as e:
        logger.warning(f"Error processing event: {e}")
NIT: Consider adding a stack trace
    organization = UserContext.get_organization()
    if not organization:
        logger.warning("No organization context for metrics request")
        raise PermissionDenied("No organization context")
    return EventMetricsHourly.objects.filter(organization=organization)
@athul-rs check if this is required. This would be handled by the model manager itself usually
NIT: This is a cool idea and if refactored a bit to make it generic, we can easily extend the same to any other API / DB call that we wish to cache. Consider this suggestion if it's easy to make it generic
    from ..types import MetricEvent
    from .base import AbstractMetricBackend
NIT: Consider absolute imports instead
            return self._connection
        return self._get_connection_with_retry()

    def _get_connection_with_retry(
@athul-rs why do we need to implement this? Isn't it available / handled within celery / kombu?
Test Results
Summary
Runner Tests - Full Report
SDK1 Tests - Full Report
Actionable comments posted: 6
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🤖 Fix all issues with AI agents
In @backend/dashboard_metrics/management/commands/backfill_metrics.py:
- Around line 269-298: The outer transaction.atomic() in _upsert_hourly wraps
all update_or_create calls so a DB error (e.g., IntegrityError) can leave the
transaction unusable and abort the rest of the loop; remove the outer
transaction.atomic() (or alternatively use a per-row savepoint) so each
EventMetricsHourly._base_manager.update_or_create call runs in its own atomic
context and the loop can continue on exceptions; also change the unused returned
variable name obj to _ and apply the same change (remove outer atomic or use
per-row savepoints and rename obj → _) in the corresponding daily/monthly upsert
methods that use update_or_create.
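A sketch of the per-row variant the prompt describes; the key shape follows the hourly upsert shown elsewhere in this review, and the `defaults` here are simplified:

```python
from django.db import DatabaseError, transaction

# inside the backfill command (sketch)
def _upsert_hourly(self, aggregations: dict) -> None:
    for key, agg in aggregations.items():
        org_id, hour_ts, metric_name, project, tag = key
        try:
            # One atomic block (savepoint) per row, so a failure on one
            # row cannot poison the transaction for the remaining rows.
            with transaction.atomic():
                _, _created = EventMetricsHourly._base_manager.update_or_create(
                    organization_id=org_id,
                    timestamp=hour_ts,
                    metric_name=metric_name,
                    project=project,
                    tag=tag,
                    defaults={"metric_value": agg["value"]},
                )
        except DatabaseError:
            logger.exception("Failed to upsert hourly row %s; continuing", key)
```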
In @backend/dashboard_metrics/management/commands/generate_metrics_test_data.py:
- Around line 235-283: The re-query after bulk_create in
_generate_workflow_executions is unreliable and may misalign _target_created_at
values; instead use the list returned by
WorkflowExecution.objects.bulk_create(executions) (these objects have PKs
populated) and iterate that returned list to update each record's created_at via
individual QuerySet.update(id=exec_obj.id) using exec_obj._target_created_at (do
not attempt bulk_update because created_at uses auto_now_add).
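A sketch of the suggested flow; `_target_created_at` is the attribute named in the prompt and is assumed to be pre-set on each unsaved object:

```python
# bulk_create returns the created objects with PKs populated
created_objs = WorkflowExecution.objects.bulk_create(executions)
for exec_obj in created_objs:
    # created_at uses auto_now_add, so it can't go through bulk_update;
    # correct it with a per-row queryset update instead.
    WorkflowExecution.objects.filter(id=exec_obj.id).update(
        created_at=exec_obj._target_created_at
    )
```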
In @backend/dashboard_metrics/tasks.py:
- Around line 222-262: The current broad except in the upsert loop swallows
IntegrityError from concurrent inserts causing lost increments; import
django.db.IntegrityError and explicitly catch IntegrityError around the
get_or_create in EventMetricsHourly upsert logic (the block that calls
select_for_update().get_or_create and currently sets obj, was_created), then on
IntegrityError re-query the row (EventMetricsHourly.objects.get(...) for the
same organization_id, timestamp, metric_name, project, tag), set
was_created=False and proceed to update counters and labels as in the existing
else-path; keep the generic Exception handler for other errors but ensure you
continue after logging; apply the same explicit IntegrityError catch-and-retry
pattern to the _bulk_upsert_daily and _bulk_upsert_monthly blocks as well.
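A minimal sketch of the catch-and-retry pattern described above (model and field names taken from the prompt; a sketch, not the task's actual code):

```python
from django.db import IntegrityError, transaction

try:
    with transaction.atomic():
        obj, was_created = (
            EventMetricsHourly.objects.select_for_update().get_or_create(
                organization_id=org_id,
                timestamp=timestamp,
                metric_name=metric_name,
                project=project,
                tag=tag,
                defaults=defaults,  # counter fields, elided here
            )
        )
except IntegrityError:
    # A concurrent worker inserted the same row first: re-query it and
    # fall through to the update path instead of dropping the increment.
    obj = EventMetricsHourly.objects.get(
        organization_id=org_id,
        timestamp=timestamp,
        metric_name=metric_name,
        project=project,
        tag=tag,
    )
    was_created = False
```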
In @unstract/core/src/unstract/core/metrics/backends/queue.py:
- Around line 35-62: The QueueBackend __init__ creates self.exchange =
Exchange("", type="direct") which uses the AMQP default exchange and doesn't
match the implicit queue-named exchange used by Celery; change the exchange to a
named direct exchange that matches the queue name (e.g., use
Exchange(self.queue_name, type="direct") or Exchange("dashboard_metric_events",
type="direct") when constructing self.exchange) so published messages route to
the KombuQueue (self.queue) consumers.
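A minimal sketch of the named-exchange fix described above, using the Kombu API (queue name taken from the prompt):

```python
from kombu import Exchange, Queue

queue_name = "dashboard_metric_events"
# A named direct exchange with a routing key equal to the queue name mirrors
# the implicit routing that Celery workers declare for their queues.
exchange = Exchange(queue_name, type="direct")
queue = Queue(queue_name, exchange=exchange, routing_key=queue_name)
```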
🟠 Major comments (19)
backend/dashboard_metrics/management/commands/populate_test_metrics.py-22-33 (1)
22-33: Metric types appear inconsistent with documentation.

The `METRICS` list defines:

- `pages_processed` as `COUNTER` (line 25)
- `deployed_api_requests` as `HISTOGRAM` (line 30)
- `etl_pipeline_executions` as `HISTOGRAM` (line 31)

However, the README documentation (lines 173-179) indicates:

- `pages_processed` should be `HISTOGRAM` (sums page counts)
- `deployed_api_requests` and `etl_pipeline_executions` should be `COUNTER` (counting events)

This mismatch could cause incorrect test data generation and potentially mask issues during testing.
🐛 Proposed fix
```diff
 METRICS = [
     ("documents_processed", MetricType.COUNTER),
-    ("pages_processed", MetricType.COUNTER),
+    ("pages_processed", MetricType.HISTOGRAM),
     ("prompt_executions", MetricType.COUNTER),
     ("llm_calls", MetricType.COUNTER),
     ("challenges", MetricType.COUNTER),
     ("summarization_calls", MetricType.COUNTER),
-    ("deployed_api_requests", MetricType.HISTOGRAM),
-    ("etl_pipeline_executions", MetricType.HISTOGRAM),
+    ("deployed_api_requests", MetricType.COUNTER),
+    ("etl_pipeline_executions", MetricType.COUNTER),
     ("llm_usage", MetricType.HISTOGRAM),
 ]
```

frontend/src/hooks/useMetricsData.js-21-75 (1)
21-75: Fix hook dependency handling: include callbacks in effect dependency arrays instead of using `eslint-disable-line`.

The `eslint-disable-line` comments hide a real issue: `useMetricsOverview`, `useMetricsSummary`, and `useMetricsSeries` all omit their fetch callbacks from the effect dependencies (lines 141, 227, 313), even though these callbacks are in the useCallback dependencies. If `axiosPrivate` changes (e.g., during auth token refresh), the callbacks are recreated, but the effects won't re-run, so they'll continue calling stale callbacks with outdated `axiosPrivate` instances.

Additionally, `useMetricsData` unnecessarily depends on the entire `options` object instead of destructuring individual properties, causing callback recreation whenever a new `options` object is passed, even if its values haven't changed.
For
useMetricsData(lines 21-75), destructureoptionsand depend on individual properties:
- Add at the top:
const { startDate, endDate, granularity, metricName } = options;- Change useCallback deps from
[axiosPrivate, options, orgId]to[axiosPrivate, orgId, startDate, endDate, granularity, metricName]- Update references from
options.startDate→startDateetc.For the three other hooks (lines 137-145, 223-231, 309-317), include the fetch callback in effect dependencies:
useMetricsOverview:}, [fetchOverview]);(removes the guard; callback has its own internal guard)useMetricsSummary:}, [fetchSummary]);useMetricsSeries:}, [fetchSeries]);backend/dashboard_metrics/migrations/0001_initial.py-17-136 (1)
17-136: Make `organization` non-nullable in all three metrics tables.

PostgreSQL UNIQUE constraints allow multiple NULL values (NULL ≠ NULL in SQL), so with `organization=null` the constraint cannot prevent duplicate rows for the same (bucket, metric_name, project, tag). All actual metric creation explicitly passes `organization_id` (never None), and the manager filters exclude NULL organizations. Since this is a multi-tenant application and metrics are always org-scoped, `organization` should be `null=False` with no `default=None` to ensure constraint integrity and prevent orphaned records; a field sketch follows.

Also applies to: 137-258 (EventMetricsDaily), 259-380 (EventMetricsMonthly)
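A minimal sketch of the non-nullable field (app label, `on_delete`, and `related_name` are assumptions, not the migration's actual values):

```python
from django.db import models

organization = models.ForeignKey(
    "account_v2.Organization",  # assumed app label
    on_delete=models.CASCADE,
    null=False,  # DB-level guarantee, so the UNIQUE constraint can hold
    related_name="hourly_metrics",  # assumed name
)
```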
backend/backend/celery_config.py-36-49 (1)
36-49: Fix invalid socket options in `broker_transport_options`: `socket_timeout` and `socket_keepalive` are not recognized by Kombu/py-amqp.

In Celery 5.x with RabbitMQ (py-amqp), the socket timeout and keepalive options in `broker_transport_options` are misconfigured:

- `socket_timeout` is not a valid transport option; use `read_timeout` and `write_timeout` instead.
- `socket_keepalive` is not a valid transport option; py-amqp enables SO_KEEPALIVE by default. If you need to tune keepalive behavior, use the `socket_settings` dict with platform-specific constants.

These invalid keys will be silently ignored, leaving sockets without timeout protection. The other configuration keys (`broker_connection_retry_on_startup`, `broker_connection_retry`, `broker_connection_max_retries`, and `task_routes`) are correctly formatted for Celery 5.x.

Recommended fix

Replace the current `broker_transport_options` with:

```python
broker_transport_options = {
    "max_retries": 3,
    "interval_start": 0,
    "interval_step": 0.2,
    "interval_max": 1,
    "read_timeout": 30,
    "write_timeout": 30,
}
```

Or, if you need platform-specific keepalive tuning:

```python
from amqp.platform import TCP_USER_TIMEOUT

broker_transport_options = {
    "max_retries": 3,
    "interval_start": 0,
    "interval_step": 0.2,
    "interval_max": 1,
    "read_timeout": 30,
    "write_timeout": 30,
    "socket_settings": {
        TCP_USER_TIMEOUT: 60000,  # ms
    },
}
```

frontend/src/components/metrics-dashboard/MetricsDashboard.jsx-55-95 (1)
55-95: Normalize date range boundaries to prevent timezone-induced off-by-one-day queries.

Converting local dates directly to UTC ISO strings can shift day boundaries by hours. When a user picks "Jan 30" in EST (UTC-5), `.toISOString()` produces `2024-01-30T05:00:00Z`, causing the query to start mid-day and miss early morning metrics. For hourly aggregations this causes timestamp misalignment; for daily/monthly it silently excludes or includes unintended days.

Explicitly set `.startOf("day")` and `.endOf("day")` before converting to ISO to ensure the backend receives the full day range in UTC.

Proposed fix (normalize dates before sending to hooks)

```diff
 @@
   const [dateRange, setDateRange] = useState([
-    dayjs().subtract(30, "day"),
-    dayjs(),
+    dayjs().subtract(30, "day").startOf("day"),
+    dayjs().endOf("day"),
   ]);
 @@
-  } = useMetricsOverview(
-    dateRange[0]?.toISOString(),
-    dateRange[1]?.toISOString()
-  );
+  const startIso = dateRange[0]?.startOf("day").toISOString();
+  const endIso = dateRange[1]?.endOf("day").toISOString();
+
+  } = useMetricsOverview(startIso, endIso);
 @@
   } = useMetricsSummary(
-    shouldFetchSummary ? dateRange[0]?.toISOString() : null,
-    shouldFetchSummary ? dateRange[1]?.toISOString() : null,
+    shouldFetchSummary ? startIso : null,
+    shouldFetchSummary ? endIso : null,
     null, // metricName
     dataSource
   );
 @@
   const handleDateChange = useCallback((dates) => {
     if (dates && dates.length === 2) {
-      setDateRange(dates);
+      setDateRange([dates[0].startOf("day"), dates[1].endOf("day")]);
     }
   }, []);
```
526-716: `aggregate_metrics_from_sources()` writes `metric_count` as "number of buckets", not "number of events" (breaks averages/derived stats).

You do `count += 1` per returned period row (lines 608-609 / 633-634 / 649-652), so `metric_count` becomes "hours with data" / "days with data", not the underlying event count. This will conflict with event-based ingestion, where `metric_count` is the number of recorded events.

At minimum, define/rename the semantic (bucket_count vs event_count) or adjust `MetricsQueryService` queries to also return an event count (e.g., `Count("id")`) so `metric_count` is consistent; a sketch follows.
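A minimal sketch of the `Count("id")` idea, with illustrative model and field names (a sketch, not the actual query-service code):

```python
from django.db.models import Count, Sum
from django.db.models.functions import TruncHour

# Return both the aggregated value and the number of source rows per bucket.
rows = (
    PageUsage.objects.filter(created_at__range=(start, end))
    .annotate(period=TruncHour("created_at"))
    .values("period")
    .annotate(
        value=Sum("pages_processed"),  # aggregated metric value
        event_count=Count("id"),       # underlying events in this bucket
    )
)
# The aggregation task can then persist event_count into metric_count
# instead of incrementing once per returned period row.
```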
backend/dashboard_metrics/views.py-786-797 (1)

786-797: `live_series` error entries don't match the success schema (frontend may crash on missing keys).

Success entries include `metric_type`, `data`, `total_value`; error entries omit `metric_type` and also omit the `project`/`tag`/`total_count` fields that the aggregated endpoints provide.

Proposed fix (keep shape stable)

```diff
 series.append(
     {
         "metric_name": metric_name,
+        "metric_type": "histogram"
+        if metric_name == "llm_usage"
+        else "counter",
         "error": "unavailable",
         "data": [],
         "total_value": 0,
+        "total_count": 0,
     }
 )
```

backend/dashboard_metrics/views.py-277-310 (1)
277-310: `average_value` should use `total_count` (events), not `len(values)` (buckets).

`metric_value` is aggregated and `metric_count` is the number of events; dividing by bucket count will skew averages (especially if there are sparse buckets).

Proposed fix

```diff
 summary_list.append(
     {
@@
-        "average_value": agg["total_value"] / len(values)
-        if values
-        else 0,
+        "average_value": (
+            agg["total_value"] / agg["total_count"]
+            if agg["total_count"]
+            else 0
+        ),
         "min_value": min(values) if values else 0,
         "max_value": max(values) if values else 0,
     }
 )
```

backend/dashboard_metrics/views.py-129-152 (1)
129-152: Monthly filtering is wrong when `start_date` is mid-month (you'll drop the starting month).

`month` rows are typically stored as the first day of the month; filtering `month__gte=start_date.date()` will exclude that month unless start_date is on day 1.

Proposed fix

```diff
 queryset = queryset.filter(
     **{
-        f"{ts_field}__gte": params["start_date"]
-        if source == "hourly"
-        else params["start_date"].date(),
+        f"{ts_field}__gte": (
+            params["start_date"]
+            if source == "hourly"
+            else (
+                params["start_date"].date().replace(day=1)
+                if source == "monthly"
+                else params["start_date"].date()
+            )
+        ),
         f"{ts_field}__lte": params["end_date"]
         if source == "hourly"
         else params["end_date"].date(),
     }
 )
```

backend/dashboard_metrics/services.py-1-35 (1)
1-35: Docstring says "Uses _base_manager for models…", but only `Usage` is bypassed (Celery aggregation will likely return zeros).

`APIDeployment`, `Pipeline`, and often `WorkflowExecution`/`PageUsage` also use organization-aware managers; in Celery context `UserContext` is absent, so `.objects` may auto-filter to empty.

```bash
#!/bin/bash
# Verify which models use DefaultOrganizationManagerMixin-like filtering.
rg -n --type=py "DefaultOrganizationManagerMixin|DefaultOrganizationMixin|UserContext\.get_organization\(" backend | head -n 200
rg -n --type=py "objects\s*=\s*.*ModelManager|objects\s*=" backend/api_v2/models.py backend/pipeline_v2/models.py backend/workflow_manager/workflow_v2/models/execution.py backend/account_usage/models.py 2>/dev/null || true
```

backend/dashboard_metrics/tasks.py-79-209 (1)
79-209: Normalize `metric_type` (string vs enum) and protect against non-numeric `metric_value`.

`QueueBackend` sends `metric_type` as a string; here the default uses `MetricType.COUNTER` (enum). Mixed types can leak into DB writes.

Proposed fix

```diff
-    metric_value = event.get("metric_value", 1)
-    metric_type = event.get("metric_type", MetricType.COUNTER)
+    metric_value = event.get("metric_value", 1)
+    try:
+        metric_value = float(metric_value)
+    except (TypeError, ValueError):
+        raise ValueError(f"Invalid metric_value: {metric_value!r}")
+
+    metric_type_raw = event.get("metric_type", MetricType.COUNTER)
+    try:
+        metric_type = MetricType(metric_type_raw)
+    except Exception:
+        metric_type = MetricType.COUNTER
```

backend/dashboard_metrics/services.py-67-399 (1)
67-399: Use a consistent "unfiltered manager" helper for all source models used by Celery tasks.

Otherwise `aggregate_metrics_from_sources()` will compute incorrect results for at least `deployed_api_requests`/`etl_pipeline_executions` (and possibly others).

Suggested pattern

```diff
+def _all(model):
+    # Prefer _base_manager to bypass org-context managers in Celery.
+    return getattr(model, "_base_manager", model.objects).all()
@@
-        WorkflowFileExecution.objects.filter(
+        _all(WorkflowFileExecution).filter(
@@
-        PageUsage.objects.filter(
+        _all(PageUsage).filter(
@@
-        api_subquery = APIDeployment.objects.filter(
+        api_subquery = _all(APIDeployment).filter(
             organization_id=organization_id
         ).values("id")
@@
-        WorkflowExecution.objects.filter(
+        _all(WorkflowExecution).filter(
@@
-        pipeline_subquery = Pipeline.objects.filter(
+        pipeline_subquery = _all(Pipeline).filter(
             organization_id=organization_id,
             pipeline_type="ETL",
         ).values("id")
@@
-        WorkflowExecution.objects.filter(
+        _all(WorkflowExecution).filter(
@@
-        WorkflowExecution.objects.filter(
+        _all(WorkflowExecution).filter(
```

backend/dashboard_metrics/management/commands/backfill_metrics.py-278-280 (1)
278-280: Be consistent with `_base_manager` vs `.objects` when bypassing org-context filters.

Hourly uses `_base_manager`, daily/monthly use `.objects`. If these models have a default manager that applies org scoping (as noted in PR context), daily/monthly backfill could be incomplete.

Proposed fix

```diff
-            obj, was_created = EventMetricsDaily.objects.update_or_create(
+            _, was_created = EventMetricsDaily._base_manager.update_or_create(
@@
-            obj, was_created = EventMetricsMonthly.objects.update_or_create(
+            _, was_created = EventMetricsMonthly._base_manager.update_or_create(
```

Also applies to: 309-311, 340-343
backend/dashboard_metrics/cache.py-163-179 (1)
163-179: Cache invalidation does not clear bucket caches for the org.

`invalidate_metrics_cache()` deletes `dashboard_metrics:{org_id}:*`, but bucket keys are `dashboard_metrics:bucket:<granularity>:<org_id>:...`, so they survive invalidation.

Proposed fix (clear both namespaces)

```diff
 def invalidate_metrics_cache(org_id: str) -> int:
@@
-    pattern = f"{CACHE_PREFIX}:{org_id}:*"
+    patterns = [
+        f"{CACHE_PREFIX}:{org_id}:*",
+        f"{CACHE_PREFIX}:bucket:*:{org_id}:*",
+    ]
     try:
-        CacheService.clear_cache(pattern)
+        for pattern in patterns:
+            CacheService.clear_cache(pattern)
         logger.info(f"Invalidated metrics cache for org {org_id}")
         return 1  # Pattern-based delete doesn't return count
```

Also applies to: 333-364
backend/dashboard_metrics/capture.py-79-82 (1)
79-82: Don't mutate the caller-provided `labels` dict.

`metric_labels = labels or {}` plus adding keys mutates the caller's dict when `labels` is passed in.

Proposed fix

```diff
-    metric_labels = labels or {}
+    metric_labels = {**(labels or {})}
```

Apply similarly in all methods that enrich labels.
Also applies to: 116-119, 155-158, 208-213, 260-263, 328-331
backend/dashboard_metrics/capture.py-33-35 (1)
33-35: Avoid import-time settings capture for `METRICS_ENABLED` (stale toggle in tests/runtime).

`METRICS_ENABLED` is fixed at import time, so changing `settings.DASHBOARD_METRICS_ENABLED` during tests or runtime won't take effect.

Proposed fix

```diff
-# Flag to control whether metrics are captured (can be disabled for testing)
-METRICS_ENABLED = getattr(settings, "DASHBOARD_METRICS_ENABLED", True)
+def _metrics_enabled() -> bool:
+    # Read at call time so tests/runtime overrides work.
+    return getattr(settings, "DASHBOARD_METRICS_ENABLED", True)
```

Then replace `if not METRICS_ENABLED:` with `if not _metrics_enabled():` in all methods.

backend/dashboard_metrics/management/commands/backfill_metrics.py-35-49 (1)
35-49: Use `ClassVar` + ensure `metric_name` values match the rest of the metrics system.

Two concerns:

- `METRIC_CONFIGS` is a mutable class attribute (RUF012).
- Using raw strings (`"documents_processed"`, etc.) risks drift vs canonical `MetricName` identifiers used by capture/queue processing.

Proposed fix

```diff
+from typing import ClassVar
@@
-    METRIC_CONFIGS = [
+    METRIC_CONFIGS: ClassVar[list[tuple[str, Any, bool]]] = [
         ("documents_processed", MetricsQueryService.get_documents_processed, False),
```

Separately, consider importing the canonical metric name enum/constants and using them here (if available).
backend/dashboard_metrics/management/commands/backfill_metrics.py-58-63 (1)
58-63: `--org-id` uses the wrong field: it should query `organization_id`, not `id`.

The help text says "organization ID" and the parameter accepts a string (line 60), but the code queries the integer `id` field (line 105). Throughout the codebase, the standard pattern is to use `Organization.objects.get(organization_id=org_id)` for string identifiers. If a caller passes the organization's string identifier (the `organization_id` field), this will silently fail with `Organization.DoesNotExist`.

Change line 105 to:

```python
orgs = [Organization.objects.get(organization_id=org_id)]
```

This aligns with the consistent pattern used in `internal_api_views.py`, `user_context.py`, and the rate limit management commands.

backend/dashboard_metrics/management/commands/backfill_metrics.py-194-257 (1)
194-257: `metric_count` currently counts periods (rows returned), not events or samples, which breaks histogram semantics and is meaningless for counters.

Given that `MetricsQueryService.*` returns pre-aggregated data (one row per period with the value already aggregated), `count += 1` makes `metric_count` represent the number of time periods rather than the number of events or samples. For example:

- `documents_processed`: returns 1 row per hour with `value=Count(id)` of 100 documents, but `metric_count=1` (the row count)
- `pages_processed` (histogram): returns 1 row per hour with `value=Sum(pages_processed)` of 500 pages from 10 source records, but `metric_count=1` instead of 10

To fix this, query methods need to return sample/event count information, or the backfill code should be restructured to count source records directly. For counters, `metric_count` could equal `value` (the event count is already in the value); for histograms, it should reflect the actual number of aggregated records.
🟡 Minor comments (13)
backend/dashboard_metrics/README.md-729-733 (1)
729-733: Migration file names may be out of sync.

The file structure section lists migrations as `0002_setup_cleanup_tasks.py` and `0003_setup_aggregation_task.py`, but the PR description indicates different migration names (`0002_setup_cleanup_periodic_task`, `0003_add_daily_monthly_tables`, `0004_update_cleanup_periodic_tasks`). Please update the documentation to match the actual migration filenames.

backend/dashboard_metrics/management/commands/generate_metrics_test_data.py-397-425 (1)
397-425: `PageUsage.created_at` backfill ignores the originally chosen timestamps.

You compute `created_at` per record, but then overwrite it with a new random timestamp during the update loop. Store `_target_created_at` like the other generators and update from that.

frontend/src/hooks/useMetricsData.js-77-84 (1)
77-84: Docs don’t match behavior for default date range.
JSDoc says overview defaults to "7 days ago → now", but the hook doesn't implement those defaults (it passes `null` through). Either implement defaults or fix the docs to avoid caller confusion.

Also applies to: 85-92
frontend/src/components/metrics-dashboard/MetricsChart.jsx-74-87 (1)
74-87: Handle "no metrics selected" when `allowClear` is used.

With `allowClear`, `selectedMetrics` can become `[]`, and the chart renders with no series (looks broken). Consider either preventing empty selection, auto-falling back to the first available metric, or rendering an `<Empty />` state when `metricsToShow.length === 0`.

Also applies to: 118-258
frontend/src/components/metrics-dashboard/MetricsTable.jsx-64-77 (1)
64-77: Handle `null` values in min/max render functions.

The check `value !== undefined` passes for `null`, but `null.toFixed(2)` throws a TypeError. The API may return `null` for metrics without min/max values.

🐛 Proposed fix

```diff
 {
   title: "Min",
   dataIndex: "min_value",
   key: "min_value",
-  render: (value) => (value !== undefined ? value.toFixed(2) : "-"),
+  render: (value) => (value != null ? value.toFixed(2) : "-"),
   align: "right",
 },
 {
   title: "Max",
   dataIndex: "max_value",
   key: "max_value",
-  render: (value) => (value !== undefined ? value.toFixed(2) : "-"),
+  render: (value) => (value != null ? value.toFixed(2) : "-"),
   align: "right",
 },
```

frontend/src/components/metrics-dashboard/MetricsSummary.jsx-13-118 (1)
13-118: Potential React key collision if totals aren't unique by `metric_name`.

If the API ever returns multiple rows for the same `metric_name` (e.g., per `project`/`tag`), cards will overwrite each other due to `key={metric.metric_name}`. Consider keying by a stable composite (or falling back to the index).

unstract/core/src/unstract/core/metrics/__init__.py-94-101 (1)
94-101: Use parameterized logging for the unknown-metric warning.
Avoid f-strings in logger calls.

Proposed fix

```diff
@@
     if not validate_metric(metric_name):
-        logger.warning(f"Unknown metric: {metric_name}. Skipping.")
+        logger.warning("Unknown metric: %s. Skipping.", metric_name)
         return False
```

backend/backend/celery_config.py-34-74 (1)
34-74: Add `ClassVar` annotations for mutable CeleryConfig attributes (Ruff RUF012).

This avoids accidental instance-level mutation patterns and satisfies the linter.

Proposed fix

```diff
@@
 from urllib.parse import quote_plus

 from django.conf import settings
 from kombu import Queue
+from typing import ClassVar, Any

 class CeleryConfig:
@@
-    broker_transport_options = {
+    broker_transport_options: ClassVar[dict[str, Any]] = {
         "max_retries": 3,
         "interval_start": 0,
         "interval_step": 0.2,
         "interval_max": 1,
         "socket_timeout": 30,
         "socket_keepalive": True,
     }
@@
-    task_queues = [
+    task_queues: ClassVar[list[Queue]] = [
         Queue("celery", routing_key="celery"),
         Queue("dashboard_metric_events", routing_key="dashboard_metric_events"),
         Queue("celery_periodic_logs", routing_key="celery_periodic_logs"),
         Queue("celery_log_task_queue", routing_key="celery_log_task_queue"),
         Queue("celery_api_deployments", routing_key="celery_api_deployments"),
         Queue("file_processing", routing_key="file_processing"),
         Queue("api_file_processing", routing_key="api_file_processing"),
         Queue("file_processing_callback", routing_key="file_processing_callback"),
         Queue("api_file_processing_callback", routing_key="api_file_processing_callback"),
     ]
@@
-    task_routes = {
+    task_routes: ClassVar[dict[str, dict[str, str]]] = {
         "dashboard_metrics.process_events": {"queue": "dashboard_metric_events"},
         "dashboard_metrics.aggregate_from_sources": {"queue": "dashboard_metric_events"},
         "dashboard_metrics.cleanup_hourly_data": {"queue": "dashboard_metric_events"},
         "dashboard_metrics.cleanup_daily_data": {"queue": "dashboard_metric_events"},
     }
```

backend/dashboard_metrics/tasks.py-553-567 (1)
553-567: `pages_processed` is marked histogram, but it's a counter-like metric (sum of pages).

Unless you intentionally treat it as a histogram in UI/rollups, set it to `False` here.

Proposed fix

```diff
-    ("pages_processed", MetricsQueryService.get_pages_processed, True),
+    ("pages_processed", MetricsQueryService.get_pages_processed, False),
```

backend/dashboard_metrics/cache.py-185-196 (1)
185-196: Type mismatch: `set_cached_overview` expects a dict but `warm_cache()` passes a list.

Either change the annotation to `list[dict[str, Any]]` or change what's stored.

Proposed fix (make it a list everywhere)

```diff
 class MetricsCacheService:
@@
-    def get_cached_overview(org_id: str) -> dict[str, Any] | None:
+    def get_cached_overview(org_id: str) -> list[dict[str, Any]] | None:
@@
-    def set_cached_overview(org_id: str, data: dict[str, Any]) -> None:
+    def set_cached_overview(org_id: str, data: list[dict[str, Any]]) -> None:
```

Also applies to: 231-256
backend/dashboard_metrics/capture.py-215-233 (1)
215-233: Return value should reflect both `LLM_CALLS` and `LLM_USAGE` writes (partial success hidden).

If `cost > 0` and the usage write fails, the method still returns `call_result`.

Proposed fix

```diff
-    if cost > 0:
-        record(
+    usage_result = True
+    if cost > 0:
+        usage_result = record(
             org_id=org_id,
             metric_name=MetricName.LLM_USAGE,
             metric_value=cost,
             labels=metric_labels,
             project=project,
         )
-
-    return call_result
+    return call_result and usage_result
```

backend/dashboard_metrics/capture.py-159-177 (1)
159-177: Validate `pages` to avoid negative/zero page metrics.

As-is, `pages` can be 0/negative and will write incorrect histogram/counter values.

Proposed fix

```diff
 def record_document_processed(
     org_id: str,
     pages: int = 1,
@@
     if record is None:
         return False
+    if pages < 0:
+        return False
```

backend/dashboard_metrics/cache.py-142-157 (1)
142-157: Cached `Response` from serialized data is fragile for future customization.

The current pattern caches `response.data` (a dict) and reconstructs it as `Response(cached)` on cache hits. This works fine for plain JSON endpoints, and the four decorated endpoints here (overview, live_summary, live_series, live-summary) currently set no custom headers or renderers, so there's no immediate impact.

However, this pattern is brittle: if custom headers or non-default renderers are added to a cached endpoint in the future, they would silently be lost on cache hits without warning. Consider documenting that this decorator is only suitable for endpoints that return unadorned JSON, or cache/return the full response object with headers intact.
🧹 Nitpick comments (16)
backend/dashboard_metrics/README.md (1)
15-23: Consider adding language hints to code blocks.

Several fenced code blocks in the documentation (ASCII diagrams, data flow, file structure) lack language specifiers. While these are primarily visual diagrams, adding `text` or `plaintext` as the language identifier would satisfy markdown linters and improve consistency.

frontend/src/helpers/metricsCache.js (1)
24-30: Consider handling complex parameter values.

The `buildCacheKey` function converts params to strings using template literals, which would produce `[object Object]` for nested objects or inconsistent results for arrays. If the metrics API ever uses complex query parameters, this could cause cache key collisions or misses.

♻️ Suggested improvement

```diff
 function buildCacheKey(endpoint, params) {
   const sortedKeys = Object.keys(params).sort();
-  const paramStr = sortedKeys.map((k) => `${k}=${params[k]}`).join("&");
+  const paramStr = sortedKeys
+    .map((k) => `${k}=${JSON.stringify(params[k])}`)
+    .join("&");
   // Use simple encoding to avoid btoa issues with unicode
   const hash = paramStr ? `_${encodeURIComponent(paramStr)}` : "";
   return `${CACHE_PREFIX}${endpoint}${hash}`;
 }
```

backend/dashboard_metrics/management/commands/populate_test_metrics.py (1)
60-67: Consider chaining exceptions for better debugging.

When re-raising exceptions, chaining them preserves the original traceback, which aids debugging.
♻️ Suggested improvement
```diff
 if org_id:
     try:
         org = Organization.objects.get(id=org_id)
     except Organization.DoesNotExist:
-        raise CommandError(f"Organization with ID {org_id} not found")
+        raise CommandError(f"Organization with ID {org_id} not found") from None
 else:
     org = Organization.objects.first()
     if not org:
         raise CommandError("No organizations found. Create one first.")
```

frontend/src/components/metrics-dashboard/MetricsChart.jsx (1)
266-292: Stabilize `MultiSeriesChart` metric order for consistent colors/legend.

`metricNames` comes from `Object.entries(data.series)` iteration order; sorting will keep line colors stable across runs and payload shapes.

Proposed diff

```diff
   const { chartData, metricNames } = useMemo(() => {
     if (!data?.series) return { chartData: [], metricNames: [] };
     const dateMap = new Map();
     const names = [];
     Object.entries(data.series).forEach(([metricName, values]) => {
       names.push(metricName);
       values.forEach(({ date, value }) => {
         if (!dateMap.has(date)) {
           dateMap.set(date, { date });
         }
         dateMap.get(date)[metricName] = value;
       });
     });
     const sorted = Array.from(dateMap.values()).sort(
       (a, b) => new Date(a.date) - new Date(b.date)
     );
-    return { chartData: sorted, metricNames: names };
+    return { chartData: sorted, metricNames: names.sort() };
   }, [data]);
```

Also applies to: 341-353
backend/dashboard_metrics/models.py (1)
63-105: Consider making `metric_count` non-negative.

`metric_count` should never be negative; `PositiveIntegerField` (or `PositiveBigIntegerField` if large) better encodes the invariant; a sketch follows.

Also applies to: 169-211, 275-317
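A minimal sketch of the suggested field type (field and model names from the comment above; the default value is an assumption):

```python
from django.db import models

class EventMetricsHourly(models.Model):
    # ... other fields unchanged ...
    metric_count = models.PositiveBigIntegerField(default=0)  # DB-level non-negativity
```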
unstract/core/src/unstract/core/metrics/__init__.py (1)
39-53: Ruff RUF022: `__all__` ordering.

Not functionally important, but easy to clean up if you want to keep lint noise down.

unstract/core/src/unstract/core/metrics/backends/queue.py (2)
69-115: Retry/backoff here can stall request threads for several seconds when RabbitMQ is down.
If `record()` is called in synchronous web paths, the exponential sleeps (line 111) can become user-facing latency. Consider making retry configurable per environment (or "fail fast" by default), and/or using Kombu's publish retry policies instead of reconnect sleeps.
90-97: Swallowing connection-close errors hides useful diagnostics.
At least log at debug level so broker flaps are diagnosable.

backend/dashboard_metrics/views.py (4)
165-232: Bucket cache DB fallback is N-queries-per-missing-hour (cold cache over 7 days can be ~168 queries).
This will be expensive under load. Prefer one ranged query for the missing span and then bucket in Python.
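A minimal sketch of the single-ranged-query approach, assuming the hourly model and field names used elsewhere in this review (`EventMetricsHourly`, `timestamp`); a sketch, not the view's actual code:

```python
from collections import defaultdict

def fetch_missing_buckets(org_id, missing_hours):
    """One ranged query across the whole missing span, bucketed in Python."""
    if not missing_hours:
        return {}
    rows = EventMetricsHourly.objects.filter(
        organization_id=org_id,
        timestamp__gte=min(missing_hours),
        timestamp__lte=max(missing_hours),
    ).values("timestamp", "metric_name", "metric_value", "metric_count")
    buckets = defaultdict(list)
    for row in rows:
        buckets[row["timestamp"]].append(row)  # group rows by hour bucket
    return buckets
```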
391-401: Remove the inner `from datetime import datetime` (it's redundant and easy to shadow).

You already import `datetime` at the top (line 5); the inner import (line 391) is unnecessary.
486-523: Monthly path uses `TruncDay(month)` and then returns "daily"-shaped points; clarify granularity semantics.

If the UI requests `granularity=week/day` but the source is monthly, you'll still return one point per month-start day. If that's intended, fine; otherwise consider forcing `granularity` to `month` when `source=monthly` (and exposing `month` in the serializer choices).
868-878: Cache health check key is global; use a namespaced/random key to avoid cross-worker interference.
Multiple instances probing the same key can produce false negatives.

backend/dashboard_metrics/services.py (1)
101-134: Sums can return NULL; consider `Coalesce(..., 0)` to avoid downstream `None` checks (see the sketch below).

You already guard in `get_all_metrics_summary` for some metrics; pushing it into the query keeps call sites simpler.

Also applies to: 330-364
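A minimal sketch of the `Coalesce` idea, with illustrative queryset and field names (not the actual service code):

```python
from django.db.models import Sum, Value
from django.db.models.functions import Coalesce

# Returns 0 instead of None when no rows match, so callers skip None checks.
total = queryset.aggregate(
    total=Coalesce(Sum("metric_value"), Value(0.0))
)["total"]
```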
backend/dashboard_metrics/tasks.py (1)
577-580: Iterate orgs with `.iterator()` to avoid loading all org IDs into memory; especially if this runs every 15 minutes. A minimal sketch follows.
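A sketch assuming the task loops over organization identifiers (model, field, and task-call names are assumptions):

```python
# Streams IDs from the DB cursor in chunks instead of materializing the list.
for org_id in Organization.objects.values_list(
    "organization_id", flat=True
).iterator(chunk_size=500):
    aggregate_metrics_for_org(org_id)  # hypothetical per-org call
```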
backend/dashboard_metrics/serializers.py (1)
11-72: Validation is solid; consider making naive datetimes explicitly timezone-aware.

Depending on DRF settings, `DateTimeField` may accept naive values; the views mix both patterns (overview makes aware). Centralizing this in the serializer avoids surprises.

backend/dashboard_metrics/cache.py (1)
101-105: Prefer SHA-256 over MD5 for cache key hashing (FIPS/security posture).
MD5 isn't a security risk here, but it is often disallowed in hardened environments.

Proposed fix

```diff
-    params_hash = hashlib.md5(sorted_params.encode()).hexdigest()[:12]
+    params_hash = hashlib.sha256(sorted_params.encode()).hexdigest()[:12]
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
⛔ Files ignored due to path filters (2)
- `backend/uv.lock` is excluded by `!**/*.lock`
- `frontend/src/assets/dashboard.svg` is excluded by `!**/*.svg`
📒 Files selected for processing (36)
- backend/backend/celery_config.py
- backend/backend/settings/base.py
- backend/dashboard_metrics/README.md
- backend/dashboard_metrics/cache.py
- backend/dashboard_metrics/capture.py
- backend/dashboard_metrics/management/__init__.py
- backend/dashboard_metrics/management/commands/__init__.py
- backend/dashboard_metrics/management/commands/backfill_metrics.py
- backend/dashboard_metrics/management/commands/generate_metrics_test_data.py
- backend/dashboard_metrics/management/commands/populate_test_metrics.py
- backend/dashboard_metrics/migrations/0001_initial.py
- backend/dashboard_metrics/migrations/0002_setup_cleanup_tasks.py
- backend/dashboard_metrics/migrations/0003_setup_aggregation_task.py
- backend/dashboard_metrics/models.py
- backend/dashboard_metrics/serializers.py
- backend/dashboard_metrics/services.py
- backend/dashboard_metrics/tasks.py
- backend/dashboard_metrics/tests/test_tasks.py
- backend/dashboard_metrics/urls.py
- backend/dashboard_metrics/views.py
- backend/pyproject.toml
- frontend/package.json
- frontend/src/components/metrics-dashboard/MetricsChart.jsx
- frontend/src/components/metrics-dashboard/MetricsDashboard.css
- frontend/src/components/metrics-dashboard/MetricsDashboard.jsx
- frontend/src/components/metrics-dashboard/MetricsSummary.jsx
- frontend/src/components/metrics-dashboard/MetricsTable.jsx
- frontend/src/components/metrics-dashboard/index.js
- frontend/src/components/navigations/side-nav-bar/SideNavBar.jsx
- frontend/src/helpers/metricsCache.js
- frontend/src/hooks/useMetricsData.js
- frontend/src/pages/MetricsDashboardPage.jsx
- frontend/src/routes/useMainAppRoutes.js
- unstract/core/src/unstract/core/metrics/__init__.py
- unstract/core/src/unstract/core/metrics/backends/noop.py
- unstract/core/src/unstract/core/metrics/backends/queue.py
✅ Files skipped from review due to trivial changes (1)
- frontend/src/components/metrics-dashboard/MetricsDashboard.css
🚧 Files skipped from review as they are similar to previous changes (4)
- backend/pyproject.toml
- backend/dashboard_metrics/tests/test_tasks.py
- backend/dashboard_metrics/urls.py
- backend/backend/settings/base.py
🧰 Additional context used
🧬 Code graph analysis (8)
frontend/src/components/metrics-dashboard/MetricsChart.jsx (1)
- frontend/src/components/metrics-dashboard/MetricsDashboard.jsx (1): `Typography` (34-34)

frontend/src/components/metrics-dashboard/MetricsDashboard.jsx (1)
- frontend/src/components/metrics-dashboard/MetricsChart.jsx (2): `useMemo` (130-164), `useMemo` (267-291)

backend/dashboard_metrics/services.py (5)
- backend/api_v2/models.py (1): `APIDeployment` (43-154)
- backend/pipeline_v2/models.py (1): `Pipeline` (35-156)
- backend/usage_v2/models.py (1): `Usage` (26-81)
- backend/workflow_manager/file_execution/models.py (1): `WorkflowFileExecution` (68-250)
- backend/workflow_manager/workflow_v2/models/execution.py (2): `WorkflowExecution` (102-372), `organization_id` (266-274)

backend/dashboard_metrics/serializers.py (2)
- backend/dashboard_metrics/models.py (4): `EventMetricsHourly` (39-142), `Meta` (109-142), `Meta` (215-248), `Meta` (321-354)
- backend/dashboard_metrics/views.py (2): `series` (355-566), `summary` (243-352)

backend/dashboard_metrics/management/commands/backfill_metrics.py (3)
- backend/account_v2/models.py (1): `Organization` (12-46)
- backend/dashboard_metrics/models.py (4): `EventMetricsDaily` (145-248), `EventMetricsHourly` (39-142), `EventMetricsMonthly` (251-354), `MetricType` (14-18)
- backend/dashboard_metrics/services.py (10): `MetricsQueryService` (37-464), `get_documents_processed` (67-99), `get_pages_processed` (102-133), `get_llm_calls` (136-169), `get_challenges` (172-206), `get_summarization_calls` (209-243), `get_deployed_api_requests` (246-285), `get_etl_pipeline_executions` (288-328), `get_llm_usage_cost` (331-364), `get_prompt_executions` (367-398)

unstract/core/src/unstract/core/metrics/backends/noop.py (3)
- unstract/core/src/unstract/core/metrics/backends/base.py (3): `AbstractMetricBackend` (8-40), `record` (15-24), `flush` (27-32)
- unstract/core/src/unstract/core/metrics/types.py (1): `MetricEvent` (52-96)
- unstract/core/src/unstract/core/metrics/__init__.py (1): `record` (58-119)

backend/dashboard_metrics/migrations/0001_initial.py (2)
- backend/dashboard_metrics/migrations/0003_setup_aggregation_task.py (1): `Migration` (40-51)
- backend/dashboard_metrics/migrations/0002_setup_cleanup_tasks.py (1): `Migration` (70-81)

unstract/core/src/unstract/core/metrics/__init__.py (5)
- unstract/core/src/unstract/core/metrics/config.py (2): `get_backend` (28-60), `reset_backend` (63-68)
- unstract/core/src/unstract/core/metrics/registry.py (3): `get_all_metrics` (42-48), `get_metric_type` (25-39), `validate_metric` (13-22)
- unstract/core/src/unstract/core/metrics/types.py (3): `MetricEvent` (52-96), `MetricName` (15-32), `MetricType` (8-12)
- unstract/core/src/unstract/core/metrics/backends/noop.py (1): `record` (25-40)
- unstract/core/src/unstract/core/metrics/backends/queue.py (1): `record` (117-156)
🪛 markdownlint-cli2 (0.18.1)
backend/dashboard_metrics/README.md
15-15: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
26-26: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
47-47: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
54-54: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
86-86: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
187-187: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
299-299: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
482-482: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
512-512: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
712-712: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.11)
backend/dashboard_metrics/migrations/0003_setup_aggregation_task.py
6-6: Unused function argument: schema_editor
(ARG001)
33-33: Unused function argument: schema_editor
(ARG001)
41-44: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
46-51: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/backend/celery_config.py
42-49: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
56-66: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
69-74: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
unstract/core/src/unstract/core/metrics/backends/queue.py
95-96: try-except-pass detected, consider logging the exception
(S110)
95-95: Do not catch blind exception: Exception
(BLE001)
101-101: Consider moving this statement to an else block
(TRY300)
103-103: Do not catch blind exception: Exception
(BLE001)
113-113: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
115-115: Avoid specifying long messages outside the exception class
(TRY003)
149-149: Consider moving this statement to an else block
(TRY300)
212-212: Do not catch blind exception: Exception
(BLE001)
backend/dashboard_metrics/models.py
113-130: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
131-142: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
219-236: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
237-248: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
325-342: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
343-354: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/management/commands/populate_test_metrics.py
23-33: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
53-53: Unused method argument: args
(ARG002)
63-63: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
63-63: Avoid specifying long messages outside the exception class
(TRY003)
67-67: Avoid specifying long messages outside the exception class
(TRY003)
91-91: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
117-117: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
167-167: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
172-172: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
174-174: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
176-176: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
177-177: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
184-186: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
187-187: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
190-190: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
193-193: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
backend/dashboard_metrics/tasks.py
186-186: Do not catch blind exception: Exception
(BLE001)
398-398: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
439-439: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
480-480: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
669-669: Do not catch blind exception: Exception
(BLE001)
692-692: Redundant exception object included in logging.exception call
(TRY401)
backend/dashboard_metrics/management/commands/generate_metrics_test_data.py
29-32: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
34-41: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
43-45: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
48-54: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
56-62: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
89-89: Unused method argument: args
(ARG002)
232-232: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
244-244: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
246-246: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
250-252: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
253-255: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
258-258: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
259-259: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
285-285: Unused method argument: start_time
(ARG002)
285-285: Unused method argument: end_time
(ARG002)
292-292: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
299-299: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
302-302: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
307-307: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
308-308: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
309-309: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
313-313: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
351-351: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
361-361: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
364-364: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
365-365: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
369-369: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
379-379: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
408-408: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
409-409: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
411-411: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
412-412: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
421-421: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
backend/dashboard_metrics/migrations/0002_setup_cleanup_tasks.py
6-6: Unused function argument: schema_editor
(ARG001)
58-58: Unused function argument: schema_editor
(ARG001)
71-74: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
76-81: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/serializers.py
121-132: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/capture.py
45-45: Consider moving this statement to an else block
(TRY300)
backend/dashboard_metrics/management/commands/backfill_metrics.py
35-49: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
85-85: Unused method argument: args
(ARG002)
258-258: Do not catch blind exception: Exception
(BLE001)
278-278: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
295-295: Do not catch blind exception: Exception
(BLE001)
309-309: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
326-326: Do not catch blind exception: Exception
(BLE001)
340-340: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
357-357: Do not catch blind exception: Exception
(BLE001)
unstract/core/src/unstract/core/metrics/backends/noop.py
25-25: Unused method argument: event
(ARG002)
backend/dashboard_metrics/migrations/0001_initial.py
13-15: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
17-381: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
backend/dashboard_metrics/cache.py
103-103: Probable use of insecure hash functions in hashlib: md5
(S324)
176-176: Consider moving this statement to an else block
(TRY300)
277-277: Do not catch blind exception: Exception
(BLE001)
470-470: Consider moving this statement to an else block
(TRY300)
510-510: Do not catch blind exception: Exception
(BLE001)
backend/dashboard_metrics/views.py
59-59: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
60-60: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
62-62: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
63-63: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
64-64: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
72-72: Abstract raise to an inner function
(TRY301)
72-72: Avoid specifying long messages outside the exception class
(TRY003)
78-78: Avoid specifying long messages outside the exception class
(TRY003)
84-84: Avoid specifying long messages outside the exception class
(TRY003)
235-235: Loop control variable bucket_ts not used within loop body
(B007)
237-237: Loop control variable bucket_ts not used within loop body
(B007)
679-679: Avoid specifying long messages outside the exception class
(TRY003)
737-737: Avoid specifying long messages outside the exception class
(TRY003)
830-830: Unused method argument: request
(ARG002)
863-863: Consider moving this statement to an else block
(TRY300)
875-875: Consider moving this statement to an else block
(TRY300)
unstract/core/src/unstract/core/metrics/__init__.py
39-53: __all__ is not sorted
Apply an isort-style sorting to __all__
(RUF022)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
```python
def _upsert_hourly(self, aggregations: dict) -> tuple[int, int]:
    """Upsert hourly aggregations."""
    created, updated = 0, 0
    with transaction.atomic():
        for key, agg in aggregations.items():
            org_id, ts_str, metric_name, project, tag = key
            timestamp = datetime.fromisoformat(ts_str)

            try:
                obj, was_created = EventMetricsHourly._base_manager.update_or_create(
                    organization_id=org_id,
                    timestamp=timestamp,
                    metric_name=metric_name,
                    project=project,
                    tag=tag,
                    defaults={
                        "metric_type": agg["metric_type"],
                        "metric_value": agg["value"],
                        "metric_count": agg["count"],
                        "labels": agg["labels"],
                    },
                )
                if was_created:
                    created += 1
                else:
                    updated += 1
            except Exception as e:
                logger.warning(f"Error upserting hourly {key}: {e}")

    return created, updated
```
Catching exceptions inside a single outer transaction.atomic() can break the whole backfill.
If any DB error occurs (e.g., IntegrityError), the transaction may become unusable and subsequent iterations can fail even though exceptions are caught.
Simplest fix: remove the outer transaction.atomic() (each update_or_create() is already atomic), or use a per-row savepoint.
Proposed fix (remove outer atomic + keep loop going safely)

```diff
 def _upsert_hourly(self, aggregations: dict) -> tuple[int, int]:
     """Upsert hourly aggregations."""
     created, updated = 0, 0
-    with transaction.atomic():
-        for key, agg in aggregations.items():
-            org_id, ts_str, metric_name, project, tag = key
-            timestamp = datetime.fromisoformat(ts_str)
+    for key, agg in aggregations.items():
+        org_id, ts_str, metric_name, project, tag = key
+        timestamp = datetime.fromisoformat(ts_str)
-            try:
-                obj, was_created = EventMetricsHourly._base_manager.update_or_create(
+        try:
+            _, was_created = EventMetricsHourly._base_manager.update_or_create(
                 organization_id=org_id,
                 timestamp=timestamp,
                 metric_name=metric_name,
                 project=project,
                 tag=tag,
                 defaults={
                     "metric_type": agg["metric_type"],
                     "metric_value": agg["value"],
                     "metric_count": agg["count"],
                     "labels": agg["labels"],
                 },
             )
-            if was_created:
-                created += 1
-            else:
-                updated += 1
-            except Exception as e:
-                logger.warning(f"Error upserting hourly {key}: {e}")
+            if was_created:
+                created += 1
+            else:
+                updated += 1
+        except Exception as e:
+            logger.warning(f"Error upserting hourly {key}: {e}")
     return created, updated
```

Apply the same pattern to daily/monthly; also prefix unused `obj` as `_` (RUF059).
Also applies to: 300-329, 331-360
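For the per-row savepoint alternative mentioned above, a minimal sketch (names taken from the command's code; a sketch, not the author's implementation):

```python
from datetime import datetime

from django.db import transaction

for key, agg in aggregations.items():
    org_id, ts_str, metric_name, project, tag = key
    try:
        # Nested atomic() creates a savepoint per row: a failed upsert rolls
        # back alone and leaves the outer transaction usable for the rest.
        with transaction.atomic():
            EventMetricsHourly._base_manager.update_or_create(
                organization_id=org_id,
                timestamp=datetime.fromisoformat(ts_str),
                metric_name=metric_name,
                project=project,
                tag=tag,
                defaults={
                    "metric_type": agg["metric_type"],
                    "metric_value": agg["value"],
                    "metric_count": agg["count"],
                    "labels": agg["labels"],
                },
            )
    except Exception as e:
        logger.warning(f"Error upserting hourly {key}: {e}")
```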
🧰 Tools
🪛 Ruff (0.14.11)
278-278: Unpacked variable obj is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
295-295: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In @backend/dashboard_metrics/management/commands/backfill_metrics.py around
lines 269 - 298, The outer transaction.atomic() in _upsert_hourly wraps all
update_or_create calls so a DB error (e.g., IntegrityError) can leave the
transaction unusable and abort the rest of the loop; remove the outer
transaction.atomic() (or alternatively use a per-row savepoint) so each
EventMetricsHourly._base_manager.update_or_create call runs in its own atomic
context and the loop can continue on exceptions; also change the unused returned
variable name obj to _ and apply the same change (remove outer atomic or use
per-row savepoints and rename obj → _) in the corresponding daily/monthly upsert
methods that use update_or_create.
```python
def _clean_test_data(self, org_id_str: str, workflow: Workflow):
    """Remove existing test data."""
    self.stdout.write("Cleaning existing test data...")

    # Delete file executions first (FK constraint)
    file_exec_count = WorkflowFileExecution.objects.filter(
        workflow_execution__workflow=workflow
    ).count()
    WorkflowFileExecution.objects.filter(
        workflow_execution__workflow=workflow
    ).delete()

    # Delete workflow executions
    exec_count = WorkflowExecution.objects.filter(workflow=workflow).count()
    WorkflowExecution.objects.filter(workflow=workflow).delete()

    # Delete usage records
    usage_count = Usage.objects.filter(organization_id=org_id_str).count()
    Usage.objects.filter(organization_id=org_id_str).delete()

    # Delete page usage records
    page_count = PageUsage.objects.filter(organization_id=org_id_str).count()
    PageUsage.objects.filter(organization_id=org_id_str).delete()

    self.stdout.write(
        f"  Deleted: {exec_count} executions, {file_exec_count} file executions, "
        f"{usage_count} usage, {page_count} page usage"
    )
```
--clean is dangerously destructive (deletes real org data).
_clean_test_data() deletes all Usage + PageUsage for the org, not just data created by this command. This is too risky for staging/prod. Require an explicit --force/--i-understand-this-deletes-real-data flag and/or scope deletions to records created by this command (e.g., dedicated workflow name/tag/marker label).
```python
def _generate_workflow_executions(
    self, workflow: Workflow, start_time, end_time, count: int
) -> list[WorkflowExecution]:
    """Generate WorkflowExecution records."""
    executions = []
    for _ in range(count):
        created_at = self._random_timestamp(start_time, end_time)

        # 90% completed, 10% error (matching staging ratio)
        is_error = random.random() >= 0.9
        status = ExecutionStatus.ERROR if is_error else ExecutionStatus.COMPLETED
        error_message = random.choice(self.EXECUTION_ERRORS) if is_error else ""

        exec_obj = WorkflowExecution(
            workflow=workflow,
            execution_mode=random.choice(
                [WorkflowExecution.Mode.INSTANT, WorkflowExecution.Mode.QUEUE]
            ),
            execution_method=random.choice(
                [WorkflowExecution.Method.DIRECT, WorkflowExecution.Method.SCHEDULED]
            ),
            execution_type=WorkflowExecution.Type.COMPLETE,
            status=status,
            total_files=random.randint(1, 10),
            execution_time=random.uniform(5.0, 120.0),
            error_message=error_message,
        )
        exec_obj._target_created_at = created_at  # Store target timestamp
        executions.append(exec_obj)

    # Bulk create
    WorkflowExecution.objects.bulk_create(executions)

    # Refresh to get IDs (needed since bulk_create may not return them)
    created_executions = list(
        WorkflowExecution.objects.filter(workflow=workflow).order_by("-id")[:count]
    )

    # Update created_at timestamps (auto_now_add overrides during bulk_create)
    for i, exec_obj in enumerate(created_executions):
        target_ts = executions[i]._target_created_at if i < len(executions) else None
        if target_ts:
            WorkflowExecution.objects.filter(id=exec_obj.id).update(
                created_at=target_ts
            )
            exec_obj.created_at = target_ts  # Update in-memory object too

    return created_executions
```
🧩 Analysis chain
🏁 Script executed:
find . -type f -name "*.py" | grep -E "(models|workflow)" | head -20
Repository: Zipstack/unstract
Length of output: 1019
🏁 Script executed:
find . -type f -name "*.py" -path "*/models*" | head -20
Repository: Zipstack/unstract
Length of output: 927
🏁 Script executed:
cat -n backend/dashboard_metrics/management/commands/generate_metrics_test_data.py | sed -n '235,283p'
Repository: Zipstack/unstract
Length of output: 2562
🏁 Script executed:
rg "class WorkflowExecution" -A 20 --type py
Repository: Zipstack/unstract
Length of output: 50373
🏁 Script executed:
cat -n backend/workflow_manager/workflow_v2/models/execution.py | head -100
Repository: Zipstack/unstract
Length of output: 4461
🏁 Script executed:
rg "created_at" backend/workflow_manager/workflow_v2/models/execution.py -B 2 -A 2
Repository: Zipstack/unstract
Length of output: 654
🏁 Script executed:
sed -n '1,150p' backend/workflow_manager/workflow_v2/models/execution.py | tail -80
Repository: Zipstack/unstract
Length of output: 2988
🏁 Script executed:
sed -n '100,250p' backend/workflow_manager/workflow_v2/models/execution.py
Repository: Zipstack/unstract
Length of output: 5347
🏁 Script executed:
find . -type f -name "base_model.py" | head -5
Repository: Zipstack/unstract
Length of output: 97
🏁 Script executed:
rg "class BaseModel" -A 30 --type py | head -80
Repository: Zipstack/unstract
Length of output: 453
🏁 Script executed:
rg "bulk_create\(" backend/ --type py -A 5 | head -60
Repository: Zipstack/unstract
Length of output: 4882
🏁 Script executed:
rg "bulk_update\(" backend/ --type py -A 3 | head -60
Repository: Zipstack/unstract
Length of output: 797
🏁 Script executed:
cat backend/requirements.txt | grep -i django
Repository: Zipstack/unstract
Length of output: 117
Length of output: 117
🏁 Script executed:
python3 << 'EOF'
# Check Django bulk_create behavior documentation
# In Django 1.10+, bulk_create returns a list of objects with PKs populated
# However, this depends on database backend support
import subprocess

result = subprocess.run(
    ["grep", "-r", "bulk_create", "backend/", "--include=*.py"],
    capture_output=True,
    text=True,
)
# Look for any comments or patterns about bulk_create and PKs
lines = result.stdout.split('\n')
for i, line in enumerate(lines):
    if 'bulk_create' in line and ('id' in line or 'pk' in line or 'return' in line):
        print(f"{i}: {line}")
EOF
Repository: Zipstack/unstract
Length of output: 147
🏁 Script executed:
sed -n '290,350p' backend/dashboard_metrics/management/commands/generate_metrics_test_data.py
Repository: Zipstack/unstract
Length of output: 2465
🏁 Script executed:
sed -n '150,210p' backend/dashboard_metrics/management/commands/generate_metrics_test_data.py
Repository: Zipstack/unstract
Length of output: 2359
🏁 Script executed:
sed -n '290,330p' backend/dashboard_metrics/management/commands/generate_metrics_test_data.py
Repository: Zipstack/unstract
Length of output: 1830
🏁 Script executed:
rg "class WorkflowFileExecution" -A 40 --type py | head -60
Repository: Zipstack/unstract
Length of output: 5126
🏁 Script executed:
rg "class WorkflowFileExecution" backend/ -A 50 --type py | head -80
Repository: Zipstack/unstract
Length of output: 6921
🏁 Script executed:
find . -type f -name "*.py" -path "*/models*" -exec grep -l "WorkflowFileExecution" {} \;
Repository: Zipstack/unstract
Length of output: 175
🏁 Script executed:
cat -n backend/workflow_manager/file_execution/models.py | head -100
Repository: Zipstack/unstract
Length of output: 4286
🏁 Script executed:
rg "created_at" backend/workflow_manager/file_execution/models.py -B 2 -A 2
Repository: Zipstack/unstract
Length of output: 436
Length of output: 436
🏁 Script executed:
python3 << 'EOF'
# Write a test to understand the behavior
test_code = """
import os
import django
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'unstract.settings')
django.setup()
from workflow_manager.workflow_v2.models import WorkflowExecution, Workflow
# Check Django version
print(f"Django version: {django.__version__}")
# Test if bulk_create returns objects with PKs populated
# We'll look at the method signature
import inspect
bulk_create_sig = inspect.signature(WorkflowExecution.objects.bulk_create)
print(f"bulk_create signature: {bulk_create_sig}")
# Check if bulk_update exists and its signature
if hasattr(WorkflowExecution.objects, 'bulk_update'):
bulk_update_sig = inspect.signature(WorkflowExecution.objects.bulk_update)
print(f"bulk_update signature: {bulk_update_sig}")
"""
# Write to file
with open('/tmp/test_django_bulk.py', 'w') as f:
f.write(test_code)
print("Test file created at /tmp/test_django_bulk.py")
EOFRepository: Zipstack/unstract
Length of output: 106
🏁 Script executed:
find . -type f -name "setup.py" -o -name "pyproject.toml" -o -name "setup.cfg" | head -5Repository: Zipstack/unstract
Length of output: 192
🏁 Script executed:
find . -name "setup.py" -o -name "pyproject.toml" | xargs cat 2>/dev/null | grep -i "django" | head -10Repository: Zipstack/unstract
Length of output: 470
🏁 Script executed:
find . -path "*/requirements*" -type f | xargs cat 2>/dev/null | grep -i "django" | head -5Repository: Zipstack/unstract
Length of output: 43
🏁 Script executed:
python3 << 'EOF'
# Django 4.2.1 behavior verification
# Check if bulk_update can update auto_now_add fields
import_note = """
Django 4.2.1 facts:
1. bulk_create() DOES return objects with populated PKs (since Django 1.10+)
2. bulk_update() exists (since Django 3.2+)
3. auto_now_add fields: Django skips them in normal .save() but NOT in bulk_update()
- However, auto_now_add fields are typically READ-ONLY at the ORM level
- bulk_update() will raise an error if you try to update auto_now_add fields
"""
print(import_note)
# The actual problem with the code:
problem = """
ISSUE WITH CURRENT CODE:
The line: WorkflowExecution.objects.filter(workflow=workflow).order_by("-id")[:count]
Is problematic because:
1. If multiple workflows exist, this might fetch executions from OTHER workflows
2. The .order_by("-id")[:count] doesn't guarantee getting the SAME rows created in bulk_create
3. Example: If you have 10 total rows, and bulk_create 5 new ones:
- order_by("-id")[:5] might get the 5 newest, but order might not match creation order
- Mapping executions[i]._target_created_at to created_executions[i] becomes wrong
ISSUE WITH PROPOSED SOLUTION:
The proposed fix assumes bulk_update() can override auto_now_add fields, which it CANNOT.
auto_now_add fields are protected and cannot be updated via Django ORM bulk_update().
"""
print(problem)
EOFRepository: Zipstack/unstract
Length of output: 1205
🏁 Script executed:
python3 << 'EOF'
# Check if bulk_update tries to update auto_now_add fields
# In the proposed solution, it tries to update created_at which has auto_now_add=True
# Looking at Django 4.2.1 documentation and source:
# Fields with auto_now_add=True cannot be updated via bulk_update()
# Django will either:
# 1. Skip the field silently (pre-3.2)
# 2. Raise ValidationError (4.2+)
print("""
CRITICAL FINDING:
In Django 4.2.1, the Model.Meta.readonly_fields and auto_now_add fields
are generally protected from bulk updates.
However, the current code uses a different approach:
- It uses individual .update() queries:
WorkflowExecution.objects.filter(id=exec_obj.id).update(created_at=target_ts)
This BYPASSES the ORM protection because .update() at the QuerySet level
directly executes SQL UPDATE statements, which DO allow setting auto_now_add fields.
The PROPOSED solution tries to use bulk_update(), which would be BLOCKED
by Django's ORM protection on auto_now_add fields.
""")
EOFRepository: Zipstack/unstract
Length of output: 657
🏁 Script executed:
cat > /tmp/check_solution.py << 'EOF'
# Simulate what the proposed solution does
proposed_code = """
# Proposed fix from review:
for exec_obj in executions:
exec_obj.created_at = exec_obj._target_created_at
WorkflowExecution.objects.bulk_update(executions, ["created_at"])
"""
# The issue:
# bulk_update() in Django 4.2.1 will NOT allow updating fields with auto_now_add=True
# It will raise a ValidationError or skip the field
print("PROPOSED SOLUTION PROBLEM:")
print("=" * 60)
print(proposed_code)
print("\nWill FAIL because:")
print("1. created_at has auto_now_add=True in BaseModel")
print("2. bulk_update() respects ORM constraints and won't update auto_now_add fields")
print("3. Django 4.2.1 strictly enforces readonly semantics")
print("\nThe CURRENT code actually works because:")
print("1. It uses QuerySet.update() which bypasses ORM constraints")
print("2. Direct SQL UPDATE statements can modify auto_now_add columns")
print("\nBUT the current code's REAL PROBLEM is:")
print("1. order_by('-id')[:count] is unreliable")
print("2. ID ordering doesn't guarantee getting the newly created rows")
print("3. Order of re-fetched rows may not match order in original list")
EOF
python3 /tmp/check_solution.pyRepository: Zipstack/unstract
Length of output: 904
Bulk-create timestamp backfill is unreliable, but the proposed solution won't work.
Re-querying WorkflowExecution.objects.filter(...).order_by("-id")[:count] is not guaranteed to fetch the newly created rows or preserve order. This can cause mismatches between _target_created_at values and the fetched executions.
However, the proposed bulk_update() solution will not work because created_at has auto_now_add=True, and Django 4.2.1 prevents bulk_update() from modifying auto_now_add fields. The current approach using individual QuerySet.update() calls bypasses this ORM protection, but the underlying re-query logic is the actual problem.
Use the PKs returned directly from bulk_create() and update them individually:
Corrected approach
# Bulk create returns objects with PKs already populated
created_executions = WorkflowExecution.objects.bulk_create(executions)

# Update created_at using the PKs we already have (no re-query needed)
for exec_obj in created_executions:
    WorkflowExecution.objects.filter(id=exec_obj.id).update(
        created_at=exec_obj._target_created_at
    )
🧰 Tools
🪛 Ruff (0.14.11)
244-244: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
246-246: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
250-252: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
253-255: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
258-258: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
259-259: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
🤖 Prompt for AI Agents
In @backend/dashboard_metrics/management/commands/generate_metrics_test_data.py
around lines 235 - 283, The re-query after bulk_create in
_generate_workflow_executions is unreliable and may misalign _target_created_at
values; instead use the list returned by
WorkflowExecution.objects.bulk_create(executions) (these objects have PKs
populated) and iterate that returned list to update each record's created_at via
individual QuerySet.update(id=exec_obj.id) using exec_obj._target_created_at (do
not attempt bulk_update because created_at uses auto_now_add).
with transaction.atomic():
    for key, agg in aggregations.items():
        org_id, hour_ts_str, metric_name, project, tag = key
        hour_ts = datetime.fromisoformat(hour_ts_str)

        try:
            obj, was_created = (
                EventMetricsHourly.objects.select_for_update().get_or_create(
                    organization_id=org_id,
                    timestamp=hour_ts,
                    metric_name=metric_name,
                    project=project,
                    tag=tag,
                    defaults={
                        "metric_type": agg["metric_type"],
                        "metric_value": agg["value"],
                        "metric_count": agg["count"],
                        "labels": agg["labels"],
                    },
                )
            )
            if was_created:
                created += 1
            else:
                obj.metric_value = F("metric_value") + agg["value"]
                obj.metric_count = F("metric_count") + agg["count"]
                if agg["labels"]:
                    obj.labels = {**obj.labels, **agg["labels"]}
                obj.save(
                    update_fields=[
                        "metric_value",
                        "metric_count",
                        "labels",
                        "modified_at",
                    ]
                )
                updated += 1
        except Exception:
            logger.exception("Error upserting hourly metric %s", key)

return created, updated
Catch IntegrityError explicitly to prevent silent data loss on concurrent upserts.
select_for_update() only locks existing rows. When two concurrent tasks insert the same (org, timestamp, metric, project, tag) key, the non-existent row cannot be locked, both attempt insert, one raises IntegrityError, and the generic Exception handler swallows it—losing the increment without returning an error signal.
Import IntegrityError and catch it explicitly with a retry (matching the pattern used elsewhere in the codebase in account_v2, connector_v2, pipeline_v2):
from django.db import IntegrityError

try:
    obj, was_created = EventMetricsHourly.objects.select_for_update().get_or_create(...)
except IntegrityError:
    # Row was created concurrently; fetch and update
    obj = EventMetricsHourly.objects.get(organization_id=org_id, timestamp=hour_ts, ...)
    was_created = False
except Exception:
    logger.exception("Error upserting hourly metric %s", key)
    continue

Alternatively, for PostgreSQL, use INSERT ... ON CONFLICT DO UPDATE to atomically upsert without retry logic.
Also applies to: _bulk_upsert_daily (276–317), _bulk_upsert_monthly (330–371).
🤖 Prompt for AI Agents
In @backend/dashboard_metrics/tasks.py around lines 222 - 262, The current broad
except in the upsert loop swallows IntegrityError from concurrent inserts
causing lost increments; import django.db.IntegrityError and explicitly catch
IntegrityError around the get_or_create in EventMetricsHourly upsert logic (the
block that calls select_for_update().get_or_create and currently sets obj,
was_created), then on IntegrityError re-query the row
(EventMetricsHourly.objects.get(...) for the same organization_id, timestamp,
metric_name, project, tag), set was_created=False and proceed to update counters
and labels as in the existing else-path; keep the generic Exception handler for
other errors but ensure you continue after logging; apply the same explicit
IntegrityError catch-and-retry pattern to the _bulk_upsert_daily and
_bulk_upsert_monthly blocks as well.
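For the ON CONFLICT alternative the comment above mentions, a minimal sketch of an atomic upsert via raw SQL; the table and column names are read off the model fields and are assumptions rather than the actual schema, and note that a NULL tag will not match the unique index (NULLs compare as distinct in PostgreSQL), so tag needs normalizing to a non-null value first:

from django.db import connection


def upsert_hourly_increment(org_id, hour_ts, metric_name, project, tag, value, count):
    """Insert-or-increment in one statement; concurrent writers serialize
    on the unique index instead of racing get_or_create()."""
    with connection.cursor() as cursor:
        cursor.execute(
            """
            INSERT INTO event_metrics_hourly
                (organization_id, timestamp, metric_name, project, tag,
                 metric_value, metric_count)
            VALUES (%s, %s, %s, %s, %s, %s, %s)
            ON CONFLICT (organization_id, timestamp, metric_name, project, tag)
            DO UPDATE SET
                metric_value = event_metrics_hourly.metric_value + EXCLUDED.metric_value,
                metric_count = event_metrics_hourly.metric_count + EXCLUDED.metric_count
            """,
            [org_id, hour_ts, metric_name, project, tag, value, count],
        )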
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@backend/dashboard_metrics/views.py`:
- Around line 845-848: The current parsing of limit uses limit =
min(int(request.query_params.get("limit", 10)), 50) which will raise a
ValueError for non-integer input and cause a 500; update the parsing in the view
(where organization = self._get_organization() is used and limit is computed) to
defensively parse the query param by catching ValueError/TypeError, falling back
to a safe default (e.g., 10), enforcing a minimum (e.g., 1) and maximum of 50,
and then assign the sanitized value to limit; reference the
request.query_params.get("limit", 10) expression and the limit variable when
making the change.
🧹 Nitpick comments (6)
backend/dashboard_metrics/views.py (1)
331-341: Avoid per-metric lookup for metric_type in summary.
The loop issues one query per metric to fetch metric_type. You can pull it in the initial aggregation to remove the N+1.
♻️ Suggested refactor
-        summary = (
-            queryset.values("metric_name")
-            .annotate(
-                total_value=Sum("metric_value"),
-                total_count=Sum("metric_count"),
-                average_value=Avg("metric_value"),
-                min_value=Min("metric_value"),
-                max_value=Max("metric_value"),
-            )
-            .order_by("metric_name")
-        )
-
-        # Add metric_type from the first record of each metric
-        summary_list = []
-        for row in summary:
-            metric_type_record = (
-                queryset.filter(metric_name=row["metric_name"])
-                .values("metric_type")
-                .first()
-            )
-            row["metric_type"] = (
-                metric_type_record["metric_type"] if metric_type_record else "counter"
-            )
-            summary_list.append(row)
+        summary_list = list(
+            queryset.values("metric_name", "metric_type")
+            .annotate(
+                total_value=Sum("metric_value"),
+                total_count=Sum("metric_count"),
+                average_value=Avg("metric_value"),
+                min_value=Min("metric_value"),
+                max_value=Max("metric_value"),
+            )
+            .order_by("metric_name")
+        )
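One thing worth double-checking with this refactor: adding metric_type to values() widens the GROUP BY, so a metric name recorded under two different metric_type values would yield two summary rows instead of one. A minimal sketch of the grouping semantics, with the model import assumed:

from django.db.models import Sum

# values() before annotate() defines the GROUP BY columns, roughly:
#   SELECT metric_name, metric_type, SUM(metric_value) AS total_value
#   FROM ... GROUP BY metric_name, metric_type
summary = (
    EventMetricsHourly.objects.values("metric_name", "metric_type")
    .annotate(total_value=Sum("metric_value"))
    .order_by("metric_name")
)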
frontend/src/components/metrics-dashboard/MetricsChart.jsx (5)
122-128: Clean up forEach callback to avoid returning a value.
The static analysis correctly identifies that Set.add() returns a value, and forEach callbacks should not return values. While this works, it's cleaner to use a block body.
♻️ Suggested fix
 data.daily_trend.forEach((item) => {
   if (item.metrics) {
-    Object.keys(item.metrics).forEach((key) => metricNamesSet.add(key));
+    Object.keys(item.metrics).forEach((key) => {
+      metricNamesSet.add(key);
+    });
   }
 });
179-197: Move static color constants to module level.
METRIC_BAR_COLORS, DEFAULT_BAR_COLORS, and getBarColor are recreated on every render. Since these are static, move them outside the component for minor performance improvement and consistency with TREND_COLORS at line 21.
♻️ Suggested refactor
Move these definitions after line 31 (after TREND_COLORS):

// Bar colors - metric-specific for key metrics, fallback array for others
const METRIC_BAR_COLORS = {
  pages_processed: "#1890ff",
  documents_processed: "#52c41a",
  failed_pages: "#ff4d4f",
  llm_calls: "#722ed1",
  prompt_executions: "#faad14",
  llm_usage: "#13c2c2",
};
const DEFAULT_BAR_COLORS = [
  "#1890ff",
  "#faad14",
  "#52c41a",
  "#722ed1",
  "#13c2c2",
];
const getBarColor = (metric, idx) =>
  METRIC_BAR_COLORS[metric] || DEFAULT_BAR_COLORS[idx % DEFAULT_BAR_COLORS.length];
410-416: Same forEach callback issue as noted earlier.
Apply the same fix here for consistency.
♻️ Suggested fix
 data.daily_trend.forEach((item) => {
   if (item.metrics) {
-    Object.keys(item.metrics).forEach((key) => metricNamesSet.add(key));
+    Object.keys(item.metrics).forEach((key) => {
+      metricNamesSet.add(key);
+    });
   }
 });
779-792: Move TREND_BAR_COLORS and getTrendColor to module level.
Same issue as in MetricsChart — these static constants are recreated on every render.
396-551: Consider extracting shared chart logic into reusable utilities.
MetricsBreakdown and TrendAnalysisChart share nearly identical patterns:
- Filter dropdown state and handlers (selectedMetrics, handleFilterClick, cardExtra)
- Metric extraction from daily_trend
- Loading/empty state rendering
A custom hook (e.g., useMetricFilter) and a wrapper component for the card chrome could reduce ~100 lines of duplication across these components. This is optional and can be deferred.
Also applies to: 674-848
organization = self._get_organization()
org_id = str(organization.id)
limit = min(int(request.query_params.get("limit", 10)), 50)
Guard recent_activity limit parsing to avoid 500 on invalid input.
int() on a bad limit query param will raise and return a 500. A small guard keeps the endpoint stable.
🛠️ Suggested fix
-        limit = min(int(request.query_params.get("limit", 10)), 50)
+        try:
+            limit = int(request.query_params.get("limit", 10))
+        except (TypeError, ValueError):
+            return Response(
+                {"error": "limit must be an integer"},
+                status=status.HTTP_400_BAD_REQUEST,
+            )
+        limit = min(limit, 50)
🤖 Prompt for AI Agents
In `@backend/dashboard_metrics/views.py` around lines 845 - 848, The current
parsing of limit uses limit = min(int(request.query_params.get("limit", 10)),
50) which will raise a ValueError for non-integer input and cause a 500; update
the parsing in the view (where organization = self._get_organization() is used
and limit is computed) to defensively parse the query param by catching
ValueError/TypeError, falling back to a safe default (e.g., 10), enforcing a
minimum (e.g., 1) and maximum of 50, and then assign the sanitized value to
limit; reference the request.query_params.get("limit", 10) expression and the
limit variable when making the change.
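The AI-agent prompt above prefers clamping with a safe fallback over returning a 400; a minimal sketch of that variant (the parse_limit helper is illustrative, not part of the codebase):

def parse_limit(raw, default=10, lo=1, hi=50):
    """Parse a user-supplied limit; fall back to `default` on junk input
    and clamp the result into [lo, hi]."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return default
    return max(lo, min(value, hi))


# Usage inside the view:
# limit = parse_limit(request.query_params.get("limit"))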
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@backend/dashboard_metrics/tasks.py`:
- Around line 586-694: The exception handlers inside the org-processing block
are currently catching all Exceptions (the inner try around the metric loop and
the outer try wrapping org processing), which swallows
DatabaseError/OperationalError and prevents Celery's autoretry; change both
handlers to re-raise DB errors so Celery can retry: in the inner except around
the metric loop (where logger.warning(...) and stats["errors"] += 1 happen),
detect and re-raise DatabaseError and OperationalError before handling other
exceptions, and in the outer except (where logger.exception(...) and
stats["errors"] += 1 happen) do the same; keep the existing logging/stats
increments for non-DB exceptions and ensure _upsert_hourly_replace,
_upsert_daily_replace, and _upsert_monthly_replace DB errors will propagate.
In `@frontend/src/components/metrics-dashboard/MetricsChart.jsx`:
- Around line 258-286: The chart can render blank when metricsToShow (computed
from selectedMetrics and availableMetrics) is empty; add an explicit early
return that checks metricsToShow.length === 0 and renders the same Card used for
other empty states (e.g., title "Trend Analysis (Last 30 Days)" with an Empty
component and a helpful description like "No metrics selected or available")
before the chart rendering logic so users see a clear empty state instead of a
blank chart.
In `@frontend/src/hooks/useMetricsData.js`:
- Around line 114-118: When orgId becomes falsy the effect should clear the
recent-activity state to avoid showing stale data; inside the useEffect that
currently calls fetchActivity() when orgId is truthy (the effect depending on
[orgId, limit]), add an else branch that resets your activity state (e.g., call
the state updater used to store recent activity such as setRecentActivity([]) or
setActivity([])) so recent activity is cleared whenever orgId is
null/undefined/empty.
- Around line 67-71: In useMetricsData's useEffect (the effect that depends on
[orgId, startDate, endDate]) ensure that when orgId is falsy you clear the
hook's cached metric state instead of simply skipping fetchOverview; add an else
branch that resets the relevant state variables (e.g., metrics/overview state,
loading/error flags) so stale data from the previous org is removed on
logout/org switch; update the effect that currently calls fetchOverview() to
conditionally call fetchOverview() when orgId is present and otherwise call the
state-reset logic.
🧹 Nitpick comments (1)
backend/dashboard_metrics/tasks.py (1)
396-411: Avoid unused obj from update_or_create.
Rename to _ / _obj to address RUF059 and reduce lint noise.
♻️ Proposed cleanup
-            obj, was_created = EventMetricsHourly._base_manager.update_or_create(
+            _, was_created = EventMetricsHourly._base_manager.update_or_create(
                 organization_id=org_id,
                 timestamp=hour_ts,
                 metric_name=metric_name,
                 project=project,
                 tag=tag,
                 defaults={
                     "metric_type": agg["metric_type"],
                     "metric_value": agg["value"],
                     "metric_count": agg["count"],
                     "labels": agg["labels"],
                 },
             )
-            obj, was_created = EventMetricsDaily._base_manager.update_or_create(
+            _, was_created = EventMetricsDaily._base_manager.update_or_create(
                 organization_id=org_id,
                 date=date_val,
                 metric_name=metric_name,
                 project=project,
                 tag=tag,
                 defaults={
                     "metric_type": agg["metric_type"],
                     "metric_value": agg["value"],
                     "metric_count": agg["count"],
                     "labels": agg["labels"],
                 },
             )
-            obj, was_created = EventMetricsMonthly._base_manager.update_or_create(
+            _, was_created = EventMetricsMonthly._base_manager.update_or_create(
                 organization_id=org_id,
                 month=month_val,
                 metric_name=metric_name,
                 project=project,
                 tag=tag,
                 defaults={
                     "metric_type": agg["metric_type"],
                     "metric_value": agg["value"],
                     "metric_count": agg["count"],
                     "labels": agg["labels"],
                 },
             )
Also applies to: 438-452, 479-492
try:
    for metric_name, query_method, is_histogram in metric_configs:
        metric_type = MetricType.HISTOGRAM if is_histogram else MetricType.COUNTER

        try:
            # === HOURLY AGGREGATION (last 24 hours) ===
            hourly_results = query_method(
                org_id, hourly_start, end_date, granularity="hour"
            )
            for row in hourly_results:
                period = row["period"]
                value = row["value"] or 0
                hour_ts = _truncate_to_hour(period)
                key = (org_id, hour_ts.isoformat(), metric_name, "default", None)

                if key not in hourly_agg:
                    hourly_agg[key] = {
                        "metric_type": metric_type,
                        "value": 0,
                        "count": 0,
                        "labels": {},
                    }
                hourly_agg[key]["value"] += value
                hourly_agg[key]["count"] += 1

            # === DAILY AGGREGATION (last 7 days) ===
            daily_results = query_method(
                org_id, daily_start, end_date, granularity="day"
            )
            for row in daily_results:
                period = row["period"]
                value = row["value"] or 0
                day_ts = _truncate_to_day(period)
                key = (
                    org_id,
                    day_ts.date().isoformat(),
                    metric_name,
                    "default",
                    None,
                )

                if key not in daily_agg:
                    daily_agg[key] = {
                        "metric_type": metric_type,
                        "value": 0,
                        "count": 0,
                        "labels": {},
                    }
                daily_agg[key]["value"] += value
                daily_agg[key]["count"] += 1

            # === MONTHLY AGGREGATION (last 2 months) ===
            monthly_results = query_method(
                org_id, monthly_start, end_date, granularity="day"
            )
            # Group results by month and create separate records
            monthly_buckets: dict[str, dict] = {}
            for row in monthly_results:
                period = row["period"]
                value = row["value"] or 0
                month_ts = _truncate_to_month(period)
                month_key_str = month_ts.date().isoformat()

                if month_key_str not in monthly_buckets:
                    monthly_buckets[month_key_str] = {"value": 0, "count": 0}
                monthly_buckets[month_key_str]["value"] += value
                monthly_buckets[month_key_str]["count"] += 1

            # Create aggregation entries for each month
            for month_key_str, bucket in monthly_buckets.items():
                month_key = (
                    org_id,
                    month_key_str,
                    metric_name,
                    "default",
                    None,
                )
                monthly_agg[month_key] = {
                    "metric_type": metric_type,
                    "value": bucket["value"],
                    "count": bucket["count"] or 1,
                    "labels": {},
                }

        except Exception as e:
            logger.warning(f"Error querying {metric_name} for org {org_id}: {e}")
            stats["errors"] += 1

    # Upsert all three tiers
    if hourly_agg:
        c, u = _upsert_hourly_replace(hourly_agg)
        stats["hourly"]["created"] += c
        stats["hourly"]["updated"] += u

    if daily_agg:
        c, u = _upsert_daily_replace(daily_agg)
        stats["daily"]["created"] += c
        stats["daily"]["updated"] += u

    if monthly_agg:
        c, u = _upsert_monthly_replace(monthly_agg)
        stats["monthly"]["created"] += c
        stats["monthly"]["updated"] += u

    stats["orgs_processed"] += 1

except Exception as e:
    logger.exception(f"Error processing org {org_id}: {e}")
    stats["errors"] += 1
Let DB errors propagate to Celery's autoretry mechanism.
The task has autoretry_for=(DatabaseError, OperationalError) configured, but the nested except Exception handlers at lines 671 and 693 catch these errors before they can trigger Celery's automatic retry. This means transient database errors won't be retried, and partial aggregates can be committed. Separate database errors from generic exceptions and re-raise them.
Suggested fix
-            except Exception as e:
-                logger.warning(f"Error querying {metric_name} for org {org_id}: {e}")
-                stats["errors"] += 1
+            except (DatabaseError, OperationalError):
+                logger.exception(
+                    "DB error querying %s for org %s", metric_name, org_id
+                )
+                raise
+            except Exception as e:
+                logger.warning(
+                    "Error querying %s for org %s: %s", metric_name, org_id, e
+                )
+                stats["errors"] += 1
@@
-        except Exception as e:
-            logger.exception(f"Error processing org {org_id}: {e}")
-            stats["errors"] += 1
+        except (DatabaseError, OperationalError):
+            logger.exception("DB error processing org %s", org_id)
+            raise
+        except Exception as e:
+            logger.exception("Error processing org %s: %s", org_id, e)
+            stats["errors"] += 1
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 670-670: Do not catch blind exception: Exception
(BLE001)
[warning] 693-693: Redundant exception object included in logging.exception call
(TRY401)
🤖 Prompt for AI Agents
In `@backend/dashboard_metrics/tasks.py` around lines 586 - 694, The exception
handlers inside the org-processing block are currently catching all Exceptions
(the inner try around the metric loop and the outer try wrapping org
processing), which swallows DatabaseError/OperationalError and prevents Celery's
autoretry; change both handlers to re-raise DB errors so Celery can retry: in
the inner except around the metric loop (where logger.warning(...) and
stats["errors"] += 1 happen), detect and re-raise DatabaseError and
OperationalError before handling other exceptions, and in the outer except
(where logger.exception(...) and stats["errors"] += 1 happen) do the same; keep
the existing logging/stats increments for non-DB exceptions and ensure
_upsert_hourly_replace, _upsert_daily_replace, and _upsert_monthly_replace DB
errors will propagate.
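To make the interaction explicit, a minimal sketch of why the re-raise matters; the decorator options and the run_aggregation helper are illustrative, not the actual task definition:

import logging

from celery import shared_task
from django.db import DatabaseError, OperationalError

logger = logging.getLogger(__name__)


def run_aggregation():
    """Placeholder for the real per-org query/upsert loop."""


@shared_task(
    autoretry_for=(DatabaseError, OperationalError),
    retry_backoff=True,
    max_retries=3,
)
def aggregate_from_sources():
    try:
        run_aggregation()
    except (DatabaseError, OperationalError):
        # Celery only schedules an automatic retry for exceptions that
        # escape the task body, so a swallowed DB error disables
        # autoretry_for entirely. Re-raise and let Celery reschedule.
        raise
    except Exception:
        logger.exception("Non-retryable aggregation error")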
// Filter to only show selected metrics that exist
const metricsToShow = useMemo(() => {
  return selectedMetrics.filter((m) => availableMetrics.includes(m));
}, [selectedMetrics, availableMetrics]);

if (loading) {
  return (
    <Card
      title="Trend Analysis (Last 30 Days)"
      className="metrics-chart-card"
    >
      <div className="metrics-loading">
        <Spin />
      </div>
    </Card>
  );
}

if (!chartData.length) {
  return (
    <Card
      title="Trend Analysis (Last 30 Days)"
      className="metrics-chart-card"
    >
      <Empty description="No data available" />
    </Card>
  );
}
Handle “no selected metrics” to avoid a blank chart.
If selectedMetrics don’t exist in availableMetrics, metricsToShow becomes empty and the chart renders without lines. Show an explicit empty state so users understand what happened.
✅ Suggested fix
   if (!chartData.length) {
     return (
       <Card
         title="Trend Analysis (Last 30 Days)"
         className="metrics-chart-card"
       >
         <Empty description="No data available" />
       </Card>
     );
   }
+
+  if (!metricsToShow.length) {
+    return (
+      <Card
+        title="Trend Analysis (Last 30 Days)"
+        className="metrics-chart-card"
+      >
+        <Empty description="No metrics selected" />
+      </Card>
+    );
+  }
🤖 Prompt for AI Agents
In `@frontend/src/components/metrics-dashboard/MetricsChart.jsx` around lines 258
- 286, The chart can render blank when metricsToShow (computed from
selectedMetrics and availableMetrics) is empty; add an explicit early return
that checks metricsToShow.length === 0 and renders the same Card used for other
empty states (e.g., title "Trend Analysis (Last 30 Days)" with an Empty
component and a helpful description like "No metrics selected or available")
before the chart rendering logic so users see a clear empty state instead of a
blank chart.
useEffect(() => {
  if (orgId) {
    fetchOverview();
  }
}, [orgId, startDate, endDate]); // eslint-disable-line
Clear cached UI state when orgId is missing to avoid cross‑org data bleed.
When orgId becomes falsy, the hook skips fetching but keeps the previous org’s data in state. This can surface stale metrics after logout/org switch. Reset state in the useEffect else branch.
✅ Suggested fix
 useEffect(() => {
   if (orgId) {
     fetchOverview();
+  } else {
+    setData(null);
+    setError(null);
+    setLoading(false);
   }
 }, [orgId, startDate, endDate]); // eslint-disable-line
🤖 Prompt for AI Agents
In `@frontend/src/hooks/useMetricsData.js` around lines 67 - 71, In
useMetricsData's useEffect (the effect that depends on [orgId, startDate,
endDate]) ensure that when orgId is falsy you clear the hook's cached metric
state instead of simply skipping fetchOverview; add an else branch that resets
the relevant state variables (e.g., metrics/overview state, loading/error flags)
so stale data from the previous org is removed on logout/org switch; update the
effect that currently calls fetchOverview() to conditionally call
fetchOverview() when orgId is present and otherwise call the state-reset logic.
useEffect(() => {
  if (orgId) {
    fetchActivity();
  }
}, [orgId, limit]); // eslint-disable-line
Reset recent activity state when orgId is unavailable.
Same issue as overview: stale activity can remain visible if orgId drops to null. Clear state in the effect when orgId is falsy.
✅ Suggested fix
 useEffect(() => {
   if (orgId) {
     fetchActivity();
+  } else {
+    setData(null);
+    setError(null);
+    setLoading(false);
   }
 }, [orgId, limit]); // eslint-disable-line
In `@frontend/src/hooks/useMetricsData.js` around lines 114 - 118, When orgId
becomes falsy the effect should clear the recent-activity state to avoid showing
stale data; inside the useEffect that currently calls fetchActivity() when orgId
is truthy (the effect depending on [orgId, limit]), add an else branch that
resets your activity state (e.g., call the state updater used to store recent
activity such as setRecentActivity([]) or setActivity([])) so recent activity is
cleared whenever orgId is null/undefined/empty.
- Implement multi-tier metrics aggregation with celery-batches
- Add hourly/daily/monthly storage tables with proper indexing
- Add dedicated metrics Celery worker (worker-metrics)
- Add live query endpoints for immediate data access
- Add production readiness: retry config, rate limiting, health checks
- Add queue routing and broker resilience configuration
- Add comprehensive unit tests for batch processing

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
for more information, see https://pre-commit.ci
Backend:
- Add EventMetricsHourly/Daily/Monthly models for pre-aggregated data
- Add MetricsQueryService for querying source tables (Usage, PageUsage)
- Add aggregate_from_sources Celery task (runs every 15 min)
- Add cleanup tasks for hourly (30 days) and daily (365 days) retention
- Add /overview/, /usage/, /activity/ API endpoints
- Fix Celery task queue routing to dashboard_metric_events
- Use _base_manager to bypass org context filter in Celery tasks

Frontend:
- Add MetricsDashboard component with tabs (Overview, Usage, Activity)
- Add MetricsChart, MetricsSummary, MetricsTable components
- Add useMetricsData hook with caching
- Add recharts dependency for visualizations
- Add dashboard.svg icon and sidebar navigation
- Support date range picker for all tabs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
for more information, see https://pre-commit.ci
- Remove redundant Detailed View tab and MetricsTable component
- Add Recent Activity component showing real-time file executions
- Differentiates by type: ETL, API, Workflow
- Shows status, file name, and relative timestamps
- Convert charts to line charts for 30-day visualization
- Create PagesChart (pages_processed + failed_pages)
- Create TrendAnalysisChart (other metrics with filter dropdown)
- Date picker now only affects summary cards, charts fixed at 30 days
- Add recent-activity API endpoint with proper URL registration
- Fix organization filter queries in services.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Reduce default limit from 10 to 5 items
- Add max-height with overflow scroll for compact display

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cleanup after dashboard refactoring:
- Remove unused chart components (MetricsChart, MetricsBreakdown, MultiSeriesChart, MetricsComparison)
- Remove unused hooks (useMetricsData, useMetricsSummary, useMetricsSeries)
- Remove unused imports (BarChart, Bar, Progress)
- Remove unused CSS styles (.usage-quota-*, .trend-*, .breakdown-*)
- Update exports to only include used components
- Add failed_pages to metrics aggregation task

Reduces bundle size by ~982 lines of dead code.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
acb9e9c to ce6bba3
Test Results
Summary
Runner Tests - Full Report
SDK1 Tests - Full Report
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/pyproject.toml (1)
15-20: ⚠️ Potential issue | 🟠 Major
Update celery-batches to 0.10 or later for Celery 5.3.x and Python 3.12 compatibility.
celery-batches 0.9.0 targets Celery 5.0–5.1 and does not support Celery 5.3.x. For Celery 5.3.x and Python 3.12, use celery-batches>=0.10 (or latest 0.11). No known security advisories exist for this package.
🤖 Fix all issues with AI agents
In `@backend/dashboard_metrics/management/commands/backfill_metrics.py`:
- Around line 202-256: The code currently increments the per-key "count" by 1 in
hourly_agg/daily_agg/monthly_agg which yields bucket counts instead of
source-record counts; update the increments in the loops in backfill_metrics.py
(the blocks that update hourly_agg[key]["count"], daily_agg[key]["count"], and
monthly_agg[key]["count"]) to use an explicit row-level count value from the
query result (e.g., row.get("count")) falling back to 1 if missing: set
row_count = row.get("count", None) and then hourly_agg[key]["count"] +=
row_count if row_count is not None else (value if metric_type == "counter" else
1); apply the same change for the daily and monthly aggregation loops so
averages use source-record counts (and confirm MetricsQueryService returns a
"count" field).
In `@backend/dashboard_metrics/tasks.py`:
- Around line 121-129: The event handler is passing a None tag into model
upserts which conflicts with the EventMetrics* CharField (null=False); normalize
tag to a non-null string (e.g., "") right after reading it from the event
(replace tag = event.get("tag") with a normalized value) and use that normalized
variable in all subsequent get_or_create()/update_or_create() calls so the model
default is not overridden by None.
In `@frontend/src/components/metrics-dashboard/MetricsDashboard.jsx`:
- Around line 26-70: chartStart/chartEnd are frozen by useMemo at mount so
refetchChart() keeps using stale dates; update handleRefresh to compute a fresh
30-day range and pass it to the chart refetch, e.g. compute const newStart =
dayjs().subtract(30, "day").toISOString() and const newEnd =
dayjs().toISOString() inside handleRefresh and call refetchChart(newStart,
newEnd). If refetchChart currently doesn't accept parameters, modify
useMetricsOverview(chartStart, chartEnd) to return a refetch that accepts
optional (start,end) overrides and use those in handleRefresh; touch the symbols
chartStart, chartEnd, refetchChart, useMetricsOverview, and handleRefresh.
In `@frontend/src/components/metrics-dashboard/RecentActivity.jsx`:
- Around line 91-125: The List rendering of dataSource={data.activity} lacks a
stable key causing reconciliation warnings; update the List component (the one
using dataSource, renderItem, and STATUS_CONFIG/TYPE_CONFIG) to include
rowKey="id" so React uses each activity's stable id field for keys (ensure the
id property on items is used as the unique key).
🧹 Nitpick comments (4)
unstract/core/src/unstract/core/metrics/registry.py (1)
42-48: Return a stable order from get_all_metrics().
Converting a set to a list yields nondeterministic ordering; sorting avoids flaky output and makes results predictable.
♻️ Suggested fix
- return list(_REGISTERED_METRICS)
+ return sorted(_REGISTERED_METRICS)
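A quick illustration of the instability: string hashing is seeded per process, so iteration order over a set of names can change between runs.

# Run this twice with different hash seeds and the first line can change:
#   PYTHONHASHSEED=1 python demo.py
#   PYTHONHASHSEED=2 python demo.py
metrics = {"llm_calls", "pages_processed", "etl_runs"}
print(list(metrics))    # iteration order depends on the per-process hash seed
print(sorted(metrics))  # ['etl_runs', 'llm_calls', 'pages_processed'] every time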
backend/backend/celery_config.py (1)
1-74: Annotate mutable class attributes as ClassVar to satisfy Ruff.
This clears RUF012 warnings and makes the config intent explicit.
♻️ Suggested fix
-from urllib.parse import quote_plus
+from urllib.parse import quote_plus
+from typing import ClassVar

 from django.conf import settings
 from kombu import Queue
@@
-    broker_transport_options = {
+    broker_transport_options: ClassVar[dict[str, object]] = {
         "max_retries": 3,
         "interval_start": 0,
         "interval_step": 0.2,
         "interval_max": 1,
         "socket_timeout": 30,
         "socket_keepalive": True,
     }
@@
-    task_queues = [
+    task_queues: ClassVar[list[Queue]] = [
         Queue("celery", routing_key="celery"),
         Queue("dashboard_metric_events", routing_key="dashboard_metric_events"),
         Queue("celery_periodic_logs", routing_key="celery_periodic_logs"),
         Queue("celery_log_task_queue", routing_key="celery_log_task_queue"),
         Queue("celery_api_deployments", routing_key="celery_api_deployments"),
         Queue("file_processing", routing_key="file_processing"),
         Queue("api_file_processing", routing_key="api_file_processing"),
         Queue("file_processing_callback", routing_key="file_processing_callback"),
         Queue("api_file_processing_callback", routing_key="api_file_processing_callback"),
     ]
@@
-    task_routes = {
+    task_routes: ClassVar[dict[str, dict[str, str]]] = {
         "dashboard_metrics.process_events": {"queue": "dashboard_metric_events"},
         "dashboard_metrics.aggregate_from_sources": {"queue": "dashboard_metric_events"},
         "dashboard_metrics.cleanup_hourly_data": {"queue": "dashboard_metric_events"},
         "dashboard_metrics.cleanup_daily_data": {"queue": "dashboard_metric_events"},
     }
backend/dashboard_metrics/capture.py (1)
79-88: Avoid mutating caller-supplied labels.
labels or {} followed by in-place updates will modify the caller's dict. Copy before extending and apply the same pattern in all record_* methods.
♻️ Suggested pattern
- metric_labels = labels or {}
+ metric_labels = dict(labels) if labels else {}
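To see the aliasing in isolation, a small sketch; the record helper is illustrative, standing in for any record_* method that enriches labels:

def record(labels=None):
    metric_labels = labels or {}          # aliases the caller's dict when truthy
    metric_labels["source"] = "capture"   # ...so this also mutates the caller's object
    return metric_labels


caller_labels = {"env": "prod"}
record(caller_labels)
print(caller_labels)  # {'env': 'prod', 'source': 'capture'} - surprising side effect


def record_fixed(labels=None):
    metric_labels = dict(labels) if labels else {}  # copy before extending
    metric_labels["source"] = "capture"
    return metric_labels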
backend/dashboard_metrics/management/commands/populate_test_metrics.py (1)
53-77: Consider chaining the original exception for better debugging.
When re-raising CommandError, chain the original exception to preserve the stack trace.
🔧 Suggested improvement
 try:
     org = Organization.objects.get(id=org_id)
-except Organization.DoesNotExist:
-    raise CommandError(f"Organization with ID {org_id} not found")
+except Organization.DoesNotExist as exc:
+    raise CommandError(f"Organization with ID {org_id} not found") from exc
for row in hourly_results:
    period = row["period"]
    value = row["value"] or 0
    hour_ts = self._truncate_to_hour(period)
    key = (org_id, hour_ts.isoformat(), metric_name, "default", None)

    if key not in hourly_agg:
        hourly_agg[key] = {
            "metric_type": metric_type,
            "value": 0,
            "count": 0,
            "labels": {},
        }
    hourly_agg[key]["value"] += value
    hourly_agg[key]["count"] += 1

# Query daily data
daily_results = query_method(
    org_id, start_date, end_date, granularity="day"
)
for row in daily_results:
    period = row["period"]
    value = row["value"] or 0
    day_date = period.date() if hasattr(period, "date") else period
    key = (org_id, day_date.isoformat(), metric_name, "default", None)

    if key not in daily_agg:
        daily_agg[key] = {
            "metric_type": metric_type,
            "value": 0,
            "count": 0,
            "labels": {},
        }
    daily_agg[key]["value"] += value
    daily_agg[key]["count"] += 1

# Query for monthly (aggregate daily results by month)
for row in daily_results:
    period = row["period"]
    value = row["value"] or 0
    if hasattr(period, "date"):
        month_date = period.replace(day=1).date()
    else:
        month_date = period.replace(day=1)
    key = (org_id, month_date.isoformat(), metric_name, "default", None)

    if key not in monthly_agg:
        monthly_agg[key] = {
            "metric_type": metric_type,
            "value": 0,
            "count": 0,
            "labels": {},
        }
    monthly_agg[key]["value"] += value
    monthly_agg[key]["count"] += 1
Verify metric_count represents source-record counts, not bucket counts.
Right now metric_count increments by 1 per aggregated row. If MetricsQueryService already returns per-period aggregates (likely, given the granularity parameter), this makes metric_count equal the number of time buckets, which will skew averages (total_value / metric_count). Please confirm the query shape and, if it already aggregates, use an explicit row count (or for counter metrics, the aggregated count) instead of +1.
Suggested adjustment (use a row-level count when available)
- value = row["value"] or 0
+ value = row["value"] or 0
+ row_count = row.get("count", 1)
hour_ts = self._truncate_to_hour(period)
key = (org_id, hour_ts.isoformat(), metric_name, "default", None)
@@
- hourly_agg[key]["count"] += 1
+        hourly_agg[key]["count"] += row_count

🤖 Prompt for AI Agents
In `@backend/dashboard_metrics/management/commands/backfill_metrics.py` around
lines 202 - 256, The code currently increments the per-key "count" by 1 in
hourly_agg/daily_agg/monthly_agg which yields bucket counts instead of
source-record counts; update the increments in the loops in backfill_metrics.py
(the blocks that update hourly_agg[key]["count"], daily_agg[key]["count"], and
monthly_agg[key]["count"]) to use an explicit row-level count value from the
query result (e.g., row.get("count")) falling back to 1 if missing: set
row_count = row.get("count", None) and then hourly_agg[key]["count"] +=
row_count if row_count is not None else (value if metric_type == "counter" else
1); apply the same change for the daily and monthly aggregation loops so
averages use source-record counts (and confirm MetricsQueryService returns a
"count" field).
timestamp = event.get("timestamp", timezone.now().timestamp())
org_id = event["org_id"]
metric_name = event["metric_name"]
metric_value = event.get("metric_value", 1)
metric_type = event.get("metric_type", MetricType.COUNTER)
labels = event.get("labels", {})
project = event.get("project", "default")
tag = event.get("tag")
Normalize tag to a non-null value before upserts.
The EventMetrics* models define tag as a CharField with null=False (implicit) and default="". When event["tag"] is missing, event.get("tag") returns None, which is passed explicitly to get_or_create() and update_or_create() at lines 234, 288, 342, and 403. This overrides the model default and causes an IntegrityError when Django attempts to persist None to the NOT NULL database column, dropping events.
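A quick sketch of the normalization; the helper is illustrative, but the None-overrides-default behavior is standard Django ORM semantics:

DEFAULT_TAG = ""  # mirrors the model field's default=""


def extract_tag(event: dict) -> str:
    # dict.get("tag") returns None for a missing key, and passing None into
    # update_or_create() overrides the column default, hitting the NOT NULL
    # constraint. "or" folds both None and "" back to the safe default.
    return event.get("tag") or DEFAULT_TAG


assert extract_tag({}) == ""
assert extract_tag({"tag": None}) == ""
assert extract_tag({"tag": "batch-7"}) == "batch-7"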
Suggested fix
- tag = event.get("tag")
+    tag = event.get("tag") or ""

🤖 Prompt for AI Agents
In `@backend/dashboard_metrics/tasks.py` around lines 121 - 129, The event handler
is passing a None tag into model upserts which conflicts with the EventMetrics*
CharField (null=False); normalize tag to a non-null string (e.g., "") right
after reading it from the event (replace tag = event.get("tag") with a
normalized value) and use that normalized variable in all subsequent
get_or_create()/update_or_create() calls so the model default is not overridden
by None.
// Fixed 30-day range for activity charts (independent of date picker)
const chartStart = useMemo(
  () => dayjs().subtract(30, "day").toISOString(),
  []
);
const chartEnd = useMemo(() => dayjs().toISOString(), []);

// API hooks - pass date range to overview (for summary cards)
const {
  data: overviewData,
  loading: overviewLoading,
  error: overviewError,
  refetch: refetchOverview,
} = useMetricsOverview(
  dateRange[0]?.toISOString(),
  dateRange[1]?.toISOString()
);

// Fixed 30-day data for charts
const {
  data: chartData,
  loading: chartLoading,
  refetch: refetchChart,
} = useMetricsOverview(chartStart, chartEnd);

// Recent activity (real-time, no date filter)
const {
  data: activityData,
  loading: activityLoading,
  refetch: refetchActivity,
} = useRecentActivity(5);

// Handle date range change
const handleDateChange = useCallback((dates) => {
  if (dates && dates.length === 2) {
    setDateRange(dates);
  }
}, []);

// Handle refresh
const handleRefresh = useCallback(() => {
  refetchOverview();
  refetchChart();
  refetchActivity();
}, [refetchOverview, refetchChart, refetchActivity]);
Refresh doesn't advance the 30-day chart window.
chartStart/chartEnd are frozen at mount with empty dependency arrays, so refetchChart() reuses the same range. Since the UI labels ("Last 30 Days") and presets indicate a rolling window, the chart data should reflect the current last 30 days on refresh, not dates from component mount time.
Update the range on refresh to align with user expectations:
Suggested fix
- // Fixed 30-day range for activity charts (independent of date picker)
- const chartStart = useMemo(
- () => dayjs().subtract(30, "day").toISOString(),
- []
- );
- const chartEnd = useMemo(() => dayjs().toISOString(), []);
+ // Rolling 30-day range for activity charts (independent of date picker)
+ const [chartRange, setChartRange] = useState(() => [
+ dayjs().subtract(30, "day"),
+ dayjs(),
+ ]);
+ const chartStart = useMemo(
+ () => chartRange[0].toISOString(),
+ [chartRange]
+ );
+ const chartEnd = useMemo(() => chartRange[1].toISOString(), [chartRange]);
@@
- const handleRefresh = useCallback(() => {
- refetchOverview();
- refetchChart();
- refetchActivity();
- }, [refetchOverview, refetchChart, refetchActivity]);
+ const handleRefresh = useCallback(() => {
+ setChartRange([dayjs().subtract(30, "day"), dayjs()]);
+ refetchOverview();
+ refetchChart();
+ refetchActivity();
+ }, [refetchOverview, refetchChart, refetchActivity]);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Fixed 30-day range for activity charts (independent of date picker) | |
| const chartStart = useMemo( | |
| () => dayjs().subtract(30, "day").toISOString(), | |
| [] | |
| ); | |
| const chartEnd = useMemo(() => dayjs().toISOString(), []); | |
| // API hooks - pass date range to overview (for summary cards) | |
| const { | |
| data: overviewData, | |
| loading: overviewLoading, | |
| error: overviewError, | |
| refetch: refetchOverview, | |
| } = useMetricsOverview( | |
| dateRange[0]?.toISOString(), | |
| dateRange[1]?.toISOString() | |
| ); | |
| // Fixed 30-day data for charts | |
| const { | |
| data: chartData, | |
| loading: chartLoading, | |
| refetch: refetchChart, | |
| } = useMetricsOverview(chartStart, chartEnd); | |
| // Recent activity (real-time, no date filter) | |
| const { | |
| data: activityData, | |
| loading: activityLoading, | |
| refetch: refetchActivity, | |
| } = useRecentActivity(5); | |
| // Handle date range change | |
| const handleDateChange = useCallback((dates) => { | |
| if (dates && dates.length === 2) { | |
| setDateRange(dates); | |
| } | |
| }, []); | |
| // Handle refresh | |
| const handleRefresh = useCallback(() => { | |
| refetchOverview(); | |
| refetchChart(); | |
| refetchActivity(); | |
| }, [refetchOverview, refetchChart, refetchActivity]); | |
| // Rolling 30-day range for activity charts (independent of date picker) | |
| const [chartRange, setChartRange] = useState(() => [ | |
| dayjs().subtract(30, "day"), | |
| dayjs(), | |
| ]); | |
| const chartStart = useMemo( | |
| () => chartRange[0].toISOString(), | |
| [chartRange] | |
| ); | |
| const chartEnd = useMemo(() => chartRange[1].toISOString(), [chartRange]); | |
| // API hooks - pass date range to overview (for summary cards) | |
| const { | |
| data: overviewData, | |
| loading: overviewLoading, | |
| error: overviewError, | |
| refetch: refetchOverview, | |
| } = useMetricsOverview( | |
| dateRange[0]?.toISOString(), | |
| dateRange[1]?.toISOString() | |
| ); | |
| // Fixed 30-day data for charts | |
| const { | |
| data: chartData, | |
| loading: chartLoading, | |
| refetch: refetchChart, | |
| } = useMetricsOverview(chartStart, chartEnd); | |
| // Recent activity (real-time, no date filter) | |
| const { | |
| data: activityData, | |
| loading: activityLoading, | |
| refetch: refetchActivity, | |
| } = useRecentActivity(5); | |
| // Handle date range change | |
| const handleDateChange = useCallback((dates) => { | |
| if (dates && dates.length === 2) { | |
| setDateRange(dates); | |
| } | |
| }, []); | |
| // Handle refresh | |
| const handleRefresh = useCallback(() => { | |
| setChartRange([dayjs().subtract(30, "day"), dayjs()]); | |
| refetchOverview(); | |
| refetchChart(); | |
| refetchActivity(); | |
| }, [refetchOverview, refetchChart, refetchActivity]); |
🤖 Prompt for AI Agents
In `@frontend/src/components/metrics-dashboard/MetricsDashboard.jsx` around lines
26 - 70, chartStart/chartEnd are frozen by useMemo at mount so refetchChart()
keeps using stale dates; update handleRefresh to compute a fresh 30-day range
and pass it to the chart refetch, e.g. compute const newStart =
dayjs().subtract(30, "day").toISOString() and const newEnd =
dayjs().toISOString() inside handleRefresh and call refetchChart(newStart,
newEnd). If refetchChart currently doesn't accept parameters, modify
useMetricsOverview(chartStart, chartEnd) to return a refetch that accepts
optional (start,end) overrides and use those in handleRefresh; touch the symbols
chartStart, chartEnd, refetchChart, useMetricsOverview, and handleRefresh.
<List
  className="recent-activity-list"
  dataSource={data.activity}
  size="small"
  split={false}
  renderItem={(item) => {
    const statusConfig =
      STATUS_CONFIG[item.status] || STATUS_CONFIG.QUEUED;
    const typeConfig = TYPE_CONFIG[item.type] || TYPE_CONFIG.workflow;

    return (
      <List.Item className="recent-activity-item">
        <div className="recent-activity-content">
          <div className="recent-activity-header">
            <span
              className="recent-activity-type"
              style={{ color: typeConfig.color }}
            >
              {typeConfig.icon}
              <Text strong style={{ marginLeft: 6 }}>
                {typeConfig.label}
              </Text>
            </span>
            <Tag color={statusConfig.color}>{statusConfig.label}</Tag>
          </div>
          <Text className="recent-activity-file" ellipsis>
            {item.file_name}
          </Text>
          <Text type="secondary" className="recent-activity-time">
            {dayjs(item.created_at).fromNow()}
          </Text>
        </div>
      </List.Item>
    );
  }}
🌐 Web query:
Ant Design List component rowKey prop best practice React key usage renderItem
💡 Result:
Ant Design <List /> + rowKey: best practice (and how it relates to React key)
1) Prefer rowKey when you use dataSource
When you render an Ant Design List from dataSource, let Ant Design assign the React keys by providing rowKey (string field name or function). AntD’s API explicitly defines rowKey as the item’s unique value (a React.Key) and defaults to "key". [1]
Recommended
<List
  dataSource={users}
  rowKey="id" // stable, unique
  renderItem={(user) => <List.Item>{user.name}</List.Item>}
/>

or:
<List
  dataSource={users}
  rowKey={(user) => user.id} // stable, unique
  renderItem={(user) => <List.Item>{user.name}</List.Item>}
/>

If your items already have item.key, you can rely on the default (rowKey="key"). [1]
2) Don’t generate keys inside renderItem
Avoid patterns like key={Math.random()} / key={uuid()} during render—keys must be stable across renders for React’s reconciliation to work correctly. React recommends using a value that uniquely identifies the item among its siblings (often an ID from your data). [2]
3) Avoid array index keys except for truly static lists
React allows using the array index only as a last resort; it’s not recommended if items can be inserted/removed/reordered because it can cause incorrect UI/state reuse. [2]
4) If you don’t use dataSource, you must provide keys yourself
If you manually render children (instead of dataSource), then you’re responsible for key on the elements you map (normal React rule). [2]
{users.map((u) => (
  <List.Item key={u.id}>{u.name}</List.Item>
))}

Rule of thumb: for AntD List with dataSource, set rowKey to a stable unique ID; don't also "fight it" by inventing keys inside renderItem.
Sources: Ant Design List API (rowKey) [1]; React “Lists and Keys” guidance [2].
Add a stable rowKey to the List component.
The List component uses dataSource={data.activity} without specifying rowKey, which can trigger React reconciliation warnings and lead to less efficient diffing. The activity items have a stable id field (confirmed in PropTypes); use rowKey="id" to ensure proper key-based rendering.
🛠️ Suggested fix
<List
className="recent-activity-list"
dataSource={data.activity}
size="small"
split={false}
+ rowKey="id"
  renderItem={(item) => {

🤖 Prompt for AI Agents
In `@frontend/src/components/metrics-dashboard/RecentActivity.jsx` around lines 91
- 125, The List rendering of dataSource={data.activity} lacks a stable key
causing reconciliation warnings; update the List component (the one using
dataSource, renderItem, and STATUS_CONFIG/TYPE_CONFIG) to include rowKey="id" so
React uses each activity's stable id field for keys (ensure the id property on
items is used as the unique key).




What
Why
How

backend/dashboard_metrics/: Django app with models, views, services, and tasks
unstract/core/src/unstract/core/metrics/: library with queue backend using Kombu
dashboard_metric_events queue

Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
Notes on Testing
pytest backend/dashboard_metrics/tests/
docker compose up worker-metrics
GET /api/v2/metrics/health/

Screenshots
N/A
Checklist
I have read and understood the Contribution Guidelines.