feat(monitoring): add prometheus metrics collection and grafana dashboard #100

KalpanaBhaskar wants to merge 2 commits into fuzziecoder:main from
Conversation
@KalpanaBhaskar is attempting to deploy a commit to the Revon Team on Vercel. A member of the Team first needs to authorize it.
Important: Review skipped due to path filters.

⛔ Files ignored due to path filters (8)

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters.
📝 Walkthrough

This PR introduces Prometheus monitoring to the pipeline execution system. It adds metric definitions for execution tracking, instruments the executor with timing measurements, and exposes metrics via a new `/metrics` endpoint.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Executor
    participant Metrics
    participant Prometheus
    participant Grafana
    Client->>Executor: Start Pipeline Execution
    Executor->>Metrics: Increment pipeline_active_executions
    Executor->>Metrics: Record pipeline start_time
    rect rgba(100, 150, 200, 0.5)
        Note over Executor,Metrics: Pipeline Processing
        Executor->>Executor: Execute Stages
        Executor->>Metrics: Record stage start_time
        Executor->>Metrics: Observe stage_execution_duration_seconds
    end
    alt Pipeline Success
        Executor->>Metrics: Increment pipeline_executions_total (status=success)
        Executor->>Metrics: Observe pipeline_execution_duration_seconds
        Executor->>Metrics: Decrement pipeline_active_executions
    else Pipeline Failure
        Executor->>Metrics: Increment pipeline_executions_total (status=failed)
        Executor->>Metrics: Increment pipeline_failures_total
        Executor->>Metrics: Observe pipeline_execution_duration_seconds
        Executor->>Metrics: Decrement pipeline_active_executions
    end
    Client->>Prometheus: GET /metrics
    Prometheus->>Metrics: Collect Metrics
    Prometheus-->>Client: Return Prometheus Format
    Grafana->>Prometheus: Query pipeline_executions_total, pipeline_active_executions, etc.
    Prometheus-->>Grafana: Metric Data
    Grafana->>Grafana: Render Dashboard Panels
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues
🚥 Pre-merge checks: ✅ 3 passed checks
```python
# Record failure metrics
duration = time.time() - start_time
pipeline_executions_total.labels(pipeline_id=pipeline.id, status='failed').inc()
pipeline_failures_total.labels(pipeline_id=pipeline.id).inc()
pipeline_execution_duration_seconds.labels(pipeline_id=pipeline.id).observe(duration)
pipeline_active_executions.labels(pipeline_id=pipeline.id).dec()
```
🔴 pipeline_active_executions gauge never decremented if metrics recording in except block raises
The pipeline_active_executions gauge is incremented at backend/core/executor.py:48 (before the try block), but it is only decremented inside the try success path (line 79) or inside the except block (line 98). If any of the metrics calls preceding the .dec() in the except block (lines 95–97) raise an exception, the decrement on line 98 is never reached and the exception propagates out. This causes the gauge to permanently drift upward, reporting phantom active executions. The .dec() call should be placed in a finally block to guarantee it always executes regardless of what happens during metrics recording.
Prompt for agents

In backend/core/executor.py, refactor the execute() method (lines 47-99) so that the pipeline_active_executions gauge decrement is guaranteed to run. Move `pipeline_active_executions.labels(pipeline_id=pipeline.id).dec()` out of both the try and except blocks and into a finally block. The structure should be:

```python
pipeline_active_executions.labels(pipeline_id=pipeline.id).inc()
start_time = time.time()
try:
    ...  # success path, record success metrics but WITHOUT the .dec() call
except Exception as e:
    ...  # failure path, record failure metrics but WITHOUT the .dec() call
finally:
    pipeline_active_executions.labels(pipeline_id=pipeline.id).dec()
```

Remove the .dec() calls from both the try block (currently line 79) and the except block (currently line 98).
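The fix is easy to verify in isolation. The sketch below uses a minimal stand-in gauge (a hypothetical `StubGauge`, not the real `prometheus_client.Gauge`) so it runs without dependencies; it shows that the `finally` placement keeps the gauge balanced even when stage execution raises:

```python
import time

class StubGauge:
    """Minimal stand-in for a Prometheus gauge (illustration only)."""
    def __init__(self):
        self.value = 0
    def inc(self):
        self.value += 1
    def dec(self):
        self.value -= 1

pipeline_active_executions = StubGauge()

def execute(run_stages):
    """Sketch of the recommended structure: the decrement lives in
    `finally`, so it runs on success, on failure, and even if the
    metrics-recording calls themselves raise."""
    pipeline_active_executions.inc()
    start_time = time.perf_counter()
    try:
        run_stages()
        duration = time.perf_counter() - start_time
        # ... record success metrics with `duration` (no .dec() here)
    except Exception:
        # ... record failure metrics (no .dec() here)
        raise
    finally:
        pipeline_active_executions.dec()  # always runs
```

Whether `run_stages()` returns or raises, the gauge ends back at its starting value, so no phantom active executions accumulate.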
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/core/executor.py`:
- Around line 47-49: Replace uses of the wall-clock timer time.time() with a
monotonic high-resolution timer time.perf_counter() for all elapsed-time
measurements in backend/core/executor.py: change the start_time assignments
(e.g. the start_time set next to
pipeline_active_executions.labels(pipeline_id=pipeline.id).inc()) and any
subsequent elapsed = time.time() - start_time calculations to use
time.perf_counter() instead, and ensure imports/reference use the perf_counter
symbol; apply the same replacement at the other occurrences noted (the other
start_time/elapsed sites around lines 49, 76, 94, 111, 137, 153) so
histogram.observe calls use perf_counter-based durations.
In `@backend/main.py`:
- Around line 75-79: The /metrics endpoint (prometheus_metrics) is currently
exposed anonymously; when settings.AUTH_ENABLED is true, protect it by adding
the same authentication dependency (auth_dependencies) to the app.get decorator
or route registration and hide it from the public OpenAPI (set
include_in_schema=False) so it requires auth and is not in the public schema;
conditionalize the decorator or route registration on settings.AUTH_ENABLED (or
always attach auth_dependencies when enabled) to ensure prometheus_metrics is
only accessible to authenticated/internal callers.
In `@backend/monitoring/metrics.py`:
- Around line 39-44: The metric uses an unbounded label 'stage_name'
(stage_execution_duration_seconds) which will create infinite Prometheus series;
change the metric to use a bounded dimension such as 'stage_type' or
'stage_index' instead and update all call sites that set labels (references in
executor where stage_execution_duration_seconds.labels(...) is called) to
provide the chosen bounded value (e.g., Stage.type or a numeric stage index).
Leave Stage.name unchanged for logs/traces—emit the free-form name to
logging/tracing code but stop passing it into the metric labels. Ensure you
update the metric declaration (remove 'stage_name' label) and all uses in the
executor (the label-setting calls around the stage timing at the locations that
currently pass stage.name) to match the new label keys.
In `@docs/monitoring/grafana_dashboard.json`:
- Around line 92-94: The legend text is incorrect: the PromQL expression using
histogram_quantile(0.95, ...) produces a 95th percentile (p95), not a 95%
confidence interval; update the legendFormat value (currently "{{pipeline_id}}
(95% CI)") to a correct label such as "{{pipeline_id}} (p95)" or
"{{pipeline_id}} (95th percentile)" so the dashboard accurately reflects the
metric returned by histogram_quantile.
- Around line 24-28: The success-rate PromQL expression for the target using the
metric pipeline_executions_total should be changed to handle idle 5m windows:
replace the current expr string with the pattern that performs the division only
when the denominator sum(rate(pipeline_executions_total[5m])) > 0 and otherwise
returns 0 (use the "and on() (... > 0) or on() vector(0)" post-filter approach).
Locate the "targets" -> "expr" entry containing the current
sum(rate(pipeline_executions_total...)) division and update it to the
recommended guarded expression so the panel yields 0% instead of NaN/N/A when no
executions occur.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9ee64cdf-d109-47b9-bae1-2b24a89d032d
⛔ Files ignored due to path filters (4)
- backend/__pycache__/main.cpython-311.pyc is excluded by !**/*.pyc
- backend/api/__pycache__/schemas.cpython-311.pyc is excluded by !**/*.pyc
- backend/core/__pycache__/executor.cpython-311.pyc is excluded by !**/*.pyc
- backend/db/__pycache__/crud.cpython-311.pyc is excluded by !**/*.pyc
📒 Files selected for processing (4)
- backend/core/executor.py
- backend/main.py
- backend/monitoring/metrics.py
- docs/monitoring/grafana_dashboard.json
```python
# Track active pipeline start
pipeline_active_executions.labels(pipeline_id=pipeline.id).inc()
start_time = time.time()
```
🧩 Analysis chain
🌐 Web query: In Python, what clock is recommended for measuring elapsed durations: time.time() or time.perf_counter()/time.monotonic()?
💡 Result:
For measuring elapsed durations, prefer a monotonic clock:
- Use time.perf_counter() (or time.perf_counter_ns()) for timing/benchmarking elapsed time, because it's monotonic and uses the highest-resolution clock available. [1], [2]
- Use time.monotonic() (or time.monotonic_ns()) when you specifically want a monotonic elapsed-time clock (also immune to system clock changes), typically with slightly less "benchmark-focused" intent than perf_counter(). [1], [3]
- Avoid time.time() for durations because it's "wall clock" time and can jump forwards/backwards if the system time is adjusted (NTP, manual changes), which can break elapsed-time measurements. [1], [3]

Rule of thumb: durations → perf_counter() (most common); timestamps/"seconds since epoch" → time.time(). [1], [3]
Sources:
[1] PEP 418 (monotonic/perf_counter rationale and use) (peps.python.org)
[2] Python time module docs (perf_counter) (docs.python.org)
[3] Python time module docs (clock semantics; monotonic vs realtime) (docs.python.org)
Citations:
- 1: https://peps.python.org/pep-0418/?utm_source=openai
- 2: https://docs.python.org/3/library/time.html?utm_source=openai
- 3: https://docs.python.org/3/library/time.html?utm_source=openai
🏁 Script executed:

```shell
find . -name "executor.py" -path "*/core/*" | head -5
```
Repository: fuzziecoder/Flexi-Roaster

🏁 Script executed:

```shell
head -160 backend/core/executor.py | cat -n
```
Repository: fuzziecoder/Flexi-Roaster

🏁 Script executed:

```shell
tail -n +160 backend/core/executor.py | grep -n "time\." | head -20
```
Repository: fuzziecoder/Flexi-Roaster
Use a monotonic clock for elapsed-time metrics.
time.time() tracks wall time. NTP or manual clock changes can move it backward or forward and skew these histogram observations. time.perf_counter() is the recommended approach for measuring elapsed durations.
⏱️ Proposed refactor

```diff
- start_time = time.time()
+ start_time = time.perf_counter()
...
- duration = time.time() - start_time
+ duration = time.perf_counter() - start_time
...
- stage_start_time = time.time()
+ stage_start_time = time.perf_counter()
...
- stage_duration = time.time() - stage_start_time
+ stage_duration = time.perf_counter() - stage_start_time
```

Also applies to: 49, 76, 94, 111, 137, 153
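The pattern the review recommends can be seen with a tiny self-contained timer helper (a hypothetical `timed_call`, not code from this PR): `time.perf_counter()` is monotonic and high-resolution, so the measured duration can never come out negative the way a wall-clock delta can after an NTP step:

```python
import time

def timed_call(fn):
    """Return (result, elapsed_seconds), measured with a monotonic,
    high-resolution clock that is immune to system clock adjustments."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed

result, elapsed = timed_call(lambda: sum(range(1000)))
```

The elapsed value is what would be fed to `histogram.observe()`; because the clock is monotonic, the observation is always non-negative.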
```python
# Prometheus Metrics scraping endpoint
@app.get("/metrics", tags=["monitoring"])
async def prometheus_metrics():
    """Prometheus metrics endpoint"""
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)
```
Protect /metrics; it currently bypasses auth.
When settings.AUTH_ENABLED is true, only the routers on Lines 97-110 get auth_dependencies; this endpoint stays anonymous. That exposes operational data and any user-derived label values to anyone who can reach the service. Gate it behind the same auth policy or an internal-only ingress, and keep it out of the public schema.
🔒 Proposed fix

```diff
+# Define this before the route declarations.
+metrics_dependencies = [Depends(get_current_auth_context)] if settings.AUTH_ENABLED else []
+
-@app.get("/metrics", tags=["monitoring"])
+@app.get("/metrics", tags=["monitoring"], include_in_schema=False, dependencies=metrics_dependencies)
 async def prometheus_metrics():
     """Prometheus metrics endpoint"""
     return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)
```
```python
stage_execution_duration_seconds = Histogram(
    'stage_execution_duration_seconds',
    'Stage execution duration in seconds',
    ['pipeline_id', 'stage_name', 'status'],
    buckets=(0.1, 0.5, 1.0, 5.0, 15.0, 30.0, 60.0, 120.0, 300.0, float('inf'))
)
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Verify that `Stage.name` is unconstrained and flows directly into the metric label.
rg -n -C2 'class Stage|name:\s*str|stage_execution_duration_seconds|stage_name\s*=\s*stage\.name' backend
```
Repository: fuzziecoder/Flexi-Roaster
Avoid unbounded stage_name labels.
Stage.name is defined as a free-form str in backend/models/pipeline.py and passed directly into the stage_execution_duration_seconds metric label at backend/core/executor.py lines 140 and 156. Every distinct stage name creates a new time-series in Prometheus, which will eventually exhaust memory and degrade query latency. Use a bounded dimension (e.g., stage type or index) instead, and keep the free-form name in logs or traces.
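The cardinality concern is easy to picture with a toy model (hypothetical stage data, with plain sets standing in for Prometheus time-series): every distinct label value becomes its own series, so a free-form name grows without bound while a type enum stays fixed:

```python
from collections import namedtuple

# Hypothetical stages: free-form names, but only a few stage types.
Stage = namedtuple("Stage", ["name", "type"])
stages = [
    Stage("build-2024-01-15-run-8841", "build"),
    Stage("build-2024-01-16-run-8842", "build"),
    Stage("deploy-prod-us-east-1", "deploy"),
]

# Each distinct label value creates a separate time-series in Prometheus;
# sets stand in for the label-value universe here.
series_by_name = {s.name for s in stages}  # one entry per run: unbounded
series_by_type = {s.type for s in stages}  # bounded by the type enum
```

With `stage_name` as a label, the series count tracks the number of runs; with `stage_type` it is capped by the small, fixed set of types, while the free-form name stays available for logs and traces.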
```json
"targets": [
  {
    "expr": "sum(rate(pipeline_executions_total{status=\"success\"}[5m])) / sum(rate(pipeline_executions_total[5m])) * 100",
    "refId": "A"
  }
```
🧩 Analysis chain
🌐 Web query: In PromQL, how should a success / total * 100 query be written so Grafana shows 0 instead of NaN/N/A when the denominator is zero or the series is absent?
💡 Result:
Use a filter to drop the divide-by-zero series, then fall back to 0 when the result is empty:

```promql
(
  100 * success / total
)
and on(<your_grouping_labels>)
(
  total > 0
)
or on() vector(0)
```

- `total > 0` (without `bool`) removes series where the denominator is 0, so you don't get NaN.
- `... or on() vector(0)` makes Grafana show a 0 when the whole expression returns no series (missing/absent data). [1]

If you're aggregating (the common case), apply the same idea to the aggregated vectors:

```promql
(
  100 *
  sum by (job) (success)
  /
  sum by (job) (total)
)
and on(job)
(
  sum by (job) (total) > 0
)
or on() vector(0)
```

Sources: [1] https://grafana.com/docs/grafana/latest/alerting/guides/missing-data/ (Prometheus example: your_metric_query OR on() vector(0)), [2] https://stackoverflow.com/questions/47056557/how-to-gracefully-avoid-divide-by-zero-in-prometheus
🏁 Script executed:

```shell
find . -name "grafana_dashboard.json" -type f
```
Repository: fuzziecoder/Flexi-Roaster

🏁 Script executed:

```shell
# Once we find the file, let's examine it
cat -n docs/monitoring/grafana_dashboard.json | head -100
```
Repository: fuzziecoder/Flexi-Roaster
Handle idle windows in the success-rate query.
When no executions occur in a 5-minute window, both sum(rate(...)) expressions return empty series. The denominator becomes 0 or absent, causing the division to produce NaN/N/A instead of 0%.
The proposed fix using or vector(0) and clamp_min() will work. However, PromQL documentation recommends a cleaner pattern that filters before calculating:
📊 Alternative approach
(
100 * sum(rate(pipeline_executions_total{status="success"}[5m]))
/
sum(rate(pipeline_executions_total[5m]))
)
and on() (sum(rate(pipeline_executions_total[5m])) > 0)
or on() vector(0)
This filters out the division when the denominator is 0, then falls back to 0 when the entire expression returns no series.
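The guard's intent can be mirrored in plain Python (a hypothetical helper, not how Prometheus actually evaluates the expression): divide only when the denominator is positive, otherwise report 0:

```python
def success_rate_percent(success_rate_5m: float, total_rate_5m: float) -> float:
    """Mirror of the guarded PromQL: fall back to 0.0 instead of
    NaN/N/A when nothing executed in the window."""
    if total_rate_5m > 0:
        return 100.0 * success_rate_5m / total_rate_5m
    return 0.0
```

For an idle window, `success_rate_percent(0.0, 0.0)` yields `0.0`, where the unguarded division would be undefined.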
```json
"expr": "histogram_quantile(0.95, sum(rate(pipeline_execution_duration_seconds_bucket[5m])) by (le, pipeline_id))",
"legendFormat": "{{pipeline_id}} (95% CI)",
"refId": "A"
```
Rename the legend; this is a percentile, not a confidence interval.
histogram_quantile(0.95, ...) returns a p95 estimate. Labeling it 95% CI is statistically incorrect and will mislead dashboard readers.
📝 Proposed fix

```diff
- "legendFormat": "{{pipeline_id}} (95% CI)",
+ "legendFormat": "{{pipeline_id}} p95",
```
Verify each finding against the current code and only fix it if needed.
In `@docs/monitoring/grafana_dashboard.json` around lines 92 - 94, The legend text
is incorrect: the PromQL expression using histogram_quantile(0.95, ...) produces
a 95th percentile (p95), not a 95% confidence interval; update the legendFormat
value (currently "{{pipeline_id}} (95% CI)") to a correct label such as
"{{pipeline_id}} (p95)" or "{{pipeline_id}} (95th percentile)" so the dashboard
accurately reflects the metric returned by histogram_quantile.
```python
# Record success metrics
duration = time.time() - start_time
pipeline_executions_total.labels(pipeline_id=pipeline.id, status='success').inc()
pipeline_execution_duration_seconds.labels(pipeline_id=pipeline.id).observe(duration)
pipeline_active_executions.labels(pipeline_id=pipeline.id).dec()
```
🟡 Success-path metrics recording inside try block can overwrite COMPLETED status to FAILED
The Prometheus metrics recording for the success path (lines 76-79) is placed inside the try block, after the execution status is already set to ExecutionStatus.COMPLETED at line 67. If any of the metrics calls (pipeline_executions_total.labels(...).inc(), .observe(), or .dec()) raise an exception, the except Exception block at line 81 will catch it and overwrite execution.status to ExecutionStatus.FAILED at line 83 — even though all pipeline stages completed successfully. The returned Execution object would report a failed status, a misleading error message from the Prometheus client, and failure metrics would be emitted to Prometheus instead of success metrics. The metrics recording should be moved after the try/except, or into a separate try block, so it cannot interfere with the execution status determination.
Prompt for agents
In backend/core/executor.py, the metrics recording code for both success and failure paths (lines 75-79 and 93-98) should be restructured so that a failure in metrics recording cannot change the execution status. The recommended approach is to: 1) Move the pipeline_active_executions.dec() call into a finally block to guarantee gauge cleanup regardless of what happens. 2) Move the success metrics recording (lines 76-79) out of the try block and into a separate try/except or into a finally block with a status check. For example, after the main try/except, add a finally block that always decrements the gauge, and use a separate try/except for the Prometheus calls that cannot affect execution.status. The key invariant is: once execution.status is set to COMPLETED at line 67, no subsequent code path should overwrite it to FAILED unless the pipeline stages themselves failed.
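The invariant described above (metrics recording must never flip a COMPLETED status to FAILED) can be sketched with a small wrapper. This is an illustrative stand-in, not the actual executor code; `record_metrics_safely` and the simplified `Execution` class are hypothetical:

```python
import logging

def record_metrics_safely(record):
    """Run metrics recording in its own try/except so a Prometheus
    client error cannot change the execution outcome."""
    try:
        record()
    except Exception:
        logging.exception("metrics recording failed")  # best-effort only

class Execution:
    def __init__(self):
        self.status = "PENDING"

def execute(run_stages, record_success, record_failure):
    execution = Execution()
    try:
        run_stages()
        execution.status = "COMPLETED"
        record_metrics_safely(record_success)  # cannot overwrite COMPLETED
    except Exception:
        execution.status = "FAILED"
        record_metrics_safely(record_failure)
    return execution
```

Only a failure inside `run_stages()` itself can mark the execution FAILED; a raising metrics call is logged and swallowed instead of being caught by the outer handler.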
What:
Implemented comprehensive Prometheus-style metrics collection for pipeline and stage performance monitoring, and exported a pre-configured Grafana Dashboard JSON for visualization.

Why:
Resolves the issue requesting real-time visibility into pipeline throughput, success/failure rates, active workloads, and execution duration trends.

How:
- `prometheus_client` Counters (total runs, failures), Histograms (duration distributions), and Gauges (active threads).
- `GET /metrics` exposed in backend/main.py.

Testing:
- `GET /metrics` text payload formatting verified in the browser.

Addressing Issue #30
Summary by CodeRabbit
Release Notes