Let agent signal step failure via a tagged block by zentron · Pull Request #2045 · OctopusDeploy/Calamari

zentron · 2026-06-26T07:49:52Z

Background

The AI agent step runs the Claude CLI, which always exits 0 — even when the agent semantically failed at the task. That left a gap: when a workflow author writes "fail the deployment if the smoke test doesn't pass", the agent could detect the condition but had no way to make Octopus mark the step as failed. ClaudeAgentOutcomeEvaluator only inspected the process exit code and the CLI's structured result (is_error, subtype, permission denials), none of which capture an intentional failure.

Results

Adds a deterministic agent→managed-code failure signal:

- New `octopus-fail-deployment` skill: instructs the agent, when the user expressed a failure condition that has been met, to emit an `<octopus-task-failed>…</octopus-task-failed>` block with an operator-facing reason. Absence of the block means success (unchanged default).
- `ClaudeAgentOutcomeEvaluator`: scans the final result text for the block and throws a `CommandException` with the captured reason (generic fallback when empty). This check runs before the generic CLI-status checks so an intentional failure surfaces a clear message.
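
The check described above could look roughly like the following. This is a minimal sketch rather than the code in the PR: the regex, the `Evaluate` signature, the `isError` flag, the error messages, and the local `CommandException` stand-in are all assumptions made so the example compiles on its own.

```csharp
using System;
using System.Text.RegularExpressions;

// Stand-in for Calamari's CommandException so the sketch is self-contained.
public class CommandException : Exception
{
    public CommandException(string message) : base(message) { }
}

public static class FailureSignalSketch
{
    // Assumed pattern: either a self-closing tag, or a paired block whose
    // body is captured as the reason. Singleline lets '.' span newlines.
    static readonly Regex FailureBlock = new Regex(
        @"<octopus-task-failed\s*/>|<octopus-task-failed>(?<reason>.*?)</octopus-task-failed>",
        RegexOptions.Singleline);

    // Hypothetical entry point: resultText is the agent's final result text,
    // isError stands in for the CLI's structured status (is_error, subtype, ...).
    public static void Evaluate(string resultText, bool isError)
    {
        // The intentional-failure check runs first so its message takes precedence.
        var match = FailureBlock.Match(resultText ?? string.Empty);
        if (match.Success)
        {
            // An unmatched named group yields an empty string in .NET.
            var reason = match.Groups["reason"].Value.Trim();
            throw new CommandException(reason.Length > 0
                ? $"The agent reported that the task failed: {reason}"
                : "The agent reported that the task failed without giving a reason.");
        }

        // Generic CLI-status check, reduced to a single flag for illustration.
        if (isError)
            throw new CommandException("The agent run ended with an error status.");
    }
}
```

Because the tag check comes before the status check, a run that both emits the block and reports an error surfaces the agent's reason rather than the generic message.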

Design notes:

- Uses a stdout tag rather than a marker file, so it works even when the agent is sandboxed without write permissions.
- The required closing tag doubles as a completeness guard: a truncated, unclosed block won't match, so it can't be mistaken for a deliberate failure.
- Tag wording is task-neutral (not deployment-specific) so it reads correctly for runbooks too.

No Server change required — the failure propagates through the existing non-zero-exit path.

Testing

ClaudeAgentOutcomeEvaluatorFixture — 14 unit tests passing, including reason capture, multi-line reasons, empty/self-closing blocks, a block embedded in larger output, precedence over a non-success subtype, and that an unclosed (truncated) block does not fail the step.
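
The kinds of inputs listed above can be illustrated with a small table of cases. This is only a sketch under assumptions: the regex is a guess at an equivalent pattern, and `TryGetFailureReason` and `PrintCases` are hypothetical helpers, not members of the real fixture or evaluator.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FailureSignalCases
{
    // Assumed pattern, not copied from ClaudeAgentOutcomeEvaluator.cs.
    static readonly Regex FailureBlock = new Regex(
        @"<octopus-task-failed\s*/>|<octopus-task-failed>(?<reason>.*?)</octopus-task-failed>",
        RegexOptions.Singleline);

    // Returns true when a complete block is present; reason may be empty.
    public static bool TryGetFailureReason(string output, out string reason)
    {
        var match = FailureBlock.Match(output);
        reason = match.Success ? match.Groups["reason"].Value.Trim() : string.Empty;
        return match.Success;
    }

    // Prints one line per case, flagging any case that disagrees with the expectation.
    public static void PrintCases()
    {
        var cases = new (string Name, string Output, bool ExpectFailure)[]
        {
            ("reason captured", "<octopus-task-failed>Smoke test failed</octopus-task-failed>", true),
            ("multi-line reason", "<octopus-task-failed>\nline one\nline two\n</octopus-task-failed>", true),
            ("empty block", "<octopus-task-failed></octopus-task-failed>", true),
            ("self-closing", "<octopus-task-failed/>", true),
            ("embedded in output", "Summary\n<octopus-task-failed>Health check red</octopus-task-failed>\nDone.", true),
            ("unclosed (truncated)", "<octopus-task-failed>Health check r", false),
            ("no block", "All checks passed.", false),
        };

        foreach (var c in cases)
        {
            var failed = TryGetFailureReason(c.Output, out _);
            Console.WriteLine($"{c.Name}: {(failed == c.ExpectFailure ? "ok" : "MISMATCH")}");
        }
    }
}
```

The truncated case is the interesting one: without a closing tag neither alternative of the pattern can match, so the step is not failed.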

How to review

Core logic is the regex + check in ClaudeAgentOutcomeEvaluator.cs; the skill markdown is the agent-facing contract. The rest is test coverage.

🤖 Generated with Claude Code

Resolves: #MD-2151

A Claude CLI run always exits 0, so a user-requested failure condition ("fail the deployment if the health check is red") was undetectable from the outside. Add an octopus-fail-deployment skill that has the agent emit an <octopus-task-failed> block, and have ClaudeAgentOutcomeEvaluator scan the result for it and fail the step with the captured reason.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

zentron · 2026-06-28T23:05:52Z

            catch (Exception installException)
            {
-                Console.Error.WriteLine("Running rollback behaviours...");
+                log.Verbose("Running rollback behaviours...");


This was annoying as hell. There is no rollback behaviour taking place, so why make it so prominent?

Copilot

Pull request overview

This PR adds a deterministic mechanism for the AI agent to intentionally fail an Octopus step (despite the Claude CLI exiting 0) by emitting a tagged <octopus-task-failed>...</octopus-task-failed> block, which is then detected and converted into a managed-code failure.

Changes:

- Added a new agent skill (`octopus-fail-deployment`) that defines the contract for signalling an intentional failure via a tagged block.
- Updated `ClaudeAgentOutcomeEvaluator` to scan agent output for the failure tag and throw a `CommandException` with the captured reason.
- Extended `ClaudeAgentOutcomeEvaluator` unit tests to cover the new failure signal behavior.
- Switched rollback messaging in `PipelineCommand` from direct `Console.Error` writes to `ILog`.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `source/Calamari.Common/Plumbing/Pipeline/PipelineCommand.cs` | Routes rollback diagnostics through `ILog` instead of writing directly to stderr. |
| `source/Calamari.AiAgent/ClaudeCodeBehaviour/DefaultContext/Skills/octopus-fail-deployment.md` | Defines the agent-facing failure-signal contract and formatting rules. |
| `source/Calamari.AiAgent/ClaudeCodeBehaviour/ClaudeAgentOutcomeEvaluator.cs` | Detects the `<octopus-task-failed>` block and fails the step with a clear reason. |
| `source/Calamari.AiAgent.Tests/ClaudeCodeBehaviour/ClaudeAgentOutcomeEvaluatorFixture.cs` | Adds unit coverage for the new intentional-failure signal parsing/precedence. |

zentron · 2026-06-29T00:23:39Z

                }
                catch (Exception rollbackException)
                {
-                    Console.Error.WriteLine(rollbackException);


We should generally not be writing to Console in Calamari, as it makes testing more difficult.
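
To illustrate why routing output through an injected log helps testing, here is a small sketch. `ILogSketch`, `RecordingLog`, and `PipelineSketch` are hypothetical names invented for this example; Calamari's real `ILog` has more members, and whether `PipelineCommand` rethrows after logging is an assumption here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, trimmed-down stand-in for Calamari's ILog.
public interface ILogSketch
{
    void Verbose(string message);
}

// In-memory log that a test can inspect after the run.
public class RecordingLog : ILogSketch
{
    public List<string> Messages { get; } = new List<string>();
    public void Verbose(string message) => Messages.Add(message);
}

public class PipelineSketch
{
    readonly ILogSketch log;

    public PipelineSketch(ILogSketch log) => this.log = log;

    public void Run(Action install)
    {
        try
        {
            install();
        }
        catch (Exception)
        {
            // Goes to the injected log at verbose level instead of stderr.
            log.Verbose("Running rollback behaviours...");
            throw; // Assumption: the original exception still propagates.
        }
    }
}
```

A test can pass a `RecordingLog`, trigger a failing install, and assert on `Messages` directly, with no need to redirect or capture the process's stderr.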

Align the octopus-fail-deployment skill spec and code comment with the matcher, which accepts a self-closing <octopus-task-failed/> as a reason-less failure. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

eddymoulton

Couple of nits, but looks good

eddymoulton · 2026-06-30T01:41:52Z

+## Rules
+
+- Emit the block **only** when the user expressed a failure condition AND you have determined it is met. If the condition was not met, say nothing special and let the step succeed.
+- Always write a **complete** block — either a paired block ending in `</octopus-task-failed>` or a self-closing `<octopus-task-failed/>`. A closed tag is how Octopus confirms the message is whole — if you open the block but stop before closing it, the failure will not be detected, so finish the block before ending your turn.


RE: self closing block

I would be tempted to remove that as an option. Makes this whole thing more consistent and the matching regex simpler.

I'm a bit concerned about removing flexibility that Claude might like to take advantage of however.
I'll leave it with you to decide if you think cutting that down makes things simpler enough to warrant the change.

zentron · 2026-06-30

Interesting you say that. I originally had it not allow self-closing tags, but then Claude pointed out it might make it more likely to use it when there happens to be no specific reason to apply.
I'll leave it in for now, but if it creates any false positives/negatives we can reconsider.

eddymoulton · 2026-06-30

> Claude pointed out it might make it more likely to use it when there happens to be no specific reason to apply

That's reason enough if Claude thinks it might be a problem without it.

…Evaluator.cs Co-authored-by: Eddy Moulton <8491021+eddymoulton@users.noreply.github.com>

zentron and others added 2 commits June 26, 2026 17:33

Fix confusing error

863e7cf

zentron commented Jun 28, 2026

Remove comment

e9f1247

zentron requested a review from Copilot June 29, 2026 00:03

Copilot started reviewing on behalf of zentron June 29, 2026 00:03

Copilot AI reviewed Jun 29, 2026

Better error formatting for logs

9c37036

zentron commented Jun 29, 2026

Document self-closing failure signal tag

44f53ff

eddymoulton approved these changes Jun 30, 2026

Update source/Calamari.AiAgent/ClaudeCodeBehaviour/ClaudeAgentOutcome…

a3d4075

zentron enabled auto-merge (squash) June 30, 2026 05:46

zentron merged commit 1bf6fcf into main Jun 30, 2026
35 checks passed

zentron deleted the robert/agent-fail-deployment-signal branch June 30, 2026 06:34

zentron merged 6 commits into main from robert/agent-fail-deployment-signal
