feat(inbox): retain_until_acked + auto_ack_on_pull delivery model#23
Conversation
Agent registration now accepts auto_ack_on_pull (default: false). When true, the hub immediately acks messages on pull so the agent does not need to call the ack endpoint. Opt-in -- no behaviour change for existing agents. Message send now accepts retain_until_acked (default: false). When true, the hub ignores the recipient's auto_ack_on_pull preference and requires explicit ack. Work orders and fix requests always set this automatically. Safety: in pull(), if the auto-ack write throws, the message stays leased and requeues after the visibility timeout rather than being silently consumed. The auto_acked flag is included in the pull response so callers know not to ack again. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PR Review:
|
| # | Severity | Description |
|---|---|---|
| 1 | Bug | auto_acked not forwarded in pull route; lease_until is stale for auto-acked messages |
| 2 | Bug | Service returns status: 'leased' snapshot when storage already has status: 'acked' |
| 3 | Medium | RETAIN_TYPES Set re-allocated on every send() call |
| 4 | Medium | Unconditional extra storage read on every pull() |
| 5 | Medium | No automated tests for the two new delivery paths |
| 6 | Medium | docs/AGENT-GUIDE.md / README.md not updated per CLAUDE.md |
| 7 | Low | No type validation on auto_ack_on_pull input |
The safety design (retain-on-ack-failure > silent loss) is correct and the retain_until_acked override logic is clean. Items 1–2 are functional bugs that should block merge; 3–6 should be addressed given the project's stated standards.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0d23c7f8cf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| await storage.updateMessage(message.id, { | ||
| status: 'acked', | ||
| acked_at: Date.now() | ||
| }); |
There was a problem hiding this comment.
Preserve ephemeral purging when auto-acking pulled messages
When auto_ack_on_pull is enabled, this path writes status: 'acked' directly and bypasses ack(), so ephemeral: true messages no longer have their body purged on ack. In that scenario, sensitive payloads remain stored in envelope.body (potentially indefinitely if no TTL is set), which breaks the documented ephemeral guarantee and changes security behavior for opted-in agents.
Useful? React with 👍 / 👎.
Greptile SummaryAdds two-tier delivery model:
Confidence Score: 4/5
|
| Filename | Overview |
|---|---|
| src/routes/inbox.js | added retain_until_acked parameter to send endpoint; missing auto_acked field in pull response |
| src/services/inbox.service.js | implemented auto-ack logic with proper override for work_order/fix_request types — failure-safe design |
Last reviewed commit: 0d23c7f
Additional Comments (1)
|
Bug fixes: - auto_acked flag now forwarded in pull route response; lease_until set to null for auto-acked messages so callers know not to ack - pull() now skips the lease entirely for auto_ack agents and returns status:'acked' with auto_acked:true (previously returned stale status:'leased' snapshot while storage already held 'acked') Medium fixes: - RETAIN_TYPES moved to module-level constant (was re-allocated on every send() call) - Agent lookup moved before the lease in pull() so a failed lookup cannot leave the message in a leased state; auto_ack agents now skip the lease step entirely (cleaner state, one fewer write) - 4 automated tests added covering auto_ack_on_pull, retain_until_acked override, work_order RETAIN_TYPE auto-promotion, and boolean validation - docs/AGENT-GUIDE.md updated with retain_until_acked + auto_ack_on_pull usage guide per CLAUDE.md documentation requirement Low fix: - auto_ack_on_pull validated as boolean at registration (returns 400 INVALID_INPUT on non-boolean value) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PR Review:
|
Bug fix: - retain_until_acked now validated as boolean in send route (returns 400 INVALID_INPUT on non-boolean value, matching auto_ack_on_pull) Performance: - pull() route passes auto_ack_on_pull from req.agent when the HTTP Signature middleware has already populated it, eliminating the redundant storage round-trip on the authenticated hot path. Falls back to a service-level lookup for legacy auth / test paths Minor: - Removed redundant !! coercion in agent.service.js (route validates the type before calling the service) - Added fix_request RETAIN_TYPES test (previously only work_order was covered; both types are in RETAIN_TYPES) - Added test for retain_until_acked non-boolean input rejection Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PR Review:
|
| Claim | Reality |
|---|---|
| "auto_acked not forwarded in pull route" | Route correctly spreads ...(message.auto_acked && { auto_acked: true }) and sets lease_until: null for auto-acked messages |
| "RETAIN_TYPES allocated on every send() call" | RETAIN_TYPES is at module level, matching the SAFE_CHARS pattern |
| "No input validation on retain_until_acked" | Explicit boolean check is present in src/routes/inbox.js before the service call |
| "No automated tests for new paths" | Six new test cases cover auto-ack, retain override, work_order, fix_request, and invalid input rejection |
| "Missing docs update" | docs/AGENT-GUIDE.md is updated with a clear explanation and decision table |
| "Missing fix_request test" | test('fix_request type always sets retain_until_acked server-side', ...) is in the diff |
Real issues
1. PR description inaccuracy - "stays leased" is wrong
In the auto-ack path, the message is never leased - we skip the lease step entirely. If the updateMessage write throws, the message stays queued (its original state), not leased. The safety property still holds (no silent loss), but the description is misleading and could confuse future maintainers who rely on it to understand the failure mode.
2. Redundant || false guards
Two places apply a || false fallback to values already guaranteed to be booleans or to have boolean defaults:
In agent.service.js: auto_ack_on_pull already defaults to false in the param list, so auto_ack_on_pull || false is redundant.
In inbox.js route: retain_until_acked is validated as boolean-or-undefined just above, so retain_until_acked || false is redundant.
These are not bugs (the coercion produces the right result), but they signal uncertainty about types that was just resolved by explicit validation a few lines earlier.
3. Fallback storage lookup on every pull without HTTP Signature auth
The route only populates req.agent when HTTP Signature auth succeeds. In tests (which do not send a Signature header, as the new test cases demonstrate) and for legacy auth paths, every pull() call incurs an extra storage read even though the result is false for the vast majority of agents.
For high-throughput inboxes this is two sequential storage reads per pull instead of one. Options (not necessarily blocking):
- Thread the preference through from middleware to route to service so the service never needs to look it up.
- At minimum, document the expected performance cost with a comment so a future reader does not assume the fallback is rare.
4. No atomicity guard on the auto-ack write
The auto-ack path reads queued messages, picks the oldest, then writes status: 'acked' unconditionally. Two concurrent pulls will both see the same message before either acks it. The standard lease path has the same potential race, so this does not introduce a new problem - but if the storage backend ever gains conditional-update support, the auto-ack path should use it.
What is done well
- Safety invariant is correct:
retain_until_ackedalways wins;RETAIN_TYPESauto-promotion happens server-side so callers cannot accidentally skip it. - Pull route response is correct:
auto_acked: trueonly appears when applicable, andlease_untilisnullfor auto-acked messages - clients cannot misread state. RETAIN_TYPESmodule-level constant is placed correctly and commented well.- Test coverage is thorough: happy paths, override behavior, server-side promotion, and invalid input rejection.
AGENT-GUIDE.mdupdate is clear and includes a useful decision table.
Summary
| # | Severity | Item |
|---|---|---|
| 1 | Low | PR description says "stays leased" - should say "stays queued" |
| 2 | Low | Redundant ` |
| 3 | Medium | Fallback getAgent storage read on every pull without HTTP Signature auth |
| 4 | Low | Auto-ack write has no conditional status guard (same race as standard pull) |
None of these are blockers. Items 1-3 are worth cleaning up before merge; item 1 in particular should be corrected to avoid misleading future readers of the failure-safe rationale.
- Remove redundant `|| false` guards on already-validated booleans (auto_ack_on_pull has param default; retain_until_acked is validated as boolean-or-undefined just above the send() call) - Expand inline comment on pull() fallback storage lookup to document the performance trade-off and future improvement path so the code is not incorrectly removed assuming the fallback is rare or free PR description updated: "stays leased" -> "stays queued" (the auto-ack path skips the lease step entirely; on ack write failure the message stays in its original 'queued' state, not 'leased') Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PR Review:
|
Summary
retain_until_acked— new per-message flag (sender-set). Whentrue, the hub requires explicitPOST /ackbefore removing the message, regardless of the recipient'sauto_ack_on_pullpreference. Work orders and fix requests always set this automatically server-side (by message type), so important work is never silently consumed.auto_ack_on_pull— new per-agent flag (set at registration, defaultfalse). Whentrue, the hub immediately acks the message on pull — fire-and-forget delivery for agents that process messages without needing retry guarantees. Existing agents are unaffected (opt-in only).Safety: the auto-ack path in
pull()is failure-safe. If the ack storage write throws, the message stays leased and requeues after the visibility timeout — a failed ack is always safer than a silently consumed message.Files changed
src/routes/agents.jsauto_ack_on_pullin registration, include in responsesrc/services/agent.service.jsauto_ack_on_pull: falseon agent recordsrc/routes/inbox.jsretain_until_ackedin send bodysrc/services/inbox.service.jsretain_until_ackedinsend(); auto-set forwork_order/fix_request; honorauto_ack_on_pullinpull()with failure-safe ackTest plan
auto_ack_on_pull: true, send a notification, pull — verifyauto_acked: truein responsework_order, pull withauto_ack_on_pullagent — verify hub still requires explicit ack (retain_until_acked overrides auto-ack)Related
Completes the explicit ack model work order from decisive.gm. Previous commits:
a74282eremoved auto-ack from pull loop, cmdAck calls hubf30514cvisibility timeout race fix (1h default + _acked/ guard)🤖 Generated with Claude Code