
feat(jobs): same-day prune via --older-than 0d + optional --status filter#2282

Open
brettdavies wants to merge 1 commit into garrytan:master from brettdavies:feat/jobs-prune-same-day-status

@brettdavies

Closes #2281

Summary

The gap. gbrain jobs prune rejected --older-than 0d and offered no --status filter. A backfill loop with any non-zero subagent failure rate left same-day dead jobs behind, and their idempotency_keys blocked re-submission of the same transcript on the next pass. The operator could not scope the prune to "today", and could not scope it to "dead/failed only" without also dropping the completed-jobs audit trail.

The change. Two surface additions to gbrain jobs prune:

  • --older-than 0d parses as "today" (jobs whose terminated_at >= start-of-today UTC). Positive day counts behave as before.
  • --status <state> (repeatable; same enum as the queue: cancelled / dead / failed / completed). When the flag is omitted, prune scopes to all terminal states as before.

The two compose: gbrain jobs prune --older-than 0d --status dead --status failed drops today's dead/failed jobs and nothing else, freeing the matching idempotency_key rows so the next backfill pass can re-submit the same transcript without renaming it.
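A minimal sketch of how the two filters compose, assuming the "today" semantics from the bullet above (terminated_at at or after 00:00 UTC of the current day). The `Job` shape, `startOfUtcDay`, and `selectForPrune` are hypothetical names for illustration; the real command deletes rows in SQL rather than filtering an in-memory array.

```typescript
// Hypothetical sketch: in-memory version of `--older-than 0d --status dead --status failed`.
type JobStatus = "completed" | "failed" | "dead" | "cancelled";

interface Job {
  id: number;
  status: JobStatus;
  idempotencyKey: string | null;
  terminatedAt: Date;
}

// 00:00 UTC of the day containing `now`; Date.UTC drops the time of day.
function startOfUtcDay(now: Date): Date {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
}

// Both filters must match: the status is in the requested set AND the job ended today.
function selectForPrune(jobs: Job[], statuses: ReadonlySet<JobStatus>, now: Date): Job[] {
  const boundary = startOfUtcDay(now).getTime();
  return jobs.filter(
    (job) => statuses.has(job.status) && job.terminatedAt.getTime() >= boundary,
  );
}

const now = new Date("2024-05-10T15:30:00Z");
const jobs: Job[] = [
  { id: 1, status: "dead", idempotencyKey: "transcript-a", terminatedAt: new Date("2024-05-10T09:00:00Z") },
  { id: 2, status: "completed", idempotencyKey: "transcript-b", terminatedAt: new Date("2024-05-10T09:05:00Z") },
  { id: 3, status: "failed", idempotencyKey: "transcript-c", terminatedAt: new Date("2024-05-10T10:00:00Z") },
  { id: 4, status: "dead", idempotencyKey: "transcript-d", terminatedAt: new Date("2024-05-08T10:00:00Z") },
];

const pruned = selectForPrune(jobs, new Set<JobStatus>(["dead", "failed"]), now);
console.log(pruned.map((j) => j.id)); // [ 1, 3 ]
```

The completed job (id 2) survives as audit trail, and the older dead job (id 4) is outside the "today" window.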

Diagnosis

Pre-patch, the prune command rejected zero-day cutoffs in its argument parser, then issued a single SQL DELETE filtered only on terminated_at < cutoff and the terminal-state set. The two missing knobs forced the operator into one of three workarounds, none clean:

  • Rename the transcript so the new submission picks up a fresh idempotency_key. This loses the meaning the original filename had for the system.
  • Manually DELETE rows from minion_jobs via psql. This bypasses the prune surface entirely and loses the safety hooks the command already has (transactional execution, observability bumps, etc.).
  • Wait a day and prune with --older-than 1d. This stalls the backfill loop for a day.

Tests

  • bun test test/jobs.test.ts: existing prune coverage continues to pass; new cases cover the zero-day cutoff and the --status filter (each alone and combined).
  • Manual verification: ran an actual mid-loop recovery during a backfill (gpt-5.5 via codex-proxy) with 3 dead jobs from the current day. Prune with --older-than 0d --status dead cleared them; the next backfill pass re-submitted the matching transcripts cleanly.

Adjacent observation (not in this PR)

Worth adding a gbrain jobs ls --status dead --json companion so an operator can see WHICH idempotency keys are held before pruning. gbrain jobs ls exists today, but its filter set doesn't include --status; adding the same enum would cover it. Happy to file a follow-up if useful.
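A rough sketch of what such a companion could report, assuming job rows expose a status and a nullable idempotency key. `JobRow`, `heldKeys`, and the JSON shape are hypothetical and only illustrate the idea; nothing here is an existing gbrain API.

```typescript
// Hypothetical sketch: list the idempotency keys still held by jobs in the requested states.
type JobStatus = "completed" | "failed" | "dead" | "cancelled";

interface JobRow {
  id: number;
  status: JobStatus;
  idempotencyKey: string | null;
}

interface HeldKey {
  id: number;
  idempotencyKey: string;
}

function heldKeys(rows: JobRow[], statuses: ReadonlySet<JobStatus>): HeldKey[] {
  const out: HeldKey[] = [];
  for (const row of rows) {
    // A row without a key never blocks a re-submission, so it is not reported.
    if (statuses.has(row.status) && row.idempotencyKey !== null) {
      out.push({ id: row.id, idempotencyKey: row.idempotencyKey });
    }
  }
  return out;
}

const rows: JobRow[] = [
  { id: 7, status: "dead", idempotencyKey: "transcript-a" },
  { id: 8, status: "completed", idempotencyKey: "transcript-b" },
  { id: 9, status: "dead", idempotencyKey: null },
];
console.log(JSON.stringify(heldKeys(rows, new Set<JobStatus>(["dead"]))));
// [{"id":7,"idempotencyKey":"transcript-a"}]
```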

…` filter

`gbrain jobs prune` always required a strictly positive `--older-than` and always swept the default status trio (completed + dead + cancelled). Two real gaps:

1. Same-day cleanup was impossible. After a bulk-cancel of stuck subagents, the dead row blocks reuse of its idempotency key until the row is deleted (queue.ts: `ON CONFLICT (idempotency_key) DO NOTHING`). Operators had to wait a full day to prune those rows via `--older-than 1d`.

2. The status filter already existed in `queue.prune({ status })` but was unreachable from the CLI, so a same-day prune of cancelled jobs could not be expressed without also deleting completed-and-still-useful rows.
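The blocking behavior in gap 1 can be modeled with an in-memory stand-in for the unique constraint behind `ON CONFLICT (idempotency_key) DO NOTHING`. This is a sketch, not queue.ts: `IdempotentQueue`, its methods, and the `"submitted"` placeholder status are hypothetical.

```typescript
// Hypothetical stand-in for a table with a unique idempotency_key column and an
// insert of the form `INSERT ... ON CONFLICT (idempotency_key) DO NOTHING`.
interface Row {
  id: number;
  status: string;
}

class IdempotentQueue {
  private rows = new Map<string, Row>();
  private nextId = 1;

  // Like ON CONFLICT DO NOTHING: if any row already holds the key, whatever its
  // status, nothing is inserted and no id comes back.
  submit(idempotencyKey: string): number | null {
    if (this.rows.has(idempotencyKey)) return null;
    const id = this.nextId++;
    this.rows.set(idempotencyKey, { id, status: "submitted" }); // placeholder status
    return id;
  }

  setStatus(idempotencyKey: string, status: string): void {
    const row = this.rows.get(idempotencyKey);
    if (row !== undefined) row.status = status;
  }

  // Deleting the row is the only way to free the key for reuse.
  prune(idempotencyKey: string): boolean {
    return this.rows.delete(idempotencyKey);
  }
}

const q = new IdempotentQueue();
console.log(q.submit("transcript-a")); // 1
q.setStatus("transcript-a", "dead");
console.log(q.submit("transcript-a")); // null: the dead row still holds the key
q.prune("transcript-a");
console.log(q.submit("transcript-a")); // 2
```

A terminal status alone changes nothing about the conflict; only removing the row lets the same transcript be re-submitted, which is why a same-day prune matters.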

Allow `--older-than 0d` (non-negative) and add `--status <csv>` accepting any subset of `completed,failed,dead,cancelled`. The threshold math is unchanged (`new Date(Date.now() - days * 86400000)`); zero days simply collapses to "now" so `WHERE updated_at < $2` matches every row in the requested status set.
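A sketch of the relaxed flag handling, assuming `--older-than` takes the `<N>d` form used throughout this PR. Only the cutoff formula, the four statuses, and the default trio come from the commit message; `parseOlderThanDays`, `parseStatusCsv`, `cutoffFor`, and their error messages are hypothetical.

```typescript
// Hypothetical parsers for the two prune flags; cutoffFor reuses the quoted math.
const MS_PER_DAY = 86400000;

const PRUNABLE = ["completed", "failed", "dead", "cancelled"] as const;
type PrunableStatus = (typeof PRUNABLE)[number];

// Default when --status is omitted: the previous trio (completed + dead + cancelled).
const DEFAULT_STATUSES: PrunableStatus[] = ["completed", "dead", "cancelled"];

function isPrunable(s: string): s is PrunableStatus {
  return (PRUNABLE as readonly string[]).indexOf(s) !== -1;
}

// Accepts "<N>d" with N >= 0. \d+ cannot match a minus sign, so negative values
// are rejected; "0d" is allowed where the old parser demanded N > 0.
function parseOlderThanDays(raw: string): number {
  const m = /^(\d+)d$/.exec(raw.trim());
  if (m === null) {
    throw new Error(`--older-than expects "<N>d" with N >= 0, got "${raw}"`);
  }
  return Number(m[1]);
}

// Splits a comma-separated --status value, validates each entry, and drops duplicates.
function parseStatusCsv(raw: string | undefined): PrunableStatus[] {
  if (raw === undefined) return DEFAULT_STATUSES.slice();
  const out: PrunableStatus[] = [];
  for (const part of raw.split(",")) {
    const s = part.trim();
    if (s === "") continue;
    if (!isPrunable(s)) {
      throw new Error(`unknown status "${s}"; expected one of ${PRUNABLE.join(",")}`);
    }
    if (out.indexOf(s) === -1) out.push(s);
  }
  if (out.length === 0) throw new Error("--status needs at least one value");
  return out;
}

// Unchanged threshold math: with days = 0 the cutoff is simply "now", so
// `updated_at < cutoff` matches every row in the status set updated before this instant.
function cutoffFor(days: number, nowMs: number = Date.now()): Date {
  return new Date(nowMs - days * MS_PER_DAY);
}

const nowMs = Date.parse("2024-05-10T15:30:00Z");
console.log(cutoffFor(parseOlderThanDays("0d"), nowMs).toISOString()); // 2024-05-10T15:30:00.000Z
console.log(cutoffFor(parseOlderThanDays("1d"), nowMs).toISOString()); // 2024-05-09T15:30:00.000Z
console.log(parseStatusCsv("cancelled,dead,cancelled")); // [ 'cancelled', 'dead' ]
```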

The output line now reports which statuses were swept (`Pruned N cancelled jobs older than 0 days.`) so operators can see at a glance which slice cleared.

Development

Successfully merging this pull request may close these issues.

gbrain jobs prune cannot clear same-day terminal jobs (--older-than 0d rejected; no --status filter)
