feat(oracle): pin follow-up tasks to the owning worker (session→worker affinity) #979
…r affinity)

Closes #978.

A follow-up task could be dispatched to a worker on a different ChatGPT account than the one that owns the conversation, making multi-turn unusable on any shared / multi-account pool. Pin follow-ups to the worker/account that created the session; keep fresh tasks competitive.

- Add `owner_worker_label` to OracleSession and `required_worker_label` to OracleTask (both serde-optional → legacy docs stay unpinned).
- Stamp `owner_worker_label` on the first successful result (worker_submit_result) and on imported transcripts (attach), using a `{ owner_worker_label: null }` filter so the first account to answer keeps ownership.
- submit_task copies the session owner onto follow-ups as `required_worker_label`.
- claim_task filters queued tasks by `$or: [ {required_worker_label: null}, {required_worker_label: <worker>} ]` so fresh tasks go to any worker and follow-ups only to the owner.
- Add a `{ pool_id, status, required_worker_label, created_at }` index.

Workers and the CDP/userscript clients need no change.
📊 Code coverage
Gate: line coverage must stay at or above the threshold. Ratchet plan (W21): Backend → 55%, CLI → 50%, Frontend → 30% by quarter end.
Review: solid, well-scoped fix; okay to merge, with two things worth surfacing first
Pulled the branch and read the full
What's right
Two things to flag before/with merge
1. Pinned follow-ups whose owning worker never returns become permanent zombies that consume quota. Chain of facts:
So if account A's worker disappears or reconnects under a different label, A's follow-ups sit queued forever, silently eating the submitter's inflight budget and pool queue capacity with no automatic recovery. The issue explicitly defers "lease/age fallback later," so it's known; suggest tracking it as a follow-up (an age-based unpin that clears
2. Affinity keys on a client-chosen
Neither is a regression, but since the fix lives or dies by that invariant, it'd be worth a line in the code/docs.
Minor (non-blocking)
…iant

Addresses review feedback on the session→worker affinity PR.

- Add release_stale_affinity(): a follow-up pinned to an owning account whose worker never returns is unpinned once it has waited a full task_timeout_secs window, so it stops leaking the submitter's inflight quota and the pool queue cap forever (the "lease/age fallback" the issue deferred). Swept in claim_task, which every live worker polls.
- Document the worker_label⇔account invariant that affinity rests on, on OracleSession::owner_worker_label.
- Tests: stale_followup_affinity_is_released_after_grace (pinned → unclaimable before grace, claimable by any worker after) and failed_first_turn_leaves_session_unowned (failed turn 1 counts the turn and pins the URL but does not stamp ownership, so the next follow-up stays unpinned).
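The age-based release described in this commit can be pictured with a small in-memory sketch. It is an illustration under stated assumptions, not the service code: the `QueuedTask` shape, the plain-seconds timestamps, and the function signature are hypothetical, and the real sweep presumably runs as an update against the task store rather than over a slice.

```rust
/// Hypothetical in-memory stand-in for a queued task document.
#[derive(Debug)]
struct QueuedTask {
    id: u32,
    /// Enqueue time in seconds (assumed representation).
    created_at_secs: u64,
    /// `Some(label)` pins the task to one worker; `None` means any worker may claim it.
    required_worker_label: Option<String>,
}

/// Unpin every pinned task that has waited at least `task_timeout_secs`.
/// Returns how many tasks were released so the caller can log or meter it.
fn release_stale_affinity(
    queue: &mut [QueuedTask],
    now_secs: u64,
    task_timeout_secs: u64,
) -> usize {
    let mut released = 0;
    for task in queue.iter_mut() {
        // saturating_sub guards against a task stamped slightly in the future.
        let waited = now_secs.saturating_sub(task.created_at_secs);
        if task.required_worker_label.is_some() && waited >= task_timeout_secs {
            task.required_worker_label = None;
            released += 1;
        }
    }
    released
}
```

Running the sweep from the claim path, as the commit does, means any live worker's poll is enough to free a stranded follow-up; the sweep is idempotent, so repeated polls release nothing further.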
Thanks for the thorough read. Addressed all three in
1. Zombie pinned follow-ups consuming quota: implemented the issue's deferred lease/age fallback as
2. Affinity keys on a client-chosen
3. Minor (failed turn-1 with a URL but no owner): added
The
Re-reviewed
One residual assumption, not a blocker, flagging for the record: when a released follow-up is claimed by a non-owner,
LGTM.
Closes #978.
Problem
`oracle_task_service::claim_task` claimed the oldest queued task filtered only by pool + status, with no notion of which account/worker a conversation belongs to. On a multi-account shared pool, a follow-up for account A's conversation could be claimed by account B's worker, which cannot open A's `/c/<id>` → wrong/blank/failed answer. This made multi-turn unusable on any shared pool. A fixed `conversation_id` pins which conversation, not which account, so it can't be fixed client-side.
Solution
Pin follow-ups to the worker/account that created the session; keep fresh tasks competitive.
- `owner_worker_label: Option<String>` on `OracleSession`; `required_worker_label: Option<String>` on `OracleTask`. Both `serde(default, skip_serializing_if = None)`, so legacy docs stay unpinned and the fields never surface in consumer/worker payloads.
- `worker_submit_result` sets `owner_worker_label` on the first successful result via a `{ owner_worker_label: null }` filter (idempotent → the first account to answer keeps ownership). Same stamp added to the `attach`/transcript-import path, since that account physically owns the scraped `/c/<id>`.
- `submit_task` copies the session's owner onto follow-ups as `required_worker_label`.
- `claim_task` queued filter now ANDs `$or: [ {required_worker_label: null}, {required_worker_label: <worker>} ]`: fresh (null/absent) → any worker; pinned follow-up → owning worker only.
- Add a `{ pool_id, status, required_worker_label, created_at }` index on `oracle_tasks`.

Workers and the CDP/userscript clients need no change: `required_worker_label` rides along in the existing task and the client already obeys `conversation_url`.
Backward compatibility
Legacy sessions without `owner_worker_label` stay unpinned (today's behavior).
Tests
Added `followup_pins_to_owning_worker`: owner stamping on first result, follow-up pinning, a non-owner worker idling rather than misrouting, and fresh tasks still claimable by any worker. Updated affected struct-literal fixtures.
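To make the routing rule concrete, here is a minimal in-memory sketch of the scenario that `followup_pins_to_owning_worker` covers. It is a model under assumptions, not the service code: the real implementation filters documents in the task store, whereas this version walks a `Vec`; the struct shapes, the `stamp_owner` and `make_followup` helpers, and the function signatures are hypothetical.

```rust
/// Hypothetical stand-in for a queued task document.
#[derive(Debug, Clone, PartialEq)]
struct Task {
    id: u32,
    created_at: u64,
    /// `None` = fresh task, claimable by any worker.
    required_worker_label: Option<String>,
}

/// Hypothetical stand-in for a session document.
#[derive(Debug, Default)]
struct Session {
    owner_worker_label: Option<String>,
}

/// First-writer-wins stamp, mirroring the `{ owner_worker_label: null }` filter:
/// only an unowned session accepts an owner. Returns true if this call stamped it.
fn stamp_owner(session: &mut Session, worker: &str) -> bool {
    if session.owner_worker_label.is_none() {
        session.owner_worker_label = Some(worker.to_string());
        true
    } else {
        false
    }
}

/// Build a follow-up that inherits the session owner as its required worker.
fn make_followup(session: &Session, id: u32, created_at: u64) -> Task {
    Task {
        id,
        created_at,
        required_worker_label: session.owner_worker_label.clone(),
    }
}

/// Claim the oldest queued task that is either unpinned or pinned to `worker`.
/// Returns `None` when nothing is claimable, so a non-owner idles instead of misrouting.
fn claim_task(queue: &mut Vec<Task>, worker: &str) -> Option<Task> {
    let idx = queue
        .iter()
        .enumerate()
        .filter(|(_, t)| match t.required_worker_label.as_deref() {
            None => true,
            Some(label) => label == worker,
        })
        .min_by_key(|(_, t)| t.created_at)
        .map(|(i, _)| i)?;
    Some(queue.remove(idx))
}
```

With a session owned by `worker-a` and a queue holding one follow-up plus one fresh task, `worker-b` claims only the fresh task and then gets `None`, while `worker-a` receives the follow-up.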