Delegation dashboard, logging & worker-window workflow (v1.21.0)#6

Open
rdyplayerB wants to merge 44 commits into main from feat/delegation-dashboard

Conversation

@rdyplayerB

Owner

Summary

A dedicated, full-screen Delegation dashboard (moved out of Settings) plus the logging substrate behind it, so delegating bulk work to custom LLMs (e.g. Qwen via ccr) is observable, accurate, and exportable for analysis.

What's included

Logging & attribution

  • qcdelegate (and the personal qwen wrapper) emit one structured event per run to ~/.quadclaude/events.jsonl: project, originating pane, route, exit, git-snapshot lines/files (throwaway index — never touches your git staging; shadow repo for non-git projects), ground-truth QC_CHECK result, cold-start retries, and prompt/output previews.
  • App-owned log lifecycle: per-project rollup, size-based rotation into a cumulative summary, 90-day retention prune.
  • Cold-start warm-up/retry for freshly-loaded local models.
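
A minimal sketch of the "one structured event per run" idea, assuming one JSON object per line. The `DelegationEvent` field names below are illustrative guesses based on the list above, not the real `events.jsonl` schema, and `toJsonlLine` / `parseEventsJsonl` are hypothetical helpers.

```typescript
// Hypothetical event shape: field names are illustrative, not the real schema.
interface DelegationEvent {
  ts: string; // ISO timestamp of the run
  project: string;
  pane: string; // originating pane
  route: string;
  exit: number;
  lines: number; // from the git-snapshot diff
  files: number;
  check: "pass" | "fail" | "none"; // ground-truth QC_CHECK result
  coldRetries: number;
  promptPreview: string;
  outputPreview: string;
}

// Serialize one run as exactly one line. JSON.stringify escapes embedded
// newlines, so a multi-line prompt preview cannot split the record.
function toJsonlLine(ev: DelegationEvent): string {
  return JSON.stringify(ev) + "\n";
}

// Parse a JSONL log, skipping blank or corrupt lines instead of failing.
function parseEventsJsonl(text: string): DelegationEvent[] {
  const out: DelegationEvent[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      out.push(JSON.parse(line) as DelegationEvent);
    } catch {
      // A truncated trailing write should not poison the whole log.
    }
  }
  return out;
}
```

Tolerating a bad line matters for an append-only log: a writer killed mid-append leaves at most one unreadable record.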

Dashboard (v1.21.0)

  • Opened from a title-bar chart icon: KPIs, per-project cards (click to filter), expandable per-call timeline with prompt + worker output.
  • Export: Copy log / Export to file — a self-contained markdown + JSON report built to paste back for delegation-quality analysis.

Worker-window workflow

  • Session-scoped: first delegation of a Claude session prompts once to open a live worker window; reliable feed (retries until the pane mounts — fixes the silent empty-window bug).
  • Pairing now resets on every app relaunch (per-session, like Claude sessions).

Testing

  • Verified against 3+ real qwen delegations with varied outcomes; all logged fields checked against ground truth (file attribution, check pass/fail, prompt/output capture, pane tagging).
  • Dashboard rendering, expand, and export round-trip driven & screenshotted via Electron CDP.
  • Pairing-reset and clipboard export verified end-to-end.

🤖 Generated with Claude Code

rdyplayerB and others added 30 commits June 14, 2026 10:24
…21.0)

Adds a dedicated, full-screen Delegation dashboard (out of Settings) plus the
logging substrate behind it, so delegation to custom LLMs is observable and the
log is exportable for analysis.

Logging & attribution
- qcdelegate emits one structured event per run to ~/.quadclaude/events.jsonl:
  project, originating pane (QC_PANE), route, exit, git-snapshot insertions/files
  (throwaway index — never touches the user's staging; shadow repo for non-git
  projects), ground-truth QC_CHECK result, cold-start retries, and prompt/output
  previews.
- App-owned log lifecycle (delegationLog.ts): per-project rollup, size-based
  rotation into a cumulative summary, 90-day retention prune.
- Cold-start warm-up/retry so a freshly-loaded local model's "may not exist"
  error is retried instead of failing.
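
The lifecycle steps above (per-project rollup and the 90-day prune) could look roughly like this. It is a sketch only: `LoggedRun`, `pruneOldRuns`, and `rollupByProject` are hypothetical names, and the real delegationLog.ts may be organized differently.

```typescript
// Hypothetical minimal record; the real events carry more fields.
interface LoggedRun {
  ts: number; // epoch milliseconds
  project: string;
  lines: number;
}

const RETENTION_DAYS = 90;
const DAY_MS = 24 * 60 * 60 * 1000;

// Keep only runs newer than the retention cutoff.
function pruneOldRuns(runs: LoggedRun[], nowMs: number, days: number = RETENTION_DAYS): LoggedRun[] {
  const cutoff = nowMs - days * DAY_MS;
  return runs.filter((r) => r.ts >= cutoff);
}

// Aggregate call count and changed lines per project.
function rollupByProject(runs: LoggedRun[]): Map<string, { calls: number; lines: number }> {
  const rollup = new Map<string, { calls: number; lines: number }>();
  for (const r of runs) {
    const cur = rollup.get(r.project) ?? { calls: 0, lines: 0 };
    cur.calls += 1;
    cur.lines += r.lines;
    rollup.set(r.project, cur);
  }
  return rollup;
}
```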

Dashboard (v1.21.0)
- Full-screen dashboard opened from a title-bar button: KPIs, per-project cards
  (click to filter), expandable per-call timeline with prompt + worker-output.
- Export: Copy log (main-process clipboard) / Export to file — a self-contained
  markdown + JSON report built to paste back for delegation-quality analysis.

Worker-window workflow
- Session-scoped: first delegation of a Claude session prompts once to open a
  live worker window; reused/auto-allocated pane; reliable feed (retries until
  the pane's terminal mounts — fixes the silent empty-window bug).
- Pairing now resets on every app relaunch (per-session, like Claude sessions).

Per-pane QC_PANE env injected at PTY creation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… tooltips, content-sized card

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ch resolves

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r analysis

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…pings

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s lines + dashboard decisions

- qcdecide: orchestrator records each group's KEEP/DELEGATE decision; prints to the
  orchestrator pane, echoes to the worker feed, and logs a decision event.
- qcdelegate/qwen: completion status line (exit/check/lines/time) into the feed so the
  worker window shows clear INPUT -> OUTPUT -> RESULT.
- Dashboard: Decision ledger (kept vs delegated, per group, with reason + check).
- RouterManager generates qcdecide alongside qcdelegate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…/delegation-active + QC_DELEGATION env

The app's delegation toggle was invisible to the Claude running in a pane, so it fell back
to OFF-by-default. Main now writes an authoritative delegation-active status file (on
launch + on every workspace save) and injects QC_DELEGATION / QC_DELEGATION_MODEL into each
pane's shell, so a fresh session can auto-detect delegation mode.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- useDelegation: toggle-driven auto worker feed (replaces fragile per-event prompt that
  could miss events predating app launch) — when delegation is ON, one worker pane always
  tails the live feed; torn down when OFF.
- Settings → General: discoverable Delegation on/off toggle (was only an ambiguous pill).
- qcdoctor: one-shot health check of the whole delegation pipeline (toggle, model, ccr,
  tools, hook, env, recent activity, worker pairing); generated by the app + on PATH.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rd is the live surface

The toggle-driven auto-feed commandeered a work pane as the feed and mis-assigned
orchestrator/worker on launch. Delegation activity surfaces in the Delegation dashboard
(live decision ledger + timeline); pane pairing is user-initiated only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…+ fail-fast VPN preflight

Delegation reliability work driven by reproducing the "inconsistent" symptom on a
Diablo-flavored benchmark (same free qwen on both sides).

Engine swap (qcdelegate):
- New QC_ENGINE switch. Default `aider`: runs `aider --architect --auto-test` straight
  against the model's OpenAI-compatible endpoint (no ccr, no mergesystem transformer —
  aider uses SEARCH/REPLACE diff edits, which a 30B model handles far more reliably than
  Claude Code's tool-call protocol). With QC_CHECK, aider auto-runs the check and iterates.
- Fallback `claude`: legacy ccr path, auto-used for non-git projects and when aider is
  not installed (never hard-fails).
- Outer contract preserved: same git-snapshot diff attribution, events.jsonl schema
  (added `engine` field), delegation.log feed, QC_CHECK ground truth. `.aider*` scratch
  files filtered out of attribution.
- Validated: aider 11s 12/12, legacy 59s 12/12, telemetry intact.

Persistent ccr (router.ts + ccr-keeper.sh):
- launchd agent com.quadclaude.ccr health-checks :3456 every 60s and restarts ccr if
  down (self-heal ~300ms). Keeper self-locates ccr (PATH / homebrew symlink / newest nvm
  node bin) so node-version churn won't break it. Installed idempotently by
  RouterManager.installCcrKeeper() (macOS-only).

Fail-fast preflight (qcdelegate):
- The real reliability bug was VPN off -> Olares 303-redirects to its auth gateway ->
  HTML -> "Unexpected token '<'" after ~3min of churn. Preflight probes the endpoint for
  a JSON 200 and aborts in seconds with an actionable "VPN likely OFF" message.
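
To illustrate the preflight decision described above, here is a sketch of how a probe response might be classified. `ProbeResult` and `classifyPreflight` are hypothetical, the real check lives in the qcdelegate shell script, and the exact messages are assumptions.

```typescript
// What a probe of the model endpoint returned (hypothetical shape).
interface ProbeResult {
  status: number;
  contentType: string;
  body: string;
}

type Preflight = { ok: true } | { ok: false; reason: string };

// Accept only a 200 whose body is real JSON; treat redirects and HTML
// as the "auth gateway answered instead of the model" case.
function classifyPreflight(r: ProbeResult): Preflight {
  if (r.status >= 300 && r.status < 400) {
    return { ok: false, reason: `redirected (HTTP ${r.status}): VPN likely OFF` };
  }
  if (r.status !== 200) {
    return { ok: false, reason: `unexpected HTTP ${r.status}` };
  }
  const looksJson = r.contentType.toLowerCase().indexOf("json") !== -1;
  if (!looksJson || r.body.trim().charAt(0) === "<") {
    return { ok: false, reason: "got HTML instead of JSON: VPN likely OFF" };
  }
  try {
    JSON.parse(r.body);
  } catch {
    return { ok: false, reason: "body is not valid JSON" };
  }
  return { ok: true };
}
```

Failing here, before the worker starts, is what turns minutes of churn into an immediate, actionable message.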

qcdoctor: reports delegation engine + aider version and ccr keeper load state.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds a standalone live-feed mode, decoupled from the 1:1 orchestrator/worker pairing so
you can open the feed in as many idle panes as you want.

- New transient PaneConfig.liveFeed flag (cleared on load — the tail process doesn't
  survive an app restart) + setPaneLiveFeed store action.
- PaneHeader shows a one-click "📡 Live feed" button on any pane in the idle `shell`
  state (hidden while a Claude session runs, so we never type into it). Clicking tails
  ~/.quadclaude/delegation.log; the button becomes an active badge with × to stop
  (sends Ctrl-C and clears the flag).
- Several feed panes are allowed (each is independent). With today's single global log
  they show identical content (useful for layout/visibility); per-orchestrator scoping
  is a future follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ude session

Makes "more than one live feed" genuinely useful: with several Claude sessions
delegating in parallel, each feed pane can track only its own session.

- qcdelegate.sh / qcdecide.sh now mirror every decision + delegation into a per-
  orchestrator file ~/.quadclaude/feed/<QC_PANE>.log (in addition to the global
  delegation.log). QC_PANE is the originating pane id the app already injects. Empty
  QC_PANE (run outside QuadClaude) -> global feed only.
- PaneConfig.liveFeedScope (transient) records which orchestrator a feed follows;
  setPaneLiveFeed(id, on, scope) carries it.
- New LiveFeedButton: one click opens the global feed when nothing else is running
  Claude; when other panes have live Claude sessions, a small menu scopes the feed to
  one of them (or "All"). The active badge shows the scope (e.g. "Live feed · Terminal 3").
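
The mirroring rule above (global feed always, per-orchestrator feed only when QC_PANE is set) can be sketched as a pure function. `feedTargets` is a hypothetical helper; the actual logic is in the shell scripts.

```typescript
// Return every log file a decision or delegation should be appended to.
// `home` stands in for $HOME; `qcPane` is the QC_PANE value, if any.
function feedTargets(home: string, qcPane: string | undefined): string[] {
  const base = `${home}/.quadclaude`;
  const targets = [`${base}/delegation.log`]; // the global feed is always written
  if (qcPane !== undefined && qcPane.trim() !== "") {
    targets.push(`${base}/feed/${qcPane}.log`); // per-orchestrator mirror
  }
  return targets;
}
```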

Validated: qcdelegate with QC_PANE=777 wrote the full run to feed/777.log; qcdecide
likewise; renderer builds clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-feed panes + per-orchestrator scoping)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…just the number

The 'Follow which session?' menu and the active badge now lead with the pane's project
name (e.g. 'diablo'), with 'Terminal N' as a muted secondary tag — far easier to identify
which session a feed is following.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ent memory

A self-improving eval layer that learns which work qwen handles reliably, stored so it
outlives app updates (and QuadClaude itself).

- qceval (new): the evaluator. `record` appends a labeled outcome per delegated unit;
  `suggest "<unit>"` returns the LEARNED keep/delegate prior for that task class + the
  failure modes to watch for; `verdict <task> ship|revert` records the real human outcome
  for calibration.
- qclearn (new): distills ~/.quadclaude/eval/outcomes.jsonl → rubric.md (per-class priors
  + mined failure modes, hand-editable) + calibration.json (how often the eval was right).
- qcdelegate now auto-records a labeled outcome after every run — passive learning from
  every project, no manual discipline.
- Durability by construction: memory lives in ~/.quadclaude/eval (in $HOME, never in the
  app bundle, never in the dashboard's clearAll), append-only JSONL + plain Markdown,
  git-initialized for portability. RouterManager installs qceval/qclearn; qcdoctor reports
  the memory. The app only visualizes — it never owns the brain.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eval-memory line

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…durable eval memory)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…diff

Turns a green QC_CHECK into a trustworthy verdict by running independent skeptics over the
worker's diff — the missing piece that catches "passed the test but subtly wrong".

- qceval gains a `judge` subcommand: runs 3 independent lenses (correctness / completeness /
  edge-cases) over the working-tree diff, each prompted to REFUTE and biased to flag when
  unsure. Injects the LEARNED failure modes (from rubric.md) into the prompts. Aggregates to
  SHIP / CAUTION / REVIEW; mirrors the verdict into the live feed; folds the votes into the
  matching outcome (enriching the eval memory + future calibration). Judges through the same
  free qwen endpoint by default; QC_JUDGE_MODEL points it at a stronger judge.
- qcdelegate auto-runs the panel when there's NO runnable QC_CHECK (then it's the only
  verification) or on demand via QC_JUDGE=1; skippable with QC_NO_JUDGE=1.
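
A sketch of the gating and aggregation described above. Two parts are assumptions not stated in this commit: that QC_NO_JUDGE=1 takes precedence over QC_JUDGE=1, and the vote thresholds used to map three lenses onto SHIP / CAUTION / REVIEW. `shouldRunJudge` and `aggregateVotes` are hypothetical names.

```typescript
type Env = Record<string, string | undefined>;

// Decide whether the judge panel runs for this delegation.
function shouldRunJudge(hasRunnableCheck: boolean, env: Env): boolean {
  if (env["QC_NO_JUDGE"] === "1") return false; // assumption: explicit skip wins
  if (env["QC_JUDGE"] === "1") return true; // on demand
  return !hasRunnableCheck; // otherwise only when the panel is the sole verification
}

type PanelVerdict = "SHIP" | "CAUTION" | "REVIEW";

// Each entry is true when that lens flagged a problem.
// Assumed thresholds: no flags -> SHIP, one -> CAUTION, two or more -> REVIEW.
function aggregateVotes(flags: boolean[]): PanelVerdict {
  const flagged = flags.filter((f) => f).length;
  if (flagged === 0) return "SHIP";
  return flagged === 1 ? "CAUTION" : "REVIEW";
}
```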

Validated: caught a >= vs > boundary bug (3/3 high severity) and an empty-string edge case
(1/3) while passing correct code — discriminating, not rubber-stamping.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Switching a pane to an env-carrying agent (Qwen/aider) respawns the PTY, which starts at
the default 80x24. launchAgent/restartShell never re-fit, so the next agent (e.g. Claude
Code) rendered into the stale small size, leaving the pane half-empty. Added refitPane()
to re-measure the container and resize the new PTY (also delivers SIGWINCH) right after
every respawn, so agents always fill the available space.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The dashboard was capped at max-w-6xl (1152px), leaving huge empty margins on wide
screens with content crammed in a narrow column. Now:
- modal scales with the viewport (w-[94vw], capped 1800px) instead of a fixed cap
- Decisions ledger flows into 2 columns at lg and 3 at 2xl (was single column)
- Projects grid goes up to 3 (xl) / 4 (2xl) columns (was max 2)

Uses the horizontal space and cuts vertical scroll on large displays.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ouse-tracking mode

A TUI that crashes or is force-stopped can leave mouse reporting (ESC[?1003h/?1006h) or the
alt-screen enabled. The shell then echoes raw mouse sequences (^[[<35;…M) on every cursor
move and the pane looks frozen/uncancellable — Ctrl-C just redraws under the flood. The old
recovery path used terminal.clear(), which does NOT undo those modes.

Added resetTerminal() (xterm terminal.reset() = clears buffer AND resets modes) and use it in
restartShell (Stop) and the agent re-spawn path, so both guarantee a clean terminal. Stop is
now a real recovery for a mouse-mode-stuck pane.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gation dashboard)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… fills the screen

Redesigned for big screens and to stop the whole view reflowing when you filter a project.
- Modal now fills the height (h-92vh) instead of floating short in the middle.
- Body is a fixed layout: KPIs on top, then a 2-column main area (lg+): Decisions on the
  left, Projects + Calls stacked on the right. Each panel scrolls INDEPENDENTLY, so
  filtering a project (or browsing decisions) never moves the rest of the interface —
  overflow is handled by per-panel scroll, not page reflow.
- Decisions are now click-to-expand (full reason, project, pane, check, timestamp) — same
  affordance the Calls rows already had.
- Projects panel is height-capped and scrolls if there are many.

Bump to 1.23.3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Highest-leverage logging: mark each delegation's real outcome (Shipped / Reverted / Edited)
so the eval loop learns how often the check/judge was actually RIGHT — the one input the
calibration (and a future GEPA pass) is starved of.

- Expand any Call → "Your outcome" buttons record ship/revert/edit via a new IPC handler
  that runs `qceval verdict <task> <verdict>` (through a login shell so node and ~/.local/bin
  resolve; task/verdict passed as positional args to avoid injection).
- delegationLog.getEvents now joins the recorded verdict from the durable eval memory
  (~/.quadclaude/eval/outcomes.jsonl) onto each event, so the row shows a verdict badge and
  the dashboard reflects what you've judged.
- Low-volume by design (one per delegation, not per edit) — addresses the log-growth concern.
- Verified end-to-end: recording a verdict updates calibration.json (humanLabeled/trustworthiness).
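
To show why positional args avoid injection, here is a sketch of building the argument vector for a login shell. The script text is constant and the user-influenced values arrive as `$1` and `$2`, so they are never parsed as shell syntax. `buildVerdictArgv` is hypothetical, and the exact shell and flags the app uses are assumptions.

```typescript
const VERDICTS = ["ship", "revert", "edit"] as const;
type HumanVerdict = (typeof VERDICTS)[number];

function isHumanVerdict(v: string): v is HumanVerdict {
  return (VERDICTS as readonly string[]).indexOf(v) !== -1;
}

// Build argv for something like execFile(shell, argv) from Node's child_process.
// "-l" asks for a login shell, "-c" runs the constant script, and the next
// argument becomes $0, followed by $1 (task) and $2 (verdict).
function buildVerdictArgv(task: string, verdict: string): string[] {
  if (!isHumanVerdict(verdict)) {
    throw new Error(`unknown verdict: ${verdict}`);
  }
  return ["-l", "-c", 'qceval verdict "$1" "$2"', "qceval-verdict", task, verdict];
}
```

A task name containing quotes or semicolons stays a single argument, because it is never interpolated into the script string.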

Bump to 1.23.4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ng and returning to the app

Switching to another app/window and back could leave the active pane's xterm without
keyboard focus: the pane looked active, text was selectable, but typing and Ctrl-C did
nothing, with no errors. Two-part fix:
- main: on window 'focus', call webContents.focus() so the renderer actually holds focus.
- renderer: on window 'focus' and visibilitychange, re-focus the active terminal's textarea
  (skipped while a modal is open, so it doesn't steal focus).

Bump to 1.23.5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l intelligence)

The dashboard showed raw event volume but none of the intelligence the eval memory produces
— the layer that actually answers "what should I delegate?". Reframed around effectiveness:

- KPIs now answer optimization questions: Delegations · Worked% · Delegate-rate · Eval-trust ·
  Issues · Avg-time. Cut vanity metrics (Lines/Files/Cold) to a single muted context line.
- NEW "What to delegate" band — per-task-class pass-rate + a Delegate/Keep recommendation,
  distilled from the durable eval memory (the optimization centerpiece that was missing).
- Eval-trust KPI surfaces calibration honestly ("mark outcomes to start" when unjudged).
- New delegationLog.getInsights() + IPC + types compute by-class stats, success/first-try,
  and calibration from ~/.quadclaude/eval.

Data-accuracy fix: qceval now derives `iterations` from prior attempts for the same task
(was hardcoded to 1), so first-try rate is real.
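
The iterations fix can be sketched as follows. This is illustrative only: `EvalOutcome`, `recordOutcome`, and `firstTryRate` are hypothetical, and defining first-try rate as "distinct tasks whose first attempt passed" is an assumption about what the dashboard reports.

```typescript
interface EvalOutcome {
  task: string;
  passed: boolean;
  iterations: number; // 1 for the first attempt at this task, 2 for the second, ...
}

// Append an outcome, deriving `iterations` from prior attempts at the same task.
function recordOutcome(history: EvalOutcome[], task: string, passed: boolean): EvalOutcome {
  const prior = history.filter((o) => o.task === task).length;
  const outcome: EvalOutcome = { task, passed, iterations: prior + 1 };
  history.push(outcome);
  return outcome;
}

// Share of distinct tasks that passed on their first attempt.
function firstTryRate(history: EvalOutcome[]): number {
  const tasks = new Set(history.map((o) => o.task));
  if (tasks.size === 0) return 0;
  const firstTryPasses = history.filter((o) => o.iterations === 1 && o.passed).length;
  return firstTryPasses / tasks.size;
}
```

With a hardcoded 1, every retry would count as a first try and inflate this rate.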

Bump to 1.24.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A count you can't act on is decoration. Clicking 'Issues: N' now clears any project
filter, switches the Calls panel to the Issues view, and auto-expands the first failed
call so you see the prompt/output/check immediately. Kpi gained an optional onClick
(button with hover ring) for this. Bump to 1.24.1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reviewing an Issue wasn't self-explanatory. Now, when a check failed (or the worker errored),
the expanded Call shows a contextual hint connecting the failure to the action: 'Nothing to
fix here — record what you did with it below.' Added a ⓘ explainer tooltip on 'Your outcome'
(what ship/revert/edit mean + why it trains Eval-trust) and hover tooltips on the ok/check/cold
badges so each is self-documenting. Bump to 1.24.2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…o the worker

Decisions (intent, via qcdecide) and Calls (execution + prompt, via qcdelegate) share no
key. Expanding a DELEGATE decision now joins to its Call by project+pane+time and shows the
prompt that actually went to qwen (+ route/exit/check/diff), so you don't have to hunt for
it in the Calls panel. Bump to 1.24.3.
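
Since Decisions and Calls share no key, the join has to be heuristic. A sketch of one way to do it, assuming the Call follows its Decision within a time window and the earliest match wins; the 10-minute default and the names `findCallForDecision`, `DecisionRow`, and `CallRow` are all assumptions.

```typescript
interface DecisionRow {
  project: string;
  pane: string;
  ts: number; // epoch milliseconds
}

interface CallRow {
  project: string;
  pane: string;
  ts: number;
  prompt: string;
}

// Find the first call in the same project and pane that starts at or after
// the decision and within `windowMs` of it.
function findCallForDecision(
  d: DecisionRow,
  calls: CallRow[],
  windowMs: number = 10 * 60 * 1000,
): CallRow | undefined {
  let best: CallRow | undefined;
  for (const c of calls) {
    if (c.project !== d.project || c.pane !== d.pane) continue;
    const delta = c.ts - d.ts;
    if (delta < 0 || delta > windowMs) continue;
    if (best === undefined || c.ts < best.ts) best = c;
  }
  return best;
}
```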

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rdyplayerB and others added 14 commits June 16, 2026 07:51
We were only storing a 1000-char preview — not the full prompt. Now qcdelegate persists the
FULL prompt to a lazy per-call store (~/.quadclaude/prompts/<ts_task>.txt), so events.jsonl
stays small and the full text loads only when you ask for it.

- "View full ↗" on any Call's prompt (and on the prompt linked into a DELEGATE decision)
  opens a popup with the complete untruncated prompt + a Copy button — without taking over
  the dashboard. New delegationLog.getFullPrompt() + IPC; key derived from event ts+task,
  round-trip verified end-to-end.
- Every KPI now has a ⓘ hover tooltip explaining exactly what it measures (incl. that a
  "unit" is a logged decision, not a prompt; what Worked/Eval-trust/Delegate-rate mean).

Lazy by design — no event bloat. Full prompts available for delegations going forward.
Bump to 1.24.4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Projects box only counted Calls (delegations), so it badly under-represented work:
quip-assets showed "1 call" despite 8 keep/delegate decisions, and projects with decisions
but no delegation (expo-signage, Blogogi, …) didn't appear at all. And selecting a project
filtered Calls but never Decisions.

Reworked per the right mental model (Decisions = intent, Calls = execution):
- Projects is now a LEFT vertical rail built from a UNION of decisions + calls, so every
  project with activity appears with its REAL counts (N decisions · M delegated · K calls ·
  check%). Decision-only projects included.
- Selecting a project filters the DECISIONS panel (now on the right) AND the Calls panel.
  All decisions by default; click a project to scope to it; "show all" clears.
- Right column: Decisions (top, primary) + Calls (below).
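
The union described above can be sketched like this, so that decision-only projects still get a row. `buildProjectRail` and the row types are hypothetical; the real counts also include a check percentage, omitted here.

```typescript
interface RailDecision {
  project: string;
  action: "KEEP" | "DELEGATE";
}

interface RailCall {
  project: string;
}

interface ProjectCounts {
  decisions: number;
  delegated: number;
  calls: number;
}

// Build one entry per project that has any decision or any call.
function buildProjectRail(decisions: RailDecision[], calls: RailCall[]): Map<string, ProjectCounts> {
  const rail = new Map<string, ProjectCounts>();
  const slot = (project: string): ProjectCounts => {
    let s = rail.get(project);
    if (s === undefined) {
      s = { decisions: 0, delegated: 0, calls: 0 };
      rail.set(project, s);
    }
    return s;
  };
  for (const d of decisions) {
    const s = slot(d.project);
    s.decisions += 1;
    if (d.action === "DELEGATE") s.delegated += 1;
  }
  for (const c of calls) {
    slot(c.project).calls += 1;
  }
  return rail;
}
```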

No data was mis-associated — decision project paths were correct; the box just read the
wrong source. Bump to 1.25.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Audit of accumulated dead code from this session's feature churn. Each removal verified to
have zero references (compiler + grep) before deletion; build stays green.

- GitStatusBar (TerminalPane, 67 lines): component defined, never rendered anywhere.
- hasTerminal (TerminalPane): leftover from the removed auto-pairing worker-feed; unused.
- closeRowGap() call (TerminalPane): dangling call to a deleted function — it threw
  ReferenceError every resize, silently swallowed by the surrounding try/catch. Removing it
  also clears a real TS2304 "cannot find name" error.
- useDelegation hook + DelegationApprovalModal + PendingApproval (App.tsx + hooks file):
  the whole first-delegation-approval feature. useDelegation() was retired to return
  pending:null, so the modal could never render. Deleted the no-op hook file and all wiring.
- getPerfLogPath (perfMonitor): exported, zero readers.

Verified: no other file imports any of these; main-process "orphans" flagged by the scan
(ROUTER_COMMAND, DELEGATE_COMMAND, perfMonitor helpers, etc.) are actually used same-file and
were KEPT. No functional change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… larger text

Decision/call rows were clipped on the right: the Decisions/Calls grid column defaulted to
min-width:auto, so it refused to shrink below its widest line and overflowed past the modal
(rows didn't truncate — they got clipped). Added min-w-0 to the right column AND all three
panels so rows truncate within bounds.

Also: widened the modal (max-w 1800→2600px, 96vw, 93vh) to use the available space on large
screens, and bumped the main reading text up a notch (decision group / call task → text-sm,
reason → text-xs, row timestamps → 11px) for readability. Bump to 1.25.1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The dashboard/settings modals used glass-elevated (rgba 30,30,30,0.25 — 25% opaque) with CSS
blur disabled, so terminal content bled through and was unreadable. But glass-elevated is ALSO
the terminal panes' background (needs that translucency for the wallpaper), so the global var
can't change. Added a separate near-opaque --glass-bg-modal (0.97) + .glass-modal class and
switched both modals to it; darkened the dashboard overlay (black/50 → /75). Panes unchanged.
Bump to 1.25.2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… dashboard)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Users read the Logic/Docs/UI/Data cards as interactive/configurable, but they're a read-only
learned signal. Added a ⓘ header tooltip (what it is, how the recommendation firms up, that it
auto-learns and isn't configurable, + the rubric.md hand-edit path) and enriched each card's
hover to explain its %, sample, and recommendation. Labeled the band 'learned from your
outcomes'. Bump to 1.25.4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ely logging

A healthy pipeline + a quiet dashboard is ambiguous (no delegatable work vs. the active
session not running qcdecide). Added a 'last logged Xm ago' heartbeat in the header that goes
amber/⚠ stale when >45min, with a tooltip explaining the cause (working pane's Claude isn't in
delegation mode) and fix (relaunch that session so its SessionStart hook re-detects it). Bump
to 1.25.5.
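
A sketch of the heartbeat rule, assuming the header only needs a label and a stale flag. `heartbeat` is a hypothetical helper, and the "nothing logged yet" wording for an empty log is an assumption.

```typescript
const STALE_AFTER_MIN = 45;

// Turn the timestamp of the most recent logged event into header text.
function heartbeat(lastLoggedMs: number | null, nowMs: number): { label: string; stale: boolean } {
  if (lastLoggedMs === null) {
    return { label: "nothing logged yet", stale: true };
  }
  const minutes = Math.max(0, Math.floor((nowMs - lastLoggedMs) / 60000));
  return { label: `last logged ${minutes}m ago`, stale: minutes > STALE_AFTER_MIN };
}
```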

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… crash class)

node-pty can emit one more chunk after the window/webContents is destroyed on quit/reload;
mainWindow?.webContents.send() guards null but NOT a destroyed-but-non-null webContents, which
throws 'TypeError: Object has been destroyed' (the exact crash dialog seen in a sibling app).
Added a sendToRenderer() helper that checks isDestroyed() on both window and webContents, and
routed all five send sites (TERMINAL_OUTPUT, PTY_EXIT, DELEGATION_EVENT, APP_MENU_ACTION,
SYSTEM_RESUME) through it. Bump to 1.25.6.
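
A minimal sketch of such a guard. To keep it self-contained it uses structural stand-ins (`WindowLike`, `WebContentsLike`) for Electron's `BrowserWindow` and `webContents`, which do expose `isDestroyed()` and `send()`; the boolean return value is an assumption added for illustration.

```typescript
// Structural stand-ins for the Electron objects, so the sketch has no imports.
interface WebContentsLike {
  isDestroyed(): boolean;
  send(channel: string, ...args: unknown[]): void;
}

interface WindowLike {
  isDestroyed(): boolean;
  webContents: WebContentsLike;
}

// Send only when both the window and its webContents are still alive.
// Returns whether the message was actually sent.
function sendToRenderer(win: WindowLike | null | undefined, channel: string, ...args: unknown[]): boolean {
  // Check the window first: touching webContents on a destroyed window can throw.
  if (!win || win.isDestroyed()) return false;
  const wc = win.webContents;
  if (!wc || wc.isDestroyed()) return false;
  wc.send(channel, ...args);
  return true;
}
```

Optional chaining alone cannot cover this case, because a destroyed window is still a non-null object.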

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
qcshadow re-runs a unit Claude chose to KEEP through qwen in an ISOLATED
git worktree (HEAD), runs the same ground-truth check + adversarial judge,
and records whether qwen could have matched — to ~/.quadclaude/eval/shadow.jsonl.
qwen's output is never shipped; this only measures where Claude is over-cautious.

- qcshadow.sh + .b64, installed to ~/.local/bin via router.writeDelegateScript
- delegationLog: parse shadow.jsonl, attach a per-decision verdict, roll up the
  over-caution aggregate (matched/fellShort/byClass) into DelegationInsights
- types: ShadowOutcome, ShadowVerdict; shadow on DelegationDecision/Insights
- qcdoctor: report qcshadow + recorded-run count

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Briefing view: computed headline verdict (over-cautious / earned / untested),
  per-task-class evidence blending eval pass-rate with shadow match, and an
  all-decisions ledger drill-in with inline shadow verdict bands
- Responsive: modal sizes to content for the briefing (no vertical void),
  full height for the scrolling ledger; widened content to use horizontal space
- Zoom: Cmd +/- (and a header -/%/+ control) scale the dashboard while it's open,
  without touching terminal font; persisted across launches
- Plain headline text (removed highlighter/underline); brighter, larger
  decision reasons so the keep/delegate "why" reads as primary content
- Bump to v1.26.5

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Crash: node-pty's read thread fired a ThreadSafeFunction callback into a
half-finalized V8 env on quit -> SIGABRT in pty.node (recurring CrBrowserMain
abort). Guard onData/onExit so a throw can never abort the process, drop the
map entry before kill(), and hard-exit on before-quit so the OS reaps the
native threads instead of V8 racing them.

Links: terminal link clicks and any window.open now route through
shell.openExternal (a normal tab in the default browser) instead of spawning
a chromeless Electron popup window.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The usage poller cached the OAuth token for 10 min, so after switching accounts
it kept reporting the PREVIOUS account's usage. Now it watches ~/.claude.json,
detects an account change, drops the stale token + usage, and refetches — and
tags the cache with its account so a value never shows for the wrong one. Also
mirrors the current account to a tiny file the statusline reads.
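
The "tag the cache with its account" idea could be sketched as below. `AccountTaggedCache` is hypothetical and leaves out the file watching and refetching; it only shows how a tagged entry prevents a value from surfacing for the wrong account.

```typescript
interface CachedUsage<T> {
  account: string;
  value: T;
  fetchedAt: number; // epoch milliseconds
}

class AccountTaggedCache<T> {
  private entry: CachedUsage<T> | undefined = undefined;
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  set(account: string, value: T, nowMs: number): void {
    this.entry = { account, value, fetchedAt: nowMs };
  }

  // Return the cached value only for the account that produced it.
  get(account: string, nowMs: number): T | undefined {
    const e = this.entry;
    if (e === undefined) return undefined;
    if (e.account !== account) {
      this.entry = undefined; // account changed: drop the stale entry
      return undefined;
    }
    if (nowMs - e.fetchedAt > this.ttlMs) return undefined;
    return e.value;
  }
}
```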

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bind a terminal pane to a saved Claude account so two panes can run two
different Max subscriptions at once. At spawn, the bound account's long-lived
CLAUDE_CODE_OAUTH_TOKEN (from `claude setup-token`) is injected, overriding the
shared Keychain login; ANTHROPIC_API_KEY is blanked for that pane so a stray
global key can't silently switch it to metered billing. The statusline shows
each pane's real account via an injected label.

- accountStore: tokens encrypted with safeStorage (OS-Keychain-backed), 0600
  file, never in workspace.json, never returned to the renderer (write-only)
- Settings → Accounts tab to add/edit/remove accounts (ClaudeAccountsSettings)
- Unified pane launch menu: Claude accounts appear as launchable Claude
  identities ("Claude Code · boshiro.one"), not a separate parallel list
- Account passed to main as a non-secret env hint (timing-safe vs debounced
  workspace persistence); main decrypts + injects the token
- Bigger, responsive Settings modal so the new tab fits without scrolling
- Bump to v1.27.2
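
A sketch of the per-pane environment described above: inject the bound account's token and blank the API key. `buildPaneEnv` is hypothetical, and `QC_ACCOUNT_LABEL` is an invented name for the injected statusline label, since the commit does not name that variable.

```typescript
type EnvMap = Record<string, string>;

// Build the environment for a pane's PTY. `token` is null for unbound panes,
// which keep the shared login untouched.
function buildPaneEnv(base: EnvMap, token: string | null, label: string | null): EnvMap {
  const env: EnvMap = { ...base };
  if (token !== null) {
    env["CLAUDE_CODE_OAUTH_TOKEN"] = token;
    // Blank any global key so this pane cannot fall back to metered billing.
    env["ANTHROPIC_API_KEY"] = "";
    if (label !== null && label !== "") {
      env["QC_ACCOUNT_LABEL"] = label; // hypothetical variable name
    }
  }
  return env;
}
```

Copying `base` rather than mutating it keeps one pane's binding from leaking into the next pane spawned from the same parent environment.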

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


1 participant