Summary
If any :on-event subscriber registered via agent-shell-subscribe-to signals an error, dispatch of that event stops: every subscriber after it in the list is skipped, including agent-shell's own internal subscribers, and the error propagates up into turn/stream processing. The result is a hung shell: the buffer stops updating, the turn never completes, and the heartbeat spinner never stops.
A user-level mistake in one subscriber callback (which should at worst break that one notification) instead takes down the whole session.
Root cause
agent-shell--emit-event funcalls subscribers in a bare dolist with no error isolation:
```elisp
(cl-defun agent-shell--emit-event (&key event data)
  (let ((state (agent-shell--state))
        (event-alist (list (cons :event event))))
    (when data
      (push (cons :data data) event-alist))
    (dolist (sub (map-elt state :event-subscriptions))
      (when (and (buffer-live-p (map-elt state :buffer))
                 (or (not (map-elt sub :event))
                     (eq (map-elt sub :event) event)))
        (with-current-buffer (map-elt state :buffer)
          (funcall (map-elt sub :on-event) event-alist)))))) ; <-- no condition-case
```
agent-shell--emit-event is called from within the response/turn handlers (e.g. around agent-shell.el:1677, 1867, 1954, 2073, 2228). An error signaled by any one subscriber unwinds out of those handlers, so the in-progress turn is abandoned mid-flight.
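The failure mode can be reproduced without agent-shell. The following is a simplified model, not agent-shell code: every demo-* name is hypothetical, demo-dispatch stands in for the bare dolist, and demo-turn-complete stands in for a turn handler that clears a "busy" flag after emitting its event.

```elisp
;; Simplified model of un-isolated dispatch.  Every demo-* name is
;; hypothetical; none of this is agent-shell code.

(defvar demo-busy nil
  "Stand-in for a \"turn in progress\" flag.")

(defvar demo-log nil
  "Subscribers that actually ran, most recent first.")

(defvar demo-subscribers
  (list (lambda (_event) (push 'first demo-log))
        (lambda (_event) (error "boom"))        ; the broken subscriber
        (lambda (_event) (push 'third demo-log)))
  "Three subscribers; the middle one always signals an error.")

(defun demo-dispatch (event)
  "Call every subscriber with EVENT, with no error isolation."
  (dolist (fn demo-subscribers)
    (funcall fn event)))

(defun demo-turn-complete ()
  "Emit a turn-complete event, then clear `demo-busy'.
If any subscriber signals, the `setq' below is never reached."
  (demo-dispatch '((:event . turn-complete)))
  (setq demo-busy nil))
```

Evaluating (setq demo-busy t demo-log nil) followed by (ignore-errors (demo-turn-complete)) leaves demo-log as (first) and demo-busy as t: the third subscriber never runs and the caller's cleanup after dispatch is skipped.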
Reproduction
- Subscribe a callback that errors:
  ```elisp
  (add-hook 'agent-shell-mode-hook
            (lambda ()
              (agent-shell-subscribe-to
               :shell-buffer (current-buffer)
               :event 'turn-complete
               :on-event (lambda (_event) (error "boom")))))
  ```
- Start an agent-shell session and send a prompt.
- Observe: the turn hangs, the buffer stops updating, the heartbeat spinner keeps running, and *Messages* shows the error. The backend process is idle (its work is done); only the Emacs-side turn handling is broken.
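Until dispatch is isolated, a subscriber can keep its own errors out of the turn. A minimal sketch using the built-in with-demoted-errors macro, which reports an error from its body with message and returns nil (while still letting the debugger run when debug-on-error is non-nil); demo-on-turn-complete is a hypothetical name.

```elisp
(defun demo-on-turn-complete (_event)
  "Hypothetical turn-complete subscriber that cannot signal into dispatch."
  (with-demoted-errors "agent-shell subscriber error: %S"
    (error "boom")))   ; stands in for the buggy callback body
```

It can be registered with :on-event #'demo-on-turn-complete in the hook from the first step. Using a named function also avoids relying on a lambda capturing variables at registration time.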
In my case the erroring callback hit an accidental void-variable error: a closure failed to capture a variable because the init file was evaluated with dynamic binding. It was hard to diagnose because the symptom, a silent, long-lived hang across all open shells, looked nothing like "one subscriber is throwing." The shells share Emacs's main thread, so a single broken session degrades the others too.
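The void-variable failure can be reproduced in isolation. This sketch evaluates the same let/lambda form twice through eval, whose second argument selects lexical (t) or dynamic (nil) binding; demo-make-closure and greeting are hypothetical names, and greeting must not be globally bound.

```elisp
(defun demo-make-closure (lexical)
  "Evaluate a `let' that returns a lambda referring to the bound variable.
LEXICAL is passed as the second argument of `eval'."
  (eval '(let ((greeting "hello"))
           (lambda () greeting))
        lexical))

;; Lexical binding: the lambda captures `greeting'.
(funcall (demo-make-closure t))      ; => "hello"

;; Dynamic binding: `greeting' is unbound again once the `let' exits,
;; so calling the lambda afterwards signals `void-variable'.
(condition-case err
    (funcall (demo-make-closure nil))
  (void-variable (car err)))         ; => void-variable
```

A subscriber lambda defined in a file evaluated with dynamic binding (for example, one without a lexical-binding: t cookie on its first line) behaves like the second case once the subscriber fires.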
Suggested fix
Isolate each subscriber so one bad callback can't abort dispatch or turn processing:
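```elisp
(cl-defun agent-shell--emit-event (&key event data)
  (let ((state (agent-shell--state))
        (event-alist (list (cons :event event))))
    (when data
      (push (cons :data data) event-alist))
    (dolist (sub (map-elt state :event-subscriptions))
      (when (and (buffer-live-p (map-elt state :buffer))
                 (or (not (map-elt sub :event))
                     (eq (map-elt sub :event) event)))
        (with-current-buffer (map-elt state :buffer)
          ;; A failing subscriber is reported and skipped instead of
          ;; aborting the rest of dispatch and the caller's turn handling.
          (condition-case err
              (funcall (map-elt sub :on-event) event-alist)
            (error
             (message "agent-shell: subscriber for %s errored: %S"
                      event err))))))))
```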
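To check that the patched loop keeps the remaining subscribers running, the same condition-case pattern can be exercised outside agent-shell. demo-isolated-dispatch is a hypothetical, self-contained stand-in for the dolist above; unlike the real function it also returns the caught errors so the behavior is easy to inspect.

```elisp
(defun demo-isolated-dispatch (event subscribers)
  "Call each function in SUBSCRIBERS with EVENT.
Errors are reported with `message' instead of propagating.
Return the list of caught errors, oldest first."
  (let (errors)
    (dolist (fn subscribers)
      (condition-case err
          (funcall fn event)
        (error
         (push err errors)
         (message "agent-shell: subscriber for %s errored: %S" event err))))
    (nreverse errors)))

;; Usage: the middle subscriber fails, the other two still run.
(let* ((ran nil)
       (errors (demo-isolated-dispatch
                'turn-complete
                (list (lambda (_e) (push 'first ran))
                      (lambda (_e) (error "boom"))
                      (lambda (_e) (push 'third ran))))))
  (list (nreverse ran) errors))
;; => ((first third) ((error "boom")))
```

A possible refinement, not part of the proposal above: condition-case-unless-debug behaves like condition-case but still enters the debugger when debug-on-error is non-nil, which would keep a broken subscriber debuggable while normal sessions stay isolated.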
(If internal vs. user subscribers should be treated differently, internal ones could still be allowed to propagate — but at minimum, user subscribers registered via the public agent-shell-subscribe-to API shouldn't be able to wedge a turn.)
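One way the internal/user split could look, as a sketch only: the :internal key is a hypothetical flag, since the dispatch code quoted above only reads :event and :on-event from each subscription. Internal entries are called directly and may signal; everything else is isolated and reported.

```elisp
(require 'map)

(defun demo-dispatch-selectively (event subscriptions)
  "Call the `:on-event' function of each alist in SUBSCRIPTIONS with EVENT.
Entries with a non-nil `:internal' value (a hypothetical flag) may signal;
all other entries are isolated and their errors only reported."
  (dolist (sub subscriptions)
    (let ((fn (map-elt sub :on-event)))
      (if (map-elt sub :internal)
          (funcall fn event)
        (condition-case err
            (funcall fn event)
          (error
           (message "agent-shell: user subscriber for %s errored: %S"
                    event err)))))))

;; Usage: the failing user subscriber is reported; the internal one still runs.
(let ((ran nil))
  (demo-dispatch-selectively
   'turn-complete
   (list (list (cons :on-event (lambda (_e) (error "user bug"))))
         (list (cons :internal t)
               (cons :on-event (lambda (_e) (push 'internal ran))))))
  ran)
;; => (internal)
```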
Related issues
A related report describes shell-maker--busy stuck at t, no prompt inserted, the buffer read-only, and the session unrecoverable, with a different suspected cause. A subscriber that signals during turn-complete dispatch would skip the busy reset and prompt insertion and produce exactly that state, so this fix may also resolve "Buffer becomes read-only after response: shell-maker--busy stuck at t with no active request" (#511).
Checklist
Environment