Skip to content

A throwing event subscriber aborts turn processing and hangs the shell (no error isolation in agent-shell--emit-event) #663

@candera

Description

@candera

Summary

If any :on-event subscriber registered via agent-shell-subscribe-to signals an error, it aborts dispatch of the entire event — including agent-shell's own internal subscribers — and the error propagates up into turn/stream processing. The result is a hung shell: the buffer stops updating, the turn never completes, and the heartbeat spinner never stops.

A user-level mistake in one subscriber callback (which should at worst break that one notification) instead takes down the whole session.

Root cause

agent-shell--emit-event funcalls subscribers in a bare dolist with no error isolation:

(cl-defun agent-shell--emit-event (&key event data)
  (let ((state (agent-shell--state))
        (event-alist (list (cons :event event))))
    (when data
      (push (cons :data data) event-alist))
    (dolist (sub (map-elt state :event-subscriptions))
      (when (and (buffer-live-p (map-elt state :buffer))
                 (or (not (map-elt sub :event))
                     (eq (map-elt sub :event) event)))
        (with-current-buffer (map-elt state :buffer)
          (funcall (map-elt sub :on-event) event-alist))))))   ; <-- no condition-case

agent-shell--emit-event is called from within the response/turn handlers (e.g. around agent-shell.el:1677, 1867, 1954, 2073, 2228). A throw from any one subscriber unwinds out of those, so the in-progress turn is abandoned mid-flight.

Reproduction

  1. Subscribe a callback that errors:
    (add-hook 'agent-shell-mode-hook
              (lambda ()
                (agent-shell-subscribe-to
                 :shell-buffer (current-buffer)
                 :event 'turn-complete
                 :on-event (lambda (_event) (error "boom")))))
  2. Start an agent-shell session and send a prompt.
  3. Observe: the turn hangs, the buffer stops updating, the heartbeat spinner keeps running, and *Messages* shows the error. The backend process is idle (its work is done) — only the Emacs-side turn handling is broken.

In my case the erroring callback was an accidental void-variable (a closure that failed to capture a variable because the init file was dynamically bound). It was hard to diagnose because the symptom — a silent, long-lived hang across all open shells — looked nothing like "one subscriber is throwing." The shells share Emacs's main thread, so a single broken session degrades the others too.

Suggested fix

Isolate each subscriber so one bad callback can't abort dispatch or turn processing:

(dolist (sub (map-elt state :event-subscriptions))
  (when (and (buffer-live-p (map-elt state :buffer))
             (or (not (map-elt sub :event))
                 (eq (map-elt sub :event) event)))
    (with-current-buffer (map-elt state :buffer)
      (condition-case err
          (funcall (map-elt sub :on-event) event-alist)
        (error
         (message "agent-shell: subscriber for %s errored: %S" event err))))))

(If internal vs. user subscribers should be treated differently, internal ones could still be allowed to propagate — but at minimum, user subscribers registered via the public agent-shell-subscribe-to API shouldn't be able to wedge a turn.)

Related issues

Checklist

  • I agree to communicate with the author myself (not AI-generated).
  • I've read the README's Filing issues section.
  • I'm running the latest versions (fill in below).
    • agent-shell version: 20260608.9
    • acp.el version: 0.12.2
    • ACP package (e.g. claude-agent-acp) version: claude-agent-acp 0.44.0
    • Agent CLI (e.g. claude, gemini) version: Claude Code 2.1.181

Environment

  • Emacs 30.2 (macOS)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions